Job Overview
■Engineering & Research Division / AI Data Center Architect
■About the role
We are seeking an experienced AI Data Center Architect to join our team. In this role, you will be responsible for designing, implementing, and optimizing a medium-sized AI data center that leverages the latest hardware and software technologies to support our organization's growing AI and machine learning workloads.
■Job Scope
1. AI Data Center Architecture and Design:
- Assess the organization's current and future AI and machine learning requirements, including compute, storage, and networking needs.
- Design a scalable, resilient, and efficient AI data center architecture that can accommodate a variety of AI workloads, such as model training, inference, and data processing.
- Ensure the architecture aligns with industry best practices, regulatory compliance, and the organization's IT and business strategies.
2. Hardware and Infrastructure Selection:
- Evaluate and select the appropriate server hardware, including CPUs, GPUs, and specialized AI accelerators (e.g., NVIDIA Tensor Core GPUs, Google TPUs).
- Determine the optimal storage solutions, considering factors like capacity, performance, and data redundancy (e.g., high-performance SSDs, NVMe, network-attached storage).
- Ensure the networking infrastructure can support the required bandwidth, low-latency communication, and data transfer requirements.
- Incorporate power and cooling considerations, such as efficient cooling systems and redundant power supplies, to maintain optimal operating conditions.
3. Software Stack Integration and Optimization:
- Evaluate and integrate the appropriate operating system (e.g., Linux distributions), containerization platform (e.g., Docker, Kubernetes), and orchestration tools.
- Integrate and configure leading AI/ML frameworks and libraries (e.g., TensorFlow, PyTorch, Keras) to enable efficient model development and deployment.
- Implement data management and processing pipelines, leveraging tools like Apache Spark, Hadoop, or custom data ingestion and preprocessing workflows.
- Optimize the software stack for performance, scalability, and resource utilization to ensure the AI data center operates at peak efficiency.
4. Monitoring, Observability, and Automation:
- Implement comprehensive monitoring and observability tools to track the performance, resource utilization, and health of the AI data center.
- Develop data-driven insights and analytics to identify bottlenecks, optimize resource allocation, and ensure overall system reliability.
- Automate deployment, scaling, and management processes to streamline the operation and maintenance of the AI data center.
5. Security and Compliance:
- Implement robust security measures, such as access controls, network segmentation, and data encryption, to protect the AI data center from potential threats.
- Ensure compliance with relevant data privacy and regulatory requirements (e.g., GDPR, HIPAA) by implementing appropriate data governance and access policies.
- Develop and test disaster recovery and business continuity plans to ensure the resilience of the AI data center in the event of failures or disasters.
6. Continuous Optimization and Scalability:
- Continuously monitor the AI data center's performance and resource utilization to identify opportunities for optimization.
- Implement auto-scaling and dynamic resource allocation mechanisms to handle fluctuations in workload demands.
- Explore options for distributed or federated learning architectures to scale the AI capabilities across multiple edge devices or smaller data centers.
7. Collaboration and Knowledge Sharing:
- Provide technical leadership and mentor junior engineers on AI data center best practices and strategies.
- Collaborate with cross-functional teams (data science, IT operations, security) to ensure the AI data center meets the organization's evolving needs.
- Document and share knowledge, solutions, and lessons learned to promote continuous improvement within the organization.
■Common internal IT tools
- Google Workspace (Gmail, Google Calendar, Google Meet, etc.)
- Slack
- Notion
- Dialpad
- SmartHR
- Money Forward
- Bakuraku
etc.
■About Engineering and Research Division
Our Engineering and Research Division consists mainly of three teams that handle end-to-end development of hardware and software systems for energy storage and power transfer solutions and services. Approximately 50 specialists currently work on the division's mission of advancing energy storage technologies and solutions.
The Division is organized into the following teams:
- Series Development: Responsible for prototyping, testing and validation, requirements engineering, and series handover of new products, including product support and commissioning.
- Advanced Engineering: Responsible for developing and experimenting with emerging technologies to sustain our current and future energy solutions roadmap, with a focus on areas such as embedded development, PCB design, model-based development, battery management, power conversion, digital twins, edge computing, cloud solutions, and AI/ML-based dispatch optimization, generation, and forecasting.
- Product Lifecycle Management: Manages product and project lifecycles, tracking progress across quality gates from development, sourcing, and value engineering through manufacturing and after-sales activities, supported by cross-functional coordination and data-intensive product lifecycle assessment.
Working alongside talented engineers from around the world, you will have the opportunity to thrive in a diverse environment that values autonomy and empowers individuals to contribute effectively, while gaining new skills and experience with emerging technologies applied directly in our solutions.
Required Skills
- Extensive experience in designing, implementing, and managing medium-sized AI data centers and infrastructure.
- Deep understanding of the latest hardware and software technologies for AI and machine learning workloads.
- Proficiency in cloud infrastructure as code (IaC) tools, such as Terraform, CloudFormation, or Ansible.
- Strong expertise in container orchestration platforms (e.g., Kubernetes) and CI/CD pipelines.
- Familiarity with leading AI/ML frameworks, data processing tools, and observability solutions.
- Understanding of cloud security best practices, compliance frameworks, and data protection regulations.
- Excellent problem-solving skills and the ability to think strategically
Ideal Candidate
- Proactively take on new challenges and approach difficult tasks with a positive mindset
- Excel at coordinating and communicating effectively with both internal and external stakeholders
- Embrace change and adapt flexibly in dynamic, fast-paced environments
- Take initiative to identify and address issues independently
- Possess strong communication skills, both verbal and written
- Demonstrate a willingness to actively learn and grow in unfamiliar areas
- Available to work at our Lab on a daily basis
Position Details
Salary | Best in industry (determined based on skills and experience) |
---|---|
Work location | Tokyo Office (43rd floor, Midtown Tower, 9-7-1 Akasaka, Minato-ku, Tokyo 107-6243) and POWERD LAB (https://power-x.jp/en/about/powerd-lab) |
Employment type | Full-time employee |
Working conditions | ●Working hours: ・Scheduled monthly working hours: 8 hours × number of scheduled working days ・Flexible working hours system (core time 11:00-15:00, 60-minute break) ●Holidays and leave: ・Saturdays, Sundays, national holidays, year-end and New Year holidays, and other days designated by the company ・Paid leave: 12 days in the first year (5 days granted upon joining, the remaining 7 days granted after 6 months) ・Special leave for life events, etc. |
Probation period | Three months |
Benefits | ⚫︎Full social insurance coverage (employment insurance, workers' compensation insurance, health insurance, welfare pension insurance) ⚫︎Employee stock ownership plan (with incentives) |
Company Information
Company name | PowerX, Inc. (株式会社パワーエックス) |
---|---|
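Founded | March 2021 |
Headquarters | Head Office and Factory: 6-9-1 Tai, Tamano-shi, Okayama 706-0001 / Tokyo Head Office: 43rd floor, Midtown Tower, 9-7-1 Akasaka, Minato-ku, Tokyo 107-6243 |
Capital | ¥19,494 million (including capital reserve) |
Number of employees | 156 |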