Job Overview
■Engineering & Research Division / AI Data Center DC Block Architect
■About the role
We are seeking an experienced AI Data Center DC Block Architect to join our team. In this role, you will be responsible for designing, developing, and optimizing modular, pre-engineered DC blocks that are tailored to support our organization's growing AI and machine learning workloads.
■Job Scope
1. AI Workload Analysis and Requirements:
- Assess the organization's current and future AI and machine learning requirements, including compute, storage, and networking needs.
- Collaborate with data science and IT teams to understand the specific performance, scalability, and reliability requirements of the AI workloads.
- Identify any unique hardware or software considerations for the AI DC blocks, such as the need for specialized accelerators or optimized software stacks.
2. DC Block Architecture and Design:
- Design modular, pre-engineered DC blocks that can efficiently support a variety of AI and machine learning workloads.
- Ensure the DC block architecture is scalable, resilient, and aligned with industry best practices and the organization's overall data center strategy.
- Optimize the DC block layout, power, cooling, and infrastructure to maximize performance, energy efficiency, and density.
3. Hardware Selection and Integration:
- Evaluate and select the appropriate server hardware, including CPUs, GPUs, and specialized AI accelerators (e.g., NVIDIA Tensor Core GPUs, Google TPUs).
- Determine the optimal storage solutions, considering factors like capacity, performance, and data redundancy (e.g., high-performance SSDs, NVMe, network-attached storage).
- Integrate the networking infrastructure to support the required bandwidth, low-latency communication, and data transfer requirements of the AI workloads.
4. Software Stack Development and Optimization:
- Design and develop the software stack for the AI DC blocks, including the operating system, containerization platform, and orchestration tools.
- Integrate and configure leading AI/ML frameworks and libraries (e.g., TensorFlow, PyTorch, Keras) to enable efficient model development and deployment.
- Implement data management and processing pipelines, leveraging tools like Apache Spark, Hadoop, or custom data ingestion and preprocessing workflows.
- Optimize the software stack for performance, scalability, and resource utilization to ensure the AI DC blocks operate at peak efficiency.
5. Monitoring and Observability:
- Develop comprehensive monitoring and observability capabilities for the AI DC blocks, including metrics, logging, and tracing.
- Implement data-driven insights and analytics to identify performance bottlenecks, optimize resource allocation, and ensure overall system reliability.
- Automate deployment, scaling, and management processes to streamline the operation and maintenance of the AI DC blocks.
6. Security and Compliance:
- Incorporate robust security measures, such as access controls, network segmentation, and data encryption, into the AI DC block design.
- Ensure compliance with relevant data privacy and regulatory requirements (e.g., GDPR, HIPAA) by implementing appropriate data governance and access policies.
- Develop and test disaster recovery and business continuity plans to ensure the resilience of the AI DC blocks in the event of failures or disasters.
7. Continuous Optimization and Scalability:
- Continuously monitor the performance and resource utilization of the AI DC blocks to identify opportunities for optimization.
- Implement auto-scaling and dynamic resource allocation mechanisms to handle fluctuations in AI workload demands.
- Explore options for distributed or federated learning architectures to scale the AI capabilities across multiple edge devices or smaller data centers.
■Internal common IT tools
- Google Workspace (Gmail, Google Calendar, Google Meet, etc.)
- Slack
- Notion
- Dialpad
- SmartHR
- Money Forward
- Bakuraku
etc.
■About Engineering and Research Division
Our Engineering and Research Division consists of three main teams that handle end-to-end development of hardware and software systems for energy storage and power transfer solutions and services. Currently, approximately 50 specialists are engaged in the mission of advancing energy storage technologies and solutions.
The Division is organized into the following teams:
- Series Development: Responsible for prototyping, testing and validation, requirements engineering, and series handover of new products, including product support and commissioning.
- Advanced Engineering: Responsible for development of and experimentation with emerging technologies to sustain our current and future roadmap of energy solutions, with a focus on areas such as embedded development, PCB design, model-based development, battery management, power conversion, digital twins, edge computing, cloud solutions, and AI/ML-based dispatch optimization, generation, and forecasts.
- Product Lifecycle Management: Manages product and project life cycles by tracking them across quality gates through development, sourcing, and value engineering, leading up to manufacturing and after-sales activities, through cross-functional coordination and data-intensive product life cycle assessment.
Working alongside talented engineers from around the world, you will have the opportunity to thrive in a diverse environment that values autonomy and empowers individuals to contribute effectively while gaining new skills and experience with emerging technologies applied directly in our solutions.
Required Skills
- Extensive experience in designing, developing, and optimizing modular, pre-engineered data center DC blocks, with a focus on AI and machine learning workloads.
- Deep understanding of the latest hardware and software technologies for AI and machine learning, including specialized accelerators and optimized software stacks.
- Proficiency in cloud infrastructure as code (IaC) tools, such as Terraform, CloudFormation, or Ansible, for managing the DC block infrastructure.
- Strong expertise in container orchestration platforms (e.g., Kubernetes) and CI/CD pipelines for the AI software stack.
- Familiarity with leading AI/ML frameworks, data processing tools, and observability solutions.
- Understanding of cloud security best practices, compliance frameworks, and data protection regulations.
- Excellent problem-solving skills and the
Ideal Candidate Profile
- Proactively take on new challenges and approach difficult tasks with a positive mindset
- Excel at coordinating and communicating effectively with both internal and external stakeholders
- Embrace change and adapt flexibly in dynamic, fast-paced environments
- Take initiative to identify and address issues independently
- Possess strong communication skills, both verbal and written
- Demonstrate a willingness to actively learn and grow in unfamiliar areas
- Available to work at our Lab on a daily basis
Application Details
Salary | Best in industry (decided based on skills and experience) |
---|---|
Work Location | Tokyo Office (43rd floor, Midtown Tower, 9-7-1 Akasaka, Minato-ku, Tokyo 107-6243) and POWERD LAB (https://power-x.jp/en/about/powerd-lab) |
Employment Type | Full-time employee |
Work Schedule | ●Working hours/month ・Scheduled working hours: 8 hours × scheduled working days ・Flextime system (core time 11:00-15:00, 60-minute break) ●Holidays/Leave ・Holidays: Saturdays, Sundays, national holidays, year-end and New Year holidays, and other days designated by the company ・Paid leave: 12 days in the first year (5 days granted upon joining, the remaining 7 days granted after 6 months) ・Special leave for special occasions, etc. |
Probation Period | Three months |
Benefits | ●Full social insurance coverage (employment insurance, workers' compensation insurance, health insurance, welfare pension insurance) ●Employee stock ownership plan (with incentives) |
Company Information
Company Name | PowerX, Inc. (株式会社パワーエックス) |
---|---|
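Established | March 2021 |
Head Office Locations | Headquarters and factory: 6-9-1 Tai, Tamano-shi, Okayama 706-0001 / Tokyo head office: 43rd floor, Midtown Tower, 9-7-1 Akasaka, Minato-ku, Tokyo 107-6243 |
Capital | JPY 19,494 million (including capital reserves) |
Number of Employees | 156 |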