A global tire manufacturing leader headquartered in Tokyo partnered with HCLTech to transform its high-performance computing (HPC) environment using AWS. The initiative addressed critical inefficiencies, including manual processes, limited scalability and high licensing costs, by implementing automated infrastructure deployment, dynamic scaling and resource cost tracking. This modernization improved scalability, reduced operational costs and enhanced resource visibility, enabling faster innovation and optimized performance through a flexible, cloud-native HPC landscape.
The Challenge
Our client faced increasing inefficiencies and complexities within its HPC landscape, mainly due to a lack of automation and scalability. Key challenges included:
- Absence of on-demand automation: The HPC platform lacked automation capabilities, resulting in time-consuming manual processes
- Inflexible cluster resources: The HPC environment could not dynamically scale to match fluctuating demands, creating bottlenecks
- High licensing costs: The client was burdened by IBM LSF software licensing expenses and could not shift to a more cost-effective solution
- Lack of cost transparency: There was no mechanism to track resource consumption by user or business unit (BU), impeding accurate cost allocation and internal chargebacks
- Limited scalability: The inability to scale HPC resources hindered the client's ability to meet growing data and compute requirements
The Objective
The goal was to create an efficient, scalable HPC environment that minimized operational costs and complexity while enabling resource tracking for internal cost management. The specific objectives were to:
- Automate infrastructure deployment: Use Infrastructure as Code (IaC) to build HPC resources on-demand, ensuring scalability and efficiency
- Optimize total cost of ownership (TCO): Implement a flexible commercial model and eliminate licensing costs associated with IBM LSF
- Enable chargeback mechanisms: Develop resource accounting to track costs by user and facilitate BU-level chargebacks
- Enhance elasticity: Implement a more flexible, responsive HPC cluster to support on-demand scaling with Amazon EC2 instances
The Solution
To meet these objectives, HCLTech and AWS delivered a comprehensive HPC solution prioritizing automation, cost efficiency and scalability. The solution included:
Assessment
- Conducted a thorough evaluation of the client's HPC environment to pinpoint specific bottlenecks and inefficiencies
- Identified resource requirements and constraints unique to the client's business and industry
Build
- Automated provisioning: Used Terraform pipelines to automate the deployment of HPC resources, including AWS HPC7a and HPC6i instances
- Enhanced visualization: Used NICE DCV to set up on-demand remote visualization, allowing users to manage pre- and post-processing tasks remotely
- Streamlined job management: Configured Slurm for job scheduling and management, with REST API integration to automate job submissions based on user personas
- Cost tracking and chargebacks: Implemented resource accounting to track user resource usage precisely, facilitating accurate chargebacks to individual BUs
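The persona-based job submission described above can be illustrated with a minimal sketch. It only builds a request body of the general shape accepted by the Slurm REST API (slurmrestd) job submission endpoint and does not send it. The persona names, partitions, node counts and the `build_job_request` helper are hypothetical, and the exact field names and types vary between slurmrestd API versions, so the payload should be checked against the OpenAPI specification of the deployed version.

```python
import json

# Hypothetical persona profiles: names, partitions and sizes are illustrative
# assumptions, not values taken from the client's environment.
PERSONA_PROFILES = {
    "cfd_engineer": {"partition": "hpc7a", "nodes": 4, "tasks_per_node": 96},
    "fea_analyst": {"partition": "hpc6i", "nodes": 1, "tasks_per_node": 32},
}


def build_job_request(persona, user, account, command):
    """Build a JSON-serializable job submission body for the given persona.

    The layout (a batch script plus a "job" description) follows the general
    shape of a slurmrestd submission; field names are assumptions to verify
    against the deployed API version.
    """
    if persona not in PERSONA_PROFILES:
        raise ValueError(f"unknown persona: {persona}")
    profile = PERSONA_PROFILES[persona]
    script = f"#!/bin/bash\nsrun {command}\n"
    return {
        "script": script,
        "job": {
            "name": f"{persona}-{user}",
            # The Slurm account is assumed to identify the business unit.
            "account": account,
            "partition": profile["partition"],
            "nodes": profile["nodes"],
            "tasks_per_node": profile["tasks_per_node"],
            "current_working_directory": f"/home/{user}",
            "environment": ["PATH=/usr/bin:/bin"],
        },
    }


if __name__ == "__main__":
    request = build_job_request("cfd_engineer", "alice", "bu-rnd", "./solver input.dat")
    print(json.dumps(request, indent=2))
```

Keeping the persona-to-resource mapping in one table means users only choose a persona, while partition and sizing decisions stay under central control. Tagging each job with an account is also what makes the later per-BU accounting possible.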
Operate
- Delivered end-to-end operations management, keeping day-to-day HPC tasks running smoothly and efficiently
- Set up continuous monitoring of the HPC environment to ensure optimized performance and cost efficiency
The Impact
With the new HPC landscape, the client gained a flexible, automated environment that significantly improved scalability, reduced costs and optimized resource utilization.
- Cost savings: The client realized substantial savings by eliminating IBM LSF and Bright Cluster Manager licenses and adopting AWS ParallelCluster, which provides similar functionality in a cloud-native and cost-effective format
- Faster time to innovation: Job performance improved by 30-40%, enabling the client to complete simulations faster and reduce time-to-insight
- Enhanced resource visibility: The implemented cost tracking provided clear visibility into resource usage by user, enabling accurate internal billing and cost control
- Scalability and elasticity: The AWS-based HPC environment dynamically scales with demand, ensuring the client can meet peak workloads without paying for idle resources
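As an illustration of how usage by user can roll up into BU-level chargebacks, the following minimal sketch aggregates accounting records into core-hours and cost. It assumes pipe-delimited `user|account|cpu_seconds` lines, such as those produced by `sacct --allocations --parsable2 --noheader --format=User,Account,CPUTimeRAW`, and that each Slurm account maps to one BU. The rate and the `chargeback` helper are hypothetical and do not reflect the client's actual billing model.

```python
from collections import defaultdict

# Hypothetical internal rate per core-hour; not a figure from the case study.
RATE_PER_CORE_HOUR = 0.05


def chargeback(records, rate=RATE_PER_CORE_HOUR):
    """Aggregate cost per (BU, user) and per BU.

    `records` is an iterable of 'user|account|cpu_seconds' lines, where the
    account is assumed to identify the business unit.
    """
    core_hours = defaultdict(float)
    for line in records:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, account, cpu_seconds = line.split("|")
        core_hours[(account, user)] += int(cpu_seconds) / 3600

    # Cost per (BU, user) pair, rounded for reporting.
    by_user = {key: round(hours * rate, 2) for key, hours in core_hours.items()}

    # Cost per BU, summed from unrounded core-hours to avoid rounding drift.
    bu_totals = defaultdict(float)
    for (account, _user), hours in core_hours.items():
        bu_totals[account] += hours * rate
    by_bu = {bu: round(cost, 2) for bu, cost in bu_totals.items()}
    return by_user, by_bu


if __name__ == "__main__":
    sample = [
        "alice|tire-rnd|7200",
        "bob|tire-rnd|3600",
        "alice|tire-rnd|3600",
        "carol|materials|36000",
    ]
    per_user, per_bu = chargeback(sample)
    print(per_user)
    print(per_bu)
```

A production chargeback would typically also account for instance type, storage and data transfer; this sketch covers only CPU time.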
AWS services used:
- Amazon EC2
- AWS ParallelCluster
- AWS Batch
- Amazon Elastic File System (EFS)