The client is a leading American biopharmaceutical company specializing in drug discovery, clinical trials and genomics research. Its mission is to improve healthcare and deliver innovative medicines and solutions. The company's work addresses intricate health challenges and significantly enhances the quality of life. It is dedicated to developing, manufacturing and commercializing drugs for chronic and complex diseases.
The Challenge
The client faced challenges with the infrastructure on Amazon cloud servers that processed its supply chain, quality, inventory, manufacturing execution system (MES) and CS, purchase, manufacturing and research data. It ran its data workflows on Amazon EC2 instances using Apache Hadoop and Apache Spark. However, escalating data volume and processing complexity led to scalability, performance and cost challenges.
- Scalability issues - The existing cloud server setup struggled to handle the increasing volume of genomic, clinical trial and research data, leading to performance bottlenecks
- High operational costs - Maintaining and scaling the cloud instances for processing large datasets resulted in significant operational costs
- Complex management - Managing and configuring the cloud server instances required substantial effort and technical expertise, diverting resources from core research activities
- Data processing delays - Large datasets took a long time to process, delaying the analysis of clinical trials, genomics, drug discovery, supply chain, quality, inventory, MES and CS, purchase, manufacturing and research data
The Objective
The objective was to enhance the client's data processing infrastructure on Amazon cloud servers by improving scalability to handle increasing data volumes, reducing the operational costs of maintaining and scaling cloud instances, simplifying the management of cloud server instances and reducing data processing delays. The aim was to optimize the analysis of various data types, thereby improving efficiency in core research activities.
The Solution
The solution moved the data processing infrastructure for the various data types from Amazon EC2 to Amazon EMR. It used Amazon EMR's scalability and cost-effectiveness to address the client's problems, thereby optimizing key business functions like drug discovery, clinical trials and genomics research.
Assessment and planning
- Conducted a thorough assessment of the current EC2-based infrastructure and data processing workflows specific to genomics research, clinical trials and drug discovery
- Identified the specific requirements and constraints of the client's data processing needs in the life sciences domain
Environment setup
- Configured Amazon EMR clusters tailored to the client's data processing requirements in genomics, clinical data analysis and bioinformatics
- Established secure connectivity between the client's data sources and the EMR environment using AWS Direct Connect and Amazon VPC, ensuring compliance with industry regulations such as HIPAA (Health Insurance Portability and Accountability Act) and GDPR (General Data Protection Regulation)
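As an illustration of the environment setup above, the following is a minimal sketch of how an EMR cluster request might be assembled in Python before being passed to boto3's `run_job_flow`. The release label, instance types, instance counts, subnet ID and bucket name are all assumptions made for this example; the case study does not state the values the client actually used.

```python
from typing import Any, Dict


def build_emr_cluster_request(
    name: str,
    subnet_id: str,
    log_uri: str,
    core_count: int = 2,
) -> Dict[str, Any]:
    """Return keyword arguments suitable for boto3's EMR run_job_flow call.

    All concrete values (release label, instance types, default roles)
    are illustrative assumptions, not the client's configuration.
    """
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed EMR release
        "LogUri": log_uri,
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            # Placing the cluster in a VPC subnet keeps traffic private.
            "Ec2SubnetId": subnet_id,
            # Transient cluster: shut down once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "Market": "ON_DEMAND",
                    "InstanceType": "m5.xlarge",  # assumed size
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "Market": "ON_DEMAND",
                    "InstanceType": "m5.xlarge",  # assumed size
                    "InstanceCount": core_count,
                },
            ],
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Hypothetical identifiers, for illustration only.
request = build_emr_cluster_request(
    name="genomics-processing",
    subnet_id="subnet-0123456789abcdef0",
    log_uri="s3://example-bucket/emr-logs/",
    core_count=4,
)

# Launching for real requires boto3 and AWS credentials:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Keeping the request as plain data makes it easy to review and version the cluster definition separately from the code that launches it.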
Data migration
- Transferred the existing datasets from Amazon EC2 to Amazon S3 to facilitate efficient data processing in Amazon EMR
- Ensured data integrity and minimal downtime during migration, which is crucial for ongoing clinical trials and research activities
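One common way to check data integrity during a migration is to compare checksums of the files before and after the transfer. The sketch below does this for two local directory trees using SHA-256; it is an assumption-based illustration, since the case study does not describe how integrity was actually verified. The directories and file names are hypothetical stand-ins for the source volume and a downloaded copy of the migrated data.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_mismatches(source: Dict[str, str], target: Dict[str, str]) -> List[str]:
    """Return source files that are missing from, or differ in, the target."""
    return sorted(k for k, v in source.items() if target.get(k) != v)


# Small demonstration with temporary directories and made-up file names.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    src_root, dst_root = Path(src), Path(dst)
    (src_root / "trial_a.csv").write_bytes(b"id,value\n1,10\n")
    (src_root / "trial_b.csv").write_bytes(b"id,value\n2,20\n")
    (dst_root / "trial_a.csv").write_bytes(b"id,value\n1,10\n")
    (dst_root / "trial_b.csv").write_bytes(b"id,value\n2,99\n")  # corrupted copy
    mismatches = find_mismatches(build_manifest(src_root), build_manifest(dst_root))

print(mismatches)
```

An empty result means every source file arrived unchanged; any listed path should be transferred again before the source copy is retired.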
Optimization and testing
- Optimized the EMR cluster configurations to maximize performance and cost-efficiency for processing large-scale genomic data and clinical trial analytics
- Conducted rigorous testing to validate the performance and reliability of the new setup, ensuring accuracy and compliance with research protocols
Deployment and monitoring
- Deployed the data processing workflows on Amazon EMR, enabling streamlined analysis of genomic data, clinical trial results and drug discovery datasets
- Implemented monitoring and alerting mechanisms using Amazon CloudWatch to ensure smooth operations and quick issue resolution, which is critical for maintaining research integrity and compliance
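To make the monitoring step more concrete, here is a hedged sketch of an alarm definition that could be passed to boto3's CloudWatch `put_metric_alarm` call. It watches the `AppsFailed` metric that EMR publishes in the `AWS/ElasticMapReduce` namespace. The metric choice, threshold, cluster ID and SNS topic are assumptions for illustration; the case study does not say which metrics or alerts the client configured.

```python
from typing import Any, Dict


def build_failed_apps_alarm(cluster_id: str, sns_topic_arn: str) -> Dict[str, Any]:
    """Return keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires when any YARN application on the cluster fails
    within a five-minute period. All values are illustrative.
    """
    return {
        "AlarmName": f"emr-{cluster_id}-apps-failed",
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": "AppsFailed",
        # EMR metrics are keyed by the cluster (job flow) ID.
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "Statistic": "Maximum",
        "Period": 300,  # seconds
        "EvaluationPeriods": 1,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        # A cluster with no data points is not treated as failing.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical cluster ID and SNS topic, for illustration only.
alarm = build_failed_apps_alarm(
    cluster_id="j-EXAMPLE12345",
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:emr-alerts",
)

# Creating the alarm for real requires boto3 and AWS credentials:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```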
The Impact
The migration to Amazon EMR significantly improved the client's data processing infrastructure. Below is the impact of this transformation:
- Enhanced scalability - The client can now scale their processing capabilities to handle larger datasets and increased processing demands in genomics and clinical trials
- Cost savings - The migration to Amazon EMR significantly reduced operational costs by leveraging EMR's cost-effective pricing model
- Improved performance - Data processing times were drastically reduced, enabling quicker insights and decision-making in drug discovery and clinical trials
- Simplified management - The client experienced reduced complexity and operational overheads with Amazon EMR's managed nature, allowing researchers to focus on core scientific activities
- Future-proof infrastructure - With a robust and scalable infrastructure, the client is better equipped to handle future growth and evolving data processing requirements, ensuring ongoing innovation in the life sciences sector
- Reduced time spent - The time spent on cluster management decreased from ten hours/week to just two hours/week, representing an 80% reduction
- Accelerated process - Job execution time was cut from eight hours to three hours, an improvement of 62.5%, leading to faster data processing
- Reduced cost per job - The cost per job decreased from $500 to $200, a 60% reduction, which also contributed to reduced overall operational costs and improved efficiency
- Improved reliability - The error rate decreased from 5% to 1% and the time spent resolving errors dropped significantly, enhancing overall productivity and reliability
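The percentage figures quoted above can be reproduced from the before and after values with a simple relative-reduction calculation. The short sketch below uses only the numbers reported in this case study; the function and dictionary names are chosen for this example.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


# (before, after) pairs as reported in the impact list.
reported = {
    "cluster management (hours/week)": (10, 2),
    "job execution time (hours)": (8, 3),
    "cost per job (USD)": (500, 200),
    "error rate (%)": (5, 1),
}

reductions = {
    name: percent_reduction(before, after)
    for name, (before, after) in reported.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% reduction")
```

Note that the error rate falling from 5% to 1% is a drop of four percentage points, which corresponds to an 80% relative reduction.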
AWS Services Used
- Amazon EMR
- Amazon EC2
- Amazon EBS
- Amazon Machine Image
- AWS Identity and Access Management
- Amazon VPC
- Amazon S3