Scaling Archival Solution with Databricks | HCLTech

Implementing a robust, scalable archival solution using Databricks

HCLTech addressed data management and performance challenges for a pharma company

Our client is a global pharmaceutical company specializing in advanced animal healthcare products, medicines, technologies and veterinary services. They used predictive forecasting data models, stored in their CRM and other cloud product platforms, to support research and sales strategies. Due to high operational data volumes, their CRM storage reached the 90% capacity threshold, causing operational slowdowns. Existing archival tools and other cloud storage options were inadequate for the data volume involved. HCLTech recommended Databricks for efficient storage and compute and Matillion for job orchestration, storing the data in AWS S3 to manage the load effectively.

The Challenges

Addressing the CRM’s performance and storage limitations

Our client was facing frequent production issues and sought an immediate, efficient solution to remediate their business users' problems. Listed below are a few of our client's challenges:

  • Performance-related issues: Longer-than-expected search times, slow record saves, long waits to access records and populate on-screen fields, and other actions taking excessive time to complete.
  • Data optimization process: The manual data archival process was time-consuming, and the inability to update records caused frequent process breakdowns.
  • Cost of data storage: Almost one terabyte of data was generated monthly, and because the existing CRM storage was license-based, this growth necessitated extra license purchases.
  • Unavailability of data during production issues: Critical jobs failed due to process breakdowns, leaving end users without current analytical data.
  • Loss of client data: Risk of losing important data while deleting old records.
  • Risk in managing data retention: Manual purging carried the risk of deleting data that was still within its retention period.

HCLTech built a cloud cluster on Databricks, using its Apache Spark™-powered compute to store data in AWS S3, with Matillion ETL handling job orchestration.
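The core idea is to move aged CRM records out of the CRM and into cheap object storage, partitioned so they remain easy to query later. The sketch below illustrates that date-partitioned layout with the Python standard library only; in the actual solution, Spark on Databricks would write to an `s3://` bucket rather than a local directory, and all record fields and path names here are illustrative assumptions, not the client's schema.

```python
import json
from pathlib import Path

def archive_records(records, root="crm_archive"):
    """Group records by their created date and write one JSON file per
    date partition, mimicking an object-store prefix layout such as
    s3://bucket/crm_archive/created=2023-01-15/part-0.json."""
    partitions = {}
    for rec in records:
        partitions.setdefault(rec["created"], []).append(rec)
    written = []
    for created, recs in sorted(partitions.items()):
        part_dir = Path(root) / f"created={created}"
        part_dir.mkdir(parents=True, exist_ok=True)
        out = part_dir / "part-0.json"
        # One JSON document per line, the layout Spark readers expect
        out.write_text("\n".join(json.dumps(r) for r in recs))
        written.append(str(out))
    return written

records = [
    {"id": 1, "created": "2023-01-15", "account": "A"},
    {"id": 2, "created": "2023-01-15", "account": "B"},
    {"id": 3, "created": "2023-02-01", "account": "C"},
]
paths = archive_records(records)
```

Partitioning by date keeps later reads cheap: a query for one month touches only that month's prefix instead of scanning the full archive.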

The Objective

Implement auto-archival of CRM data using Databricks

We were tasked with building a simplified platform using Databricks and Matillion. The resulting unified, scalable, cost-effective platform seamlessly integrated with AWS cloud storage, enhancing platform efficiency and enabling successful business analytics. The solution addressed day-to-day operational challenges, optimized license costs and efficiently managed storage, impressing the client with its comprehensive data processing, visualization and security management capabilities.


The Solution

Databricks cluster computing effectively addressed our client’s challenge

Databricks Lakehouse was chosen for its robust data management capabilities, offering flexibility, cost-efficiency and scalability.

Databricks cloud compute provides resources for executing various workloads such as data engineering, data science and analytics. The team leveraged Databricks compute to run Matillion orchestration pipelines, effectively handling the streaming and storage of CRM data. Features like saving notebooks as Python scripts and the unified platform (offering Python, SQL, ML Runtimes, MLflow and Spark) facilitated seamless integration and garnered client appreciation. Users accessed archived data using Python and SQL, as needed.
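The access pattern described above is plain SQL issued from Python against the archived tables. In the client's environment this would run through Databricks (Spark SQL over the S3 archive); the snippet below uses the standard-library `sqlite3` module purely as a runnable stand-in to show the shape of such a query. The table and column names are illustrative assumptions.

```python
import sqlite3

# Load a handful of "archived" CRM rows into an in-memory table and
# query them with SQL -- a stdlib stand-in for the Spark SQL access
# pattern; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE archived_accounts (id INTEGER, name TEXT, archived_on TEXT)"
)
conn.executemany(
    "INSERT INTO archived_accounts VALUES (?, ?, ?)",
    [(1, "Acme Vet", "2023-01-15"), (2, "PetCare Ltd", "2022-11-02")],
)
rows = conn.execute(
    "SELECT name FROM archived_accounts WHERE archived_on < '2023-01-01'"
).fetchall()
# rows -> [("PetCare Ltd",)]
```

Because archived data stays queryable this way, moving it out of the CRM does not cut business users off from their history.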


The Impact

HCLTech's scalable archival solution enhanced cost efficiency and productivity

HCLTech deployed an efficient solution to meet client needs, reducing overall costs by leveraging Databricks on the AWS Marketplace.


Key benefits included:

  • Easy setup and administration of Databricks cloud compute cluster
  • Eliminated data archival production issues
  • 15% increase in CRM productivity, efficiency and reliability through effective archival of historical data using Databricks
  • ~15% savings on CRM storage licensing costs
  • 10x - 40x enhanced scalability to handle large data processing including storage of attachments in encoded format, providing flexibility for future needs
  • Automated data retention policy as per compliance requirements
  • Successful archival of historical data, freeing up space in current production and improving daily business activities such as ETL loads and CRM access
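The automated retention policy noted above replaces the risky manual purge: only partitions older than the retention window become purge candidates, so nothing still inside its retention period can be deleted. A minimal sketch of that eligibility check, with illustrative function and parameter names and a hypothetical five-year window:

```python
from datetime import date, timedelta

def purge_candidates(partition_dates, retention_days, today):
    """Return archive partitions older than the retention window.
    Anything on or after the cutoff is retained, removing the
    manual-purge risk of deleting data still inside its retention
    period. Dates are ISO 'YYYY-MM-DD' strings."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in partition_dates if date.fromisoformat(d) < cutoff]

parts = ["2018-06-01", "2021-03-10", "2024-01-05"]
# With a (hypothetical) five-year retention window, only the 2018
# partition falls outside it and is eligible for purging.
old = purge_candidates(parts, retention_days=365 * 5, today=date(2024, 6, 1))
```

A scheduled job can apply this check before each purge run, making compliance with the retention policy automatic rather than dependent on a careful operator.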