
Streamlining data workflows to enhance efficiency by leveraging Databricks

HCLTech optimized data operations with scalable and cost-efficient solutions
5 min read

Summary

Our client is an American pharmaceutical company that develops medicines, vaccines and animal health products. They needed data and consulting services to perform multiple research activities and achieve their business goals.

Their initial data engineering platform, used to build smart data pipelines across hybrid and multi-cloud architectures and to power modern analytics and hybrid integration, was showing scope creep and rising costs.

Hence, they sought assistance from HCLTech. We proposed switching to Databricks workflows to improve speed, efficiency, accuracy, flexibility and cost-effectiveness. Powered by Delta Lake, Databricks combines the best of data warehouses and data lakes into a lakehouse architecture, giving our client one unified platform for collaboration, data, analytics and AI workloads. Using Databricks, HCLTech created a new solution that integrated batch and stream processing and included customized error handling to send timely notifications to the Application Maintenance Support (AMS) team.
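
As a minimal sketch of how such error handling can be wired up in a workflow task, assuming a hypothetical AMS webhook endpoint and helper names (notify_ams, run_with_error_handling) that are not from the original solution:

```python
# Minimal sketch, not the production framework: wrap a pipeline task so
# that any failure notifies the AMS team before the job run fails.
# AMS_WEBHOOK_URL and the helper names are assumptions.
import traceback
import requests

AMS_WEBHOOK_URL = "https://example.com/ams/notify"  # hypothetical endpoint

def notify_ams(job_name: str, error: str) -> None:
    """Post a failure notification for the AMS team."""
    requests.post(AMS_WEBHOOK_URL,
                  json={"job": job_name, "error": error},
                  timeout=10)

def run_with_error_handling(job_name: str, task) -> None:
    """Run a task; on failure, alert AMS and re-raise so the
    Databricks job run is still marked as failed."""
    try:
        task()
    except Exception:
        notify_ams(job_name, traceback.format_exc())
        raise
```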

The Challenge

Streamlining data operations by transitioning to Databricks for enhanced efficiency and integration

  • Our client faced challenges with their existing platform: it could not handle data engineering tasks adequately and could not schedule backend instances, which kept running and drove up resource utilization costs
  • Data sources were spread across multiple platforms, making it time-consuming and complex to integrate with 145+ applications, upload code and execute jobs
  • No error-handling framework was available, so timely notifications could not be sent to the AMS team
  • Lastly, the platform lacked stability and required frequent maintenance, which disrupted data operations

Opting for Databricks – a cloud-based data platform powered by Apache Spark™, Delta Lake and MLflow – would resolve these issues.

The Objective

Revolutionizing enterprise data intelligence with unified analytics and AI @ Scale

HCLTech was entrusted with the management and deployment of cloud infrastructure using the Databricks Data Intelligence Platform. The unified, open analytics platform integrated seamlessly with the client's existing cloud infrastructure and security controls. HCLTech's tailored workflows enhanced platform efficiency, enabling the creation, deployment, sharing and maintenance of enterprise-grade data, analytics and AI solutions at scale. With capabilities spanning data processing, visualization and security management, the platform met our client's diverse needs. HCLTech's solution further improved efficiency by supporting batch and stream processing while migrating existing applications, improving resource utilization and eliminating continuously running backend instances.
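
For illustration, scheduling a job on an ephemeral job cluster is what avoids an always-on backend instance: the cluster is created for each run and terminated afterwards. The sketch below uses the Databricks Jobs REST API (2.1); the workspace URL, token, notebook path and cluster sizing are placeholders, not the client's actual configuration:

```python
# Sketch: create a scheduled Databricks job on an ephemeral job cluster,
# so no backend instance keeps running between runs.
import requests

WORKSPACE = "https://<workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                       # placeholder

job_spec = {
    "name": "nightly-ingest",
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # 02:00 daily
        "timezone_id": "UTC",
    },
    "tasks": [{
        "task_key": "ingest",
        "notebook_task": {"notebook_path": "/Pipelines/ingest"},  # placeholder
        "new_cluster": {            # created per run, terminated after
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
    }],
}

resp = requests.post(f"{WORKSPACE}/api/2.1/jobs/create",
                     headers={"Authorization": f"Bearer {TOKEN}"},
                     json=job_spec, timeout=30)
resp.raise_for_status()
print(resp.json())  # {"job_id": ...}
```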


The Solution

Unlocking data potential by leveraging Databricks Lakehouse for unified analytics

Our client opted for Databricks Lakehouse for its fusion of data lake and data warehouse capabilities, providing flexibility, cost-efficiency and scalability alongside the transactional reliability of data warehouses. Data management on Databricks relies on the principles of Atomicity, Consistency, Isolation and Durability (ACID) transactions commonly associated with data warehouses.
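
As an illustrative example of that transactional reliability, a Delta Lake MERGE commits atomically: either the whole upsert becomes visible to readers or none of it does. The table, path and column names below are assumptions for the sketch:

```python
# Sketch of an ACID upsert into a Delta table with PySpark.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incoming updates staged in object storage (illustrative path)
updates = spark.read.parquet("s3://bucket/staging/readings/")

target = DeltaTable.forName(spark, "sensor_readings")  # illustrative table
(target.alias("t")
       .merge(updates.alias("u"), "t.device_id = u.device_id AND t.ts = u.ts")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())  # the whole MERGE commits atomically, or not at all
```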

  • The Databricks Data Intelligence Platform enabled reliable Business Intelligence (BI) and ML workloads across a diverse range of data types
  • Our solution used Databricks to build pipelines that handled both streaming and batch data for applications, accommodating diverse data types economically (see the sketch after this list)
  • Databricks' ability to save notebooks as Python scripts simplified integration, while its unified platform offering Python, SQL, ML runtimes, MLflow and Spark was well received by the client
  • Databricks simplified cluster deployment and integrated seamlessly with 145+ applications, letting our client upload code and execute jobs through a user-friendly browser-based UI or the REST API, so all their data, analytics and AI jobs stayed on one unified data platform
  • These pipelines processed LAN, WiFi and sensor data from source systems in various buildings, transforming it as per business needs before storing it in the AWS S3 data lake
  • End users accessed dashboards built on the well-organized data lake, making data from diverse sources easy to manage with the readily available pipelines
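
A minimal sketch of such a combined pipeline, assuming illustrative S3 paths and schemas (not the client's actual sources): a streaming ingest of sensor events alongside a batch load of network logs, both landing as Delta tables in the S3 data lake:

```python
# Sketch: one pipeline handling both modes. Paths/schemas are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Streaming: incrementally ingest new sensor files as they arrive
sensors = (spark.readStream.format("json")
                .schema("device_id STRING, building STRING, "
                        "ts TIMESTAMP, value DOUBLE")
                .load("s3://source-bucket/sensors/"))

(sensors.withColumn("ingest_date", F.to_date("ts"))
        .writeStream.format("delta")
        .option("checkpointLocation", "s3://lake-bucket/_chk/sensors")
        .trigger(availableNow=True)   # process available data, then stop
        .start("s3://lake-bucket/bronze/sensors"))

# Batch: periodic full load of LAN/WiFi logs into the same lake
wifi = (spark.read.format("csv").option("header", True)
             .load("s3://source-bucket/wifi/"))
wifi.write.format("delta").mode("append").save("s3://lake-bucket/bronze/wifi")
```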

The Impact

Harnessing Databricks for scalable integration by optimizing data infrastructure

HCLTech efficiently scheduled instances on Databricks to meet client needs, reducing overall costs. Databricks streamlined data ingestion, automated ETL processing, ensured reliable workflow orchestration and provided comprehensive observability and monitoring. Other key benefits included:

  • Easy setup and administration of the Databricks platform through AWS Marketplace
  • Databricks Lakehouse provided API access to 145+ integrations and seamless connectivity with external systems such as Redshift, Cassandra and Snowflake
  • A 20% reduction in cloud infrastructure costs with the Databricks serverless architecture, which enabled faster integration
  • An efficiency gain of 20% for data engineering tasks after migrating to Databricks
  • The autoscaling feature added resources dynamically to support increased demand and removed them when no longer needed, significantly reducing costs (a minimal configuration sketch follows this list)
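
For reference, autoscaling is declared directly on the cluster specification; Databricks then grows the cluster toward max_workers under load and shrinks it back when idle. The values below are illustrative, not the client's settings:

```python
# Sketch of an autoscaling job-cluster spec (illustrative values).
autoscaling_cluster = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "autoscale": {
        "min_workers": 2,   # baseline capacity when the cluster is idle
        "max_workers": 8,   # ceiling under peak demand
    },
}
```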