Introduction
Most organizations collect large volumes of sensitive data, including Personally Identifiable Information (PII), Protected Health Information (PHI) and Payment Card Industry (PCI) data, to provide relevant, customized services.
Identifying the sensitive data collected and protecting it from unauthorized disclosure is critical. Every organization is responsible for effectively discovering, controlling and managing its sensitive data footprint and complying with all relevant data protection laws.
For example, the life sciences and healthcare industry deals with tremendous volumes of sensitive data containing clinical records, patient data and other PHI, which fall under the purview of HIPAA. The financial services industry deals with PCI data and other sensitive data that must always be encrypted. Hence, it becomes essential to apply automation and the latest technologies to detect sensitive data at the point of ingestion/integration and take the necessary actions to avoid data leaks.
Organizations often need help automatically detecting the growing list of sensitive data types and require more visibility into data security risks, especially when ingesting unstructured data. Customers rely on fully managed data security services that will automate protection against sensitive data leaks and leverage the capabilities of machine learning and pattern-matching techniques to address these limitations swiftly.
HCLTech’s DataPatrol Framework
To address these demands and improve sensitive data discovery and governance in an AWS environment, HCLTech has built the 'DataPatrol Framework.' It uses a rich set of AWS services, including Amazon Macie, AWS Lambda, AWS Security Hub, Amazon EventBridge, Amazon SNS and Amazon QuickSight, to accomplish critical tasks during the life cycle of the sensitive documents being patrolled. The framework uses machine learning and pattern-matching techniques to fully automate the discovery of sensitive data, isolate it based on the detected severity (High/Medium/Low) and provide a complete suite of pre-built analytics on the findings for further review and action.
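The severity-based isolation step described above can be illustrated with a short sketch. This is not the framework's actual implementation: the function name `isolation_target`, the `quarantine/` prefix and the choice to move only High-severity files are assumptions made for illustration.

```python
from typing import Optional

# Severity labels used throughout this article.
SEVERITIES = ("High", "Medium", "Low")

# Assumption: only High-severity files are isolated; adjust as needed.
ISOLATE = {"High"}


def isolation_target(key: str, severity: str,
                     quarantine_prefix: str = "quarantine/") -> Optional[str]:
    """Return the key a file should be moved to, or None to leave it in place."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if severity not in ISOLATE:
        return None
    # Keep the original key under a severity-specific quarantine prefix.
    return f"{quarantine_prefix}{severity.lower()}/{key}"


if __name__ == "__main__":
    # prints: quarantine/high/incoming/patients.csv
    print(isolation_target("incoming/patients.csv", "High"))
```

In a real deployment, the returned key would typically drive a copy-and-delete of the object into a restricted, encrypted location; that step is omitted here.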
Key features
The DataPatrol Framework was originally built to address a few of our customers' critical business challenges in identifying and protecting sensitive PII data on the cloud. The solution comprises the following key pillars, each with several features crucial for building a robust and complete data patrolling solution.
- Sensitive data discovery - Fully managed, continually updated machine learning techniques for PII detection, combined with the ability to define custom data types using regular expressions, deliver high-quality discovery of a variety of sensitive data types in customers' source data.
- Secure data isolation and encryption - This feature isolates highly sensitive data files at the ingestion layer and prevents further leakage to downstream systems.
- Severity-based email alerts - Based on Amazon EventBridge events, this workflow automatically triggers Amazon SNS to send custom email notifications to subscribed users, containing the location of the sensitive data file and its severity level (High/Medium/Low).
- Audit and compliance reports - A consolidated DataPatrol report for each patrolling job will be auto-downloaded to a customer-specified location for quick review and action on the findings.
- Centralized management of sensitive data findings - Integration with AWS Security Hub provides a single window to aggregate and analyze all highly sensitive data findings, which are stored in the standard AWS Security Finding Format (ASFF) for further processing.
- Incident reporting and management - HCLTech’s DataPatrol Framework is fully integrated with HCL's iONA (iAct) solution to auto-create an incident in the ServiceNow tool for every high-severity detection and assign it to the appropriate user group for further review and action.
- DataPatrol dashboard - It can deliver prebuilt ML-driven insights with auto-narratives embedded contextually in the dashboard using natural language for quick interpretation.
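The custom data types mentioned under sensitive data discovery can be pictured with a minimal sketch of regex-based detection. The type names, patterns and severity thresholds below are hypothetical and are not taken from the framework; a managed service would pair patterns like these with its built-in detectors.

```python
import re
from typing import Dict, Optional

# Hypothetical custom data types; real patterns depend on the customer's data.
CUSTOM_TYPES = {
    "EMPLOYEE_ID": re.compile(r"\bEMP-\d{6}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text: str) -> Dict[str, int]:
    """Count occurrences of each custom data type found in the text."""
    counts = {}
    for name, pattern in CUSTOM_TYPES.items():
        hits = len(pattern.findall(text))
        if hits:
            counts[name] = hits
    return counts


def severity_for(total_hits: int, medium_at: int = 5,
                 high_at: int = 20) -> Optional[str]:
    """Map an occurrence count to a severity (thresholds are assumptions)."""
    if total_hits <= 0:
        return None
    if total_hits >= high_at:
        return "High"
    if total_hits >= medium_at:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    sample = "EMP-123456 filed for 123-45-6789 and EMP-654321"
    found = scan_text(sample)
    # prints: {'EMPLOYEE_ID': 2, 'US_SSN': 1} Low
    print(found, severity_for(sum(found.values())))
```

Counting occurrences rather than flagging a single match keeps the severity decision tunable per data type.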
DataPatrol Framework – Architecture
HCLTech’s DataPatrol Framework Architecture is fully automated, from scanning the sensitive data at the point of ingestion to dashboarding key insights for business consumption.
This framework can seamlessly detect several identified and custom sensitive data types catering to any industry, ensuring PII protection and strict adherence to data privacy, compliance and regulatory needs such as GDPR, PCI-DSS and HIPAA.
It uses AWS native services, making it easily configurable, customizable and deployable in a customer cloud environment with easy upgrades and updates, while reducing the overall total cost of ownership (TCO) compared to other commercial software products. It can support unstructured data and be plugged into any layer requiring sensitive data discovery for the underlying raw source data.
Benefits
HCLTech’s DataPatrol architecture leverages native AWS services for sensitive data discovery and analytics. Key benefits include the following:
- Ability to swiftly identify and discover sensitive data, including Personally Identifiable Information (PII), Protected Health Information (PHI) and Payment Card Industry (PCI) data.
- Provision to move highly sensitive files from the source to a secure target location and prevent sensitive data leaks.
- Ability to notify users with custom email alerts when sensitive data is detected.
- Equip businesses with a consolidated view of the DataPatrol report in an easily readable format (CSV) to review the sensitive data findings for Audit & Compliance requirements.
- Collate, monitor and process sensitive data findings from DataPatrol into a centralized hub, providing a comprehensive view of the security state and high-priority security issues.
- Empower business users with rich pre-configured ML-driven dashboards to review the key insights on sensitive data for all processed DataPatrol jobs.
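The custom email alerts listed above can be sketched as an Amazon EventBridge event pattern plus a small formatter for the notification text. The field names follow the Amazon Macie finding event structure as commonly documented and should be verified against the current AWS documentation; `build_alert` and the subject format are hypothetical, and the actual publish call to Amazon SNS is left out.

```python
from typing import Any, Dict

# Event pattern matching Macie findings of any of the three severities.
EVENT_PATTERN = {
    "source": ["aws.macie"],
    "detail-type": ["Macie Finding"],
    "detail": {"severity": {"description": ["High", "Medium", "Low"]}},
}


def build_alert(event: Dict[str, Any]) -> Dict[str, str]:
    """Build the subject and body of an email alert from a finding event."""
    detail = event["detail"]
    severity = detail["severity"]["description"]
    resources = detail["resourcesAffected"]
    bucket = resources["s3Bucket"]["name"]
    key = resources["s3Object"]["key"]
    # SNS email subjects are limited to 100 characters, so truncate.
    subject = f"[DataPatrol] {severity} severity finding in {bucket}"[:100]
    message = "\n".join([
        f"Severity: {severity}",
        f"Finding type: {detail.get('type', 'unknown')}",
        f"Location: s3://{bucket}/{key}",
    ])
    return {"Subject": subject, "Message": message}


if __name__ == "__main__":
    sample_event = {
        "source": "aws.macie",
        "detail-type": "Macie Finding",
        "detail": {
            "type": "SensitiveData:S3Object/Personal",
            "severity": {"score": 3, "description": "High"},
            "resourcesAffected": {
                "s3Bucket": {"name": "raw-ingest"},
                "s3Object": {"key": "incoming/patients.csv"},
            },
        },
    }
    print(build_alert(sample_event)["Message"])
```

The returned dictionary uses the same `Subject` and `Message` names that an SNS publish request expects, so a handler could pass it on together with a topic ARN.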