Private AI empowering Enterprises | HCLTech

Private AI for Enterprises

Explore the rise of GenAI, its ethical dilemmas, and why enterprises are adopting private AI for better data security, control and compliance in hybrid cloud environments.
 
5 minutes read

Manish Chauhan
Group Manager, Product Management Group, Hybrid Cloud Business Unit

Today, as you unfold your daily newspaper, it's impossible to escape the ubiquitous presence of Generative AI. Its stories ripple through the pages, from ground-breaking advancements to ethical dilemmas that challenge the very fabric of our society. We have heard about content (image) copyright concerns, deepfakes, the AI paradox and unreliable results. But there's more to the story than just the headlines. It's about exploring human creativity and the tough questions that come with advancing technology.

According to Gartner, Generative AI is a class of AI that learns from existing datasets to generate new and realistic artifacts, at a scale that reflects the characteristics of the training data but doesn't repeat it. Here, "doesn't repeat" is the nerve of Generative AI.

"A Machine's ability to create rather than replicate is Generative AI."

If an AI model learns from unrestricted public datasets and artifacts, it is commonly referred to as Public AI; if it learns from private datasets and artifacts in a controlled environment, it is termed Private AI.

Gartner further explains that the accumulated risks of Gen AI (accuracy, biases, intellectual property, copyrights, cybersecurity, fraud, sustainability, overfitting, sharp minima, non-robustness, lack of transparency, commercial licensing and ethical issues) result from public artifacts.

However, enterprises have concerns about depending on the results of public Gen AI. This is illustrated in the research paper "Breaking the paradox of explainable deep learning" by Arlind Kandra, released in May 2023, where a model trained on massive public data identified an image of a panda as "panda" with 57.7% confidence, but as "gibbon" with 99.3% confidence after random pixel noise was added.

This shows that language models (LLMs or SLMs) weren't specifically crafted for factual accuracy; their strength lies in statistical accuracy. These models excel at predicting what a typical human would anticipate as the continuation of a sentence.

It is therefore important for organizations to understand the data privacy and regulatory implications, since in many cases control over Gen AI results depends on control over the data the model learns from.

Let's understand the terminology in detail.

What is Public AI?

Let's understand this with multiple scenarios.

  1. Open-source AI Model trained on uncontrolled public data:

    This refers to Public AI: a publicly available AI model trained on a wide set of uncontrolled public data, typically pulled from the public or from customers to improve the AI service. For example, ChatGPT is trained continuously using input data from users and sources such as Wikipedia.

  2. Licensed AI Model trained on uncontrolled public data:

    Even a licensed model trained on publicly available data, including content from social network posts, can generate both true and fabricated information convincingly. In many cases, this can lead to misleading outcomes.

  3. Open-source AI model trained on private and controlled data:

    This refers to partially Public AI. Here, AI models are trained on private data, but the model itself is open-sourced, and everyone contributes to training and maturing it. The trained model is then used by competitors across the industry, so the underlying strategies and patterns can be exploited. For example, AI models in the aviation industry are used for airfare ticketing based on real-time price prediction. Since this is practiced industry-wide on an open-source model, it results in minimal differentiation between competitors.

  4. Licensed AI model trained on private and controlled data:

    This refers to Private AI, where the data is relevant only to a particular organization, since general-purpose information may not apply. Here, a commercially licensed model is fine-tuned or re-trained on private datasets in a controlled environment, so the result is proprietary to the organization.

What is Private AI?

It is the practice of training algorithms on data specific to one organization, where you are not helping create the collective intelligence that could help one of your competitors.

Training with private data can avoid some of the pitfalls but may still require efforts to make Gen AI outputs trustworthy and accurate. Moreover, constraining the training sets to be more domain-specific can narrow their range of responses.

In adherence to ethical practices in Gen AI, organizations in creative industries in particular have introduced private Gen AI practices.

It enables the training of advanced tools and algorithms on an organization's extensive collections of licensed visual content, including video content and digital images accumulated throughout its operational history.

While these AI tools produce new images, the outcomes strictly adhere to the licensing and reuse agreements governing their content libraries. This approach serves to mitigate potential copyright issues while simultaneously creating opportunities for creators to license and monetize their work within private training sets.

The typical benefits of Private AI for enterprises can be counted as: trustworthy, controllable, explainable, reliable, secure, ethical, fair, licensed, and sustainable.

Approach to Private AI (Build vs. Buy)

There can be two approaches to implementing Private AI.

In the first approach, the organization hires experts such as data scientists, engineers, and software developers to build, train, and maintain its own Gen AI model from in-house datasets.

This is termed Build Your Own Model (BYOM), which allows world-class performance but may involve large costs or capex, including compute plus data science team costs. For example, training the BloombergGPT model required 1.3 million GPU hours on 40GB A100 GPUs.

In the second approach, the organization buys or uses readily available ML and LLM models to train Gen AI capability on its private data, and ensures that its private data is never fed back into the widely available model.

Here the organization improves an existing model. This approach reduces the need to hire a large team to build, maintain, and run private AI models while still retaining data privacy.

However, organizations with limited data to train AI models may experience degraded model performance compared to models trained on large datasets. Such organizations opt for synthetic data for model training.

There are multiple foundation or base models, pre-trained models, and model platforms available to build enterprise models that come with commercial use licenses.

Private AI techniques

Organizations can use different technique combinations to improve the existing model. Below are a few popular and preferred ones:

  1. Fine-tuning: An economical machine learning (ML) technique to improve the performance of pre-trained large language models (LLMs) using selected datasets. For instance, the LIMA (Less Is More for Alignment) model can be fine-tuned on just 1,000 samples at a compute cost of only about $100 (substack.com).
  2. RLHF (Reinforcement learning from human feedback): Improves the model through human-in-the-loop assessment.
  3. RAG (Retrieval-augmented generation): Allows businesses to pass crucial information to models at generation time to produce more accurate responses without training the model on that information. Instead, the information is stored as embeddings in a vector database, and relevant entries are retrieved by similarity measures such as cosine similarity.
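The retrieval step of RAG can be sketched in plain Python. This is a minimal illustration, not a production pipeline: the bag-of-words "embedding" and the sample documents are toy stand-ins for a learned embedding model and a real vector database, but the cosine-similarity ranking is the same idea.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real RAG system would use a
    # learned embedding model to produce dense vectors instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=1):
    # Rank private documents by similarity to the query and return the
    # top-k passages to prepend to the model prompt at generation time.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Employee travel policy allows economy class for flights under six hours.",
    "The cafeteria opens at eight and closes at three.",
]
context = retrieve("What is the travel policy for flights?", docs)
prompt = f"Context: {context[0]}\nQuestion: What is the travel policy?"
```

The model never trains on the documents; it only sees the retrieved passage inside the prompt, which is what keeps the crucial information out of the model weights.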

Implementation of Private AI involves a careful selection of techniques based on the specific use case, balancing the need for privacy with the desired level of model accuracy and performance.

A few industry best practices for implementing Private AI are listed below:

  1. Differential privacy: Introduce noise (differentially private synthetic data) or perturbations into the training data to prevent the identification of individual data points.
  2. Homomorphic encryption: Enable computations on encrypted data without decrypting it, thereby preserving privacy during processing.
  3. Federated learning: Train models across decentralized devices (download the model from the data center and train it on private data) without exchanging raw data, ensuring privacy at the local level.
  4. Secure Multi-Party Computation (SMPC): Enable multiple parties to jointly compute a function over their inputs while keeping those inputs private.
  5. Trusted Execution Environments (TEEs): Use hardware-based solutions to create secure enclaves where computations can be performed with guarantees of data privacy.
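As one concrete illustration of the differential privacy practice above, the classic Laplace mechanism adds calibrated noise to a query result. This is a minimal sketch under illustrative assumptions (the salary data, predicate, and epsilon value are made up for the example); production systems would also track a privacy budget across queries.

```python
import random

def dp_count(values, predicate, epsilon=1.0):
    # A counting query has sensitivity 1: adding or removing one record
    # changes the count by at most 1, so Laplace noise with scale
    # 1/epsilon yields epsilon-differential privacy for this query.
    true_count = sum(1 for v in values if predicate(v))
    scale = 1.0 / epsilon
    # Sample Laplace(0, scale) as the difference of two exponential draws.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_count + noise

# Hypothetical private records: how many salaries exceed 60,000?
salaries = [52_000, 61_000, 75_000, 48_000, 93_000]
noisy_answer = dp_count(salaries, lambda s: s > 60_000, epsilon=0.5)
```

Smaller epsilon means more noise and stronger privacy; the analyst sees an answer close to the true count of 3, but cannot tell whether any single individual's record is present.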
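The federated learning practice above can be illustrated with a toy federated averaging loop: each simulated device fits a one-parameter linear model on its own private data, and only the updated weights, never the raw records, are sent back and averaged. The device datasets and learning rate are illustrative assumptions for the sketch.

```python
def local_step(weight, data, lr=0.1):
    # One local epoch of gradient descent on a least-squares objective
    # y ~ w * x, computed entirely on the device's private data.
    w = weight
    for x, y in data:
        grad = 2 * (w * x - y) * x
        w -= lr * grad
    return w

def federated_average(global_w, device_datasets, rounds=20):
    # Each round: every device trains locally, then the server averages
    # the returned weights. Raw data never leaves the devices.
    w = global_w
    for _ in range(rounds):
        local_ws = [local_step(w, data) for data in device_datasets]
        w = sum(local_ws) / len(local_ws)
    return w

# Two devices whose private data roughly follows y = 3x.
device_a = [(1.0, 3.0), (2.0, 6.1)]
device_b = [(1.5, 4.4), (0.5, 1.6)]
w = federated_average(0.0, [device_a, device_b])
```

The global weight converges close to the shared underlying slope of 3, even though the server only ever sees per-device weights.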

Private AI business benefits

  1. Trust building: Implementing Private AI helps build trust among users and stakeholders in regulated or sensitive data-led verticals and organizations.
  2. Encourages data sharing: Private AI techniques enable collaboration and information sharing between organizations without exposing sensitive details.

Enterprises Opting for In-house AI Systems for Private AI

Beyond scalability and flexibility, there are several compelling reasons why organizations might choose to build GPU-based AI infrastructure capability within private data centers.

  1. Regulatory compliance: Industries subject to data regulations prefer to practice private AI in-house.
  2. Customization and control: Organizations can design infrastructure tailored to their needs with an in-house AI system. This customization can at times result in improved performance, efficiency, and the ability to make rapid changes as needs evolve.
  3. Cost over time: From a long-term perspective, private AI can be the comparable or preferred option in certain cases when TCO is compared considering all factors, including large egress costs, especially for large enterprises with vast amounts of data and frequent training.
  4. Latency reduction: For AI applications that require real-time responses, such as financial trading systems, even a slight delay introduced by internet transmission can be detrimental. In-house systems can offer reduced latency.
  5. Intellectual property protection: Organizations dealing with patents, client research, and copyrights prefer to run their AI models in-house.
  6. Data privacy and security: Organizations with sensitive data, be it financial, healthcare, or proprietary research, may be reluctant to host such data on third-party platforms.

Conclusion

An increasing number of organizations are opting for hybrid environments with flexibility and interoperability of workloads for their AI practice, depending on multiple factors and the business's dependency on AI.

Data is fuel for AI, and organizations are considering extending and scaling up existing infrastructure to utilize existing security, backups, and redundancy, and to keep data nearer to the AI model in a hybrid cloud, reducing latency in many cases.

Organizations relying on AI for business decision-making and commercial use leverage hybrid cloud infrastructure for private AI and intend to utilize both deployments as their needs dictate. The private AI trend is propelled by a desire for heightened control, improved security, adherence to compliance standards, and, in certain instances, cost efficiency.

Private AI embodies a shift towards empowering organizations with enhanced control over their data and AI operations. It safeguards privacy and security, and in many cases aims to complement rather than compete with the public cloud.

For more information, please write to us at HCBU-PMG@hcltech.com.
