New Sentra tool to help classify sensitive enterprise data using LLMs

Sentra’s classification engine now uses large language models (LLMs), making it possible to classify sensitive unstructured data such as source code or employee contracts.

Cloud data security provider Sentra has announced that LLMs are now built directly into its data security platform and classification engine to help enterprise customers reduce the data attack surface.

“When properly leveraged, LLM has great potential to better classify unstructured data (such as paragraphs of text, and someday even images) than traditional pattern-matching techniques,” said Ken Buckler, research analyst at Enterprise Management Associates Inc. 

Sentra’s data classification engine has traditionally used regular expressions, list classifiers, and validation functions, according to Ron Reiter, co-founder and chief technology officer of Sentra.
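
For readers unfamiliar with these traditional techniques, the following is a minimal sketch of how a regular expression, a validation function, and a list classifier might be combined. It is not Sentra's implementation: the pattern, the department list, the label names, and the functions `luhn_valid` and `classify` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical rules for illustration only; not Sentra's actual classifiers.
# Regular expression: 13 to 19 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,19}\b")
# List classifier: a fixed vocabulary of terms to look for.
KNOWN_DEPARTMENTS = {"finance", "legal", "human resources"}


def luhn_valid(number: str) -> bool:
    """Validation function: Luhn checksum, as used by payment card numbers."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def classify(text: str) -> set[str]:
    """Return the set of (hypothetical) sensitivity labels found in text."""
    labels = set()
    # The regex proposes candidates; the checksum filters out false positives.
    for match in CARD_PATTERN.finditer(text):
        if luhn_valid(match.group()):
            labels.add("payment_card")
    lowered = text.lower()
    if any(term in lowered for term in KNOWN_DEPARTMENTS):
        labels.add("department_name")
    return labels
```

Rules of this kind work well for data with a fixed shape, but they have no way to recognize, say, a person's name mentioned in free-flowing text, which is the kind of gap that context-aware classification is meant to address.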

LLM adds context

The use of LLMs adds context to the classification process, making the engine more effective at classifying unstructured enterprise data, Sentra said.

“There are two additional contexts which the product now supports while classifying customer data — full (document level) classification and better entity recognition,” Reiter said. “Document-level classification enables Sentra to decide on the high-level type of document. For example, whether the document is a legal contract, a payslip, or a technical documentation.”

Adding LLMs has allowed entity recognition on unstructured text, such as identifying a person’s name by understanding the surrounding text, Reiter added.

Understanding the business context of unstructured customer data, the company said, will also enable enterprises to better align with compliance benchmarks, including GDPR, CCPA, and HIPAA.

LLM use still a gray area

Despite the jump in enterprise adoption, LLMs continue to draw criticism. Sentra’s new LLM-powered scanning of data assets and analysis of metadata, such as file names, schemas, and tags, leave much open for discussion.

“As with any technology, we must ensure that the solutions implemented to secure data do not in turn result in additional attack vectors to access that data,” Buckler said. Sensitive internal assets should never leave the enterprise boundaries during classification, as this would expose the data to unnecessary risk of disclosure to third parties, he added.
