Xerox Announces New Software to Protect Sensitive Data in Documents
IT News Online Staff 2007-10-15
Xerox Corp. announced that its researchers from Palo Alto Research Center Inc. (PARC) are developing a new software technology that will increase accuracy while reducing the time needed to remove sensitive or confidential material from documents.
Dubbed "Intelligent Redaction", the new software automates the process of removing confidential information from any document. Once users have identified the information they want to protect, the software automatically redacts all references to this information throughout the document. After information has been classified, that same information will be automatically redacted if it appears in other documents. This "intelligence" ensures a consistent level of security, saves time and increases redaction accuracy.
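The idea of classifying information once and then redacting it wherever it appears can be illustrated with a minimal sketch. This is not PARC's implementation, which the article does not describe; the function name `redact_terms`, the mask string and the sample documents are all hypothetical, and simple case-insensitive text matching stands in for whatever content analysis the real software performs.

```python
import re

def redact_terms(text, sensitive_terms, mask="[REDACTED]"):
    """Replace every occurrence of each sensitive term with a mask."""
    if not sensitive_terms:
        return text
    # Try longer terms first so a full name is masked before a shorter overlap.
    ordered = sorted(sensitive_terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    return pattern.sub(mask, text)

# Hypothetical terms a user has classified as sensitive.
sensitive = {"Jane Doe", "Project Falcon"}

# The same classification is applied to more than one document.
doc_a = "Jane Doe leads Project Falcon. Contact Jane Doe for details."
doc_b = "Budget for project falcon was approved."

redacted_a = redact_terms(doc_a, sensitive)
redacted_b = redact_terms(doc_b, sensitive)
print(redacted_a)
print(redacted_b)
```

Because the set of sensitive terms is kept separate from any one document, the same classification can be reused on later documents without asking the user to mark them again.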
Xerox said the research aligns with its goal of developing smarter documents to make information-based work easier, more efficient and more effective.
Still in development, the technology combines PARC's security and privacy, natural language and user interface design expertise to develop semi-automated ways to identify and protect sensitive content. The intelligent redaction technology also creates a behind-the-scenes audit trail should the document or information be compromised.
"The tools available today can't provide sufficient content analysis and security because it's difficult to determine what is sensitive," said Jessica Staddon, manager of the security research area at PARC. "In a large organization the level of sensitivity changes depending on the person accessing the document. The sheer numbers of documents to be tracked and sorted further complicates the problem."
Redaction is the ability to control what someone sees. For example, redaction traditionally has been used in legal documents to limit access to information protected by attorney-client privilege. The result is a document that has been censored; certain information within the document is blocked out.
Traditional redaction has two big drawbacks. It requires a labor-intensive manual process to identify sections to censor, and management of different versions of the same document is cumbersome and difficult. PARC's intelligent redaction removes these obstacles. The user interface makes it easy for the document owner or author to identify the sensitive data. Then the security tools protect the sensitive information by allowing the document itself to hide or expose information or data within it, based on who has been granted access.
Current software encrypts whole documents. Intelligent redaction understands document context so it can perform partial encryption. Only sensitive sections or paragraphs are encrypted, while the rest of the document is not. The intelligent redaction software also displays or hides restricted portions of the document. As a result, the same document appears different to different people.
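A rough sketch of a document that looks different to different readers might resemble the following. It is an illustration under stated assumptions, not the PARC design: the `Section` class, the role names and the `render_for` function are hypothetical, and restricted sections are simply masked here, whereas the software described above would encrypt them.

```python
from dataclasses import dataclass

@dataclass
class Section:
    text: str
    # Roles allowed to read this section; an empty set means anyone may read it.
    allowed_roles: frozenset = frozenset()

def render_for(sections, role, mask="[RESTRICTED]"):
    """Build the view of the document that a reader with the given role sees."""
    lines = []
    for section in sections:
        visible = not section.allowed_roles or role in section.allowed_roles
        lines.append(section.text if visible else mask)
    return "\n".join(lines)

# Hypothetical document with one public and one restricted section.
document = [
    Section("Quarterly summary."),
    Section("Settlement amount: confidential.", frozenset({"legal"})),
]

legal_view = render_for(document, "legal")
sales_view = render_for(document, "sales")
print(legal_view)
print(sales_view)
```

Keeping the access rule attached to each section, rather than to the file as a whole, is what lets a single copy of the document serve every reader.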
The new software automates the process of removing confidential information in three steps. The software first analyzes the content of the document automatically and identifies entities of interest, such as the names of persons or companies, topics, addresses and identification numbers, as well as the relationships between them, such as two people living at the same address.
The next step is for the author to review the document, highlight entities of interest and trace the relationship between entities. This simplifies the task of finding all sensitive information in a document and reduces the risk of missing anything sensitive. Finally, the software allows for selective encryption or redaction of sensitive sections of the document.
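The three steps described above (automatic detection, author review, selective redaction) can be sketched as a small pipeline. Everything here is an assumption made for illustration: the regular expressions are crude stand-ins for the natural language analysis the article mentions, relationship detection is left out, and the function names `find_candidates`, `author_review` and `redact` are hypothetical.

```python
import re

# Step 1 stand-in: simple patterns instead of real entity recognition.
PATTERNS = {
    "id_number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "name": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}

def find_candidates(text):
    """Step 1: automatically list entities that might be sensitive."""
    found = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((kind, match.group()))
    return found

def author_review(candidates, approved):
    """Step 2: keep only the candidates the author confirms as sensitive."""
    return [value for _, value in candidates if value in approved]

def redact(text, confirmed, mask="[REDACTED]"):
    """Step 3: mask each confirmed entity wherever it appears."""
    for value in confirmed:
        text = text.replace(value, mask)
    return text

text = "Alice Smith (ID 123-45-6789) and Bob Jones share an office."
candidates = find_candidates(text)

# The author's decision is simulated by a fixed set for this example.
confirmed = author_review(candidates, approved={"Alice Smith", "123-45-6789"})
result = redact(text, confirmed)
print(result)
```

The author stays in the loop at step 2, so the software proposes candidates but a person decides what is actually hidden, which matches the semi-automated approach the article describes.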