SOLUTIONS
In the age of globalization, documents created in different countries and different languages can be highly relevant to legal investigations. Litigation often spans national boundaries, and legal teams are flooded with thousands of documents in languages other than English that need to be filtered, evaluated, and analyzed.
The growing importance of multi-language discovery creates new challenges for attorneys and their technology partners. E-Discovery is already complex, and that complexity grows by orders of magnitude when it involves documents in different languages, writing systems, and character sets. Yet the need for meticulous investigation is as crucial as ever for lawyers to provide the best possible outcomes for their clients.
Good news for lawyers and those who support them
Basis Technology helps the legal community meet its multilingual discovery challenge head-on. We provide comprehensive electronic discovery solutions that uncover evidence buried in terabytes of unstructured multilingual text — accurately, quickly and cost-effectively. We do it using the most advanced linguistics software in the industry — software found at the core of virtually all leading multilingual search engines and information retrieval applications.
Our multilingual e-Discovery solutions are based on the Rosette Linguistics Platform (RLP), proven in hundreds of commercial and government environments. Interoperable RLP software components are configured as building blocks for multilingual e-Discovery, working seamlessly within discovery workflows and information retrieval applications while handling many different languages, character sets and data sources.
Make your discovery application multilingual
Our industry-leading linguistics software is complemented by ease of integration within data mining, reviewing, search and other discovery applications used by legal teams. By plugging the RLP API into an application, users get instant access to unique e-Discovery tools covering major European, Asian and Middle Eastern languages. For legal professionals, it means the ability to examine multilingual text with unparalleled accuracy and efficiency.
Step 1: Identify the language(s) and encoding in a document, and convert to Unicode
Component: Rosette Language Identifier (RLI)
RLI identifies the language(s) of a document and its encoding so content can be filtered and processed. Documents are converted to Unicode so that discovery and information retrieval applications can access a single data source regardless of language. Using a feature called the Rosette Language Boundary Locator (RLBL), multi-language documents are segmented into language regions that can be routed to separate processes. RLI identifies 55 languages with extreme accuracy, even on very short strings of text.
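To make the idea behind this step concrete, here is a minimal Python sketch using only the standard library. It is not the RLI or RLBL API: the function names `to_unicode`, `script_of` and `split_by_script` and the candidate encoding list are assumptions made for illustration. It also segments by writing system rather than by language, which is a much cruder signal than true language identification.

```python
import unicodedata

def to_unicode(raw, candidates=("utf-8", "shift_jis", "cp1252")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: fall back to a lossy UTF-8 decode.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"

def script_of(ch):
    """Return a rough script label such as 'LATIN' or 'CYRILLIC', or None."""
    if not ch.isalpha():
        return None  # spaces, digits and punctuation are script-neutral
    # The first word of the Unicode character name is used as the label.
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]

def split_by_script(text):
    """Split text into (script, region) pairs; neutral characters stay
    attached to the region they follow."""
    regions = []
    for ch in text:
        s = script_of(ch)
        if regions and (s is None or regions[-1][0] in (None, s)):
            regions[-1][1] += ch
            if regions[-1][0] is None:
                regions[-1][0] = s
        else:
            regions.append([s, ch])
    return [(s, t) for s, t in regions]
```

Each region returned by `split_by_script` could then be routed to a separate process, in the spirit of the segmentation described above. Trying encodings in a fixed order is only a heuristic, since some byte sequences decode without error under more than one encoding.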
Step 2: Apply linguistic intelligence to identify word forms, parts of speech and sentence structure
Component: Rosette Base Linguistics (RBL)
RBL examines documents and performs a complete morphological analysis so text can be accurately filtered, analyzed and searched. RBL identifies parts of speech, sentence boundaries, word breaks, tokens and other linguistic components within a document, in European, Asian and Middle Eastern languages. The technology and linguistic data in RBL result from over 10 years of development and use in web and enterprise search engines.
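As a rough illustration of what sentence boundaries and tokens mean in practice, the following Python sketch uses simple regular expressions. It is not RBL and performs no morphological analysis or part-of-speech tagging. The function names `split_sentences` and `tokenize` are hypothetical, and the approach assumes space-delimited text, so it would not work for languages such as Chinese or Japanese.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by
    whitespace. Abbreviations such as 'Dr.' will be split incorrectly."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Return runs of letters/digits as word tokens and each punctuation
    mark as its own token."""
    return re.findall(r"\w+|[^\w\s]", sentence)
```

The gap between this sketch and real linguistic analysis (handling abbreviations, unsegmented scripts and inflected word forms) is where language-specific data becomes necessary.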
Step 3: Extract the items of interest (including those you didn’t know about)
Component: Rosette Entity Extractor (REX)
REX sifts through unstructured text and identifies people, places, dates and other items that establish the true meaning of a document for further analysis. REX locates generic terms as well as custom entities such as specific names, phone numbers and email addresses. Statistical modeling helps determine if an entity resides within a document, rather than simply referring to a list of possibilities and risking overlooking a variation. The result is entity extraction technology that lets you find what you know – and also what you didn’t know.
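For contrast, here is a minimal Python sketch of purely pattern-based extraction for two of the custom entity types mentioned above, email addresses and phone numbers. It is not REX and uses no statistical modeling. The patterns and the function name `extract_entities` are assumptions for illustration, and they are deliberately loose: they find only what the patterns anticipate, which is the limitation a statistical approach aims to overcome.

```python
import re

# Illustrative patterns only; real-world formats vary far more widely.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{6,}\d")

def extract_entities(text):
    """Return the email addresses and phone-like strings found in text."""
    return {
        "email": EMAIL.findall(text),
        "phone": [m.strip() for m in PHONE.findall(text)],
    }
```

For example, calling `extract_entities("Contact j.doe@example.com or call +1 617-555-0100 before 2009.")` picks out the address and the phone number while ignoring the four-digit year.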


