Facing litigation and having to produce company documents to third parties can be an unsettling experience. Some businesses react by trying to do as much of the identification, preservation and collection work as possible in-house, using either company staff or their trusted IT consultants. While this sounds like a good way to keep irrelevant company data away from outside parties and to cut costs, it often backfires when done without the required expertise and tools. Furthermore, it can derail the entire e-Discovery process, since subsequent steps such as processing, review and production depend on the proper identification, preservation and collection of relevant ESI.
A frequently (and improperly) used tool for e-Discovery searches is the built-in search functionality in Microsoft Outlook, also known as Outlook Instant Search. Instead of engaging an expert to identify and collect electronic evidence, some businesses have their IT staff sit at each custodian's computer, run a search with Outlook Instant Search and copy the responsive e-mails to a new PST.
Outlook Instant Search relies on the Windows Search Service and helps users perform quick searches in their mailboxes. It is a very convenient way to locate an e-mail you received from someone last week or to find an attachment with a specific name, but it is not, by any means, an e-Discovery search tool. Here are a few reasons why you should not use it to locate responsive e-mails for e-Discovery:
Why Outlook Instant Search is not Appropriate for e-Discovery
1. Incomplete e-Discovery Search Strategy
Relying only on the e-mails found on each custodian's computer is an incomplete strategy to begin with. In an Outlook/Exchange environment, additional key e-mails may reside on the server (e.g. e-mails that were never downloaded from the server, or e-mails that were hard deleted but are still within the deleted item retention period), as well as on mobile devices, e-mail server back-ups and mobile device back-ups.
2. Lack of OCR
While most Microsoft Office documents are searchable, some files, such as non-searchable PDFs (e.g. an agreement scanned to PDF) or images (e.g. a fax message in TIFF format), do not contain extractable text. Since Windows Search does not perform optical character recognition (OCR) to make these documents searchable, many potential search hits can be missed. Computer forensics and e-Discovery tools, by contrast, can identify documents without extractable text and OCR them before keyword searches are performed.
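As a rough illustration of the triage step described above, the Python sketch below flags documents whose extracted text is empty or nearly empty so they can be routed to OCR before any keyword search runs. The `Document` class, the `needs_ocr` and `triage` helpers and the 20-character threshold are hypothetical choices for this example, not the behavior of any particular e-Discovery product, and the sketch assumes text extraction has already happened upstream.

```python
from dataclasses import dataclass

# Hypothetical threshold: fewer non-whitespace characters than this means
# the document is treated as having no usable extractable text.
MIN_TEXT_CHARS = 20


@dataclass
class Document:
    doc_id: str
    extracted_text: str  # output of an upstream text-extraction step (may be empty)


def needs_ocr(doc: Document, min_chars: int = MIN_TEXT_CHARS) -> bool:
    """Return True if the document has too little extracted text to be searched."""
    visible_chars = "".join(doc.extracted_text.split())
    return len(visible_chars) < min_chars


def triage(docs: list[Document]) -> tuple[list[Document], list[Document]]:
    """Split documents into (searchable, ocr_queue) before keyword searching."""
    searchable, ocr_queue = [], []
    for doc in docs:
        if needs_ocr(doc):
            ocr_queue.append(doc)
        else:
            searchable.append(doc)
    return searchable, ocr_queue


if __name__ == "__main__":
    docs = [
        Document("DOC-001", "This Agreement is entered into by and between the parties."),
        Document("DOC-002", ""),        # e.g. an agreement scanned to PDF
        Document("DOC-003", "  \n  "),  # e.g. a TIFF fax that yielded only whitespace
    ]
    searchable, ocr_queue = triage(docs)
    print([d.doc_id for d in ocr_queue])  # prints ['DOC-002', 'DOC-003']
```

A real workflow would OCR the queued documents and add their recognized text back into the searchable set before running the keyword search.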
3. Limited File Type Support
Windows Search supports indexing the contents of most common file types, such as e-mails, Microsoft Office files and PDFs. However, it does not support nearly as many file types as an e-Discovery search engine. For example, we found that the contents of many business documents, such as WordPerfect files, E-transcript files, RAR and 7-Zip archives and CAD files, were not indexed out of the box; indexing them required installing third-party IFilters (interfaces that allow the Windows Search Service to extract text and properties from documents). Furthermore, even supported file types were not indexed when they carried an unsupported extension (e.g. a plain text file with the .xyz extension).
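Why a renamed file slips through is easier to see with a sketch of content-based type detection. The Python example below identifies a few formats by their leading "magic" bytes rather than by their extension, and flags files whose extension does not match their content. The signature table covers only a handful of formats, and the `EXPECTED_BY_EXTENSION` mapping and helper names are illustrative assumptions; forensic tools rely on far larger signature databases.

```python
import os

# Leading "magic" bytes for a few common formats (far from exhaustive).
SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",  # ZIP container; also used by DOCX/XLSX/PPTX
    b"Rar!\x1a\x07": "rar",
    b"7z\xbc\xaf\x27\x1c": "7z",
    b"\xffWPC": "wordperfect",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",  # legacy DOC/XLS/MSG
}

# Hypothetical mapping of extensions to the content type we expect to see.
EXPECTED_BY_EXTENSION = {
    ".pdf": "pdf", ".docx": "zip", ".xlsx": "zip", ".rar": "rar", ".7z": "7z",
    ".wpd": "wordperfect", ".doc": "ole2", ".msg": "ole2", ".txt": "text",
}


def sniff_type(data: bytes) -> str:
    """Guess a file's type from its content, ignoring its name."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    # Call it plain text if nearly every character is printable or whitespace.
    readable = sum(ch.isprintable() or ch.isspace() for ch in text)
    if text and readable / len(text) > 0.95:
        return "text"
    return "unknown"


def extension_mismatch(filename: str, data: bytes) -> bool:
    """True if the file's extension doesn't match what its content says it is."""
    ext = os.path.splitext(filename)[1].lower()
    return EXPECTED_BY_EXTENSION.get(ext) != sniff_type(data)
```

With this approach, a plain text file renamed to `notes.xyz` is still recognized as text from its content, so it can be indexed and reported as a mismatch instead of being silently skipped.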
4. Lack of Accountability and Reporting
When the end user searches a mailbox that has not yet been fully indexed, Outlook displays a warning that the results may be incomplete. However, no warning is shown for documents that the Windows Search Service could not index or search. In other words, the user interface gives the impression that everything is working well, even if the data set contains file types that Windows Search cannot handle. Searching for e-Discovery, on the other hand, requires documenting the search process, logging and reporting search exceptions, and resolving those exceptions where possible.
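The sketch below illustrates the accountability point: rather than silently skipping files it cannot read, the search records each one with a reason so it can appear in an exception report. It is a minimal Python example under stated assumptions: `extract_text` is a caller-supplied stand-in for a real text extractor, `demo_extract` is a hypothetical extractor that rejects WordPerfect content, and keyword matching is a naive case-insensitive substring test.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class SearchReport:
    hits: list[str] = field(default_factory=list)
    exceptions: list[tuple[str, str]] = field(default_factory=list)  # (doc_id, reason)


def demo_extract(raw: bytes) -> str:
    """Hypothetical extractor: rejects WordPerfect content, decodes everything else as UTF-8."""
    if raw.startswith(b"\xffWPC"):
        raise ValueError("unsupported format (WordPerfect)")
    return raw.decode("utf-8")


def search_with_exceptions(
    docs: Iterable[tuple[str, bytes]],
    keywords: list[str],
    extract_text: Callable[[bytes], str],
) -> SearchReport:
    """Naive keyword search that logs every document it could not search."""
    report = SearchReport()
    terms = [k.lower() for k in keywords]
    for doc_id, raw in docs:
        try:
            text = extract_text(raw)
        except Exception as exc:  # record the failure instead of silently skipping
            report.exceptions.append((doc_id, f"extraction failed: {exc}"))
            continue
        if not text.strip():
            report.exceptions.append((doc_id, "no extractable text (OCR candidate)"))
            continue
        lowered = text.lower()
        if any(term in lowered for term in terms):
            report.hits.append(doc_id)
    return report
```

The `exceptions` list is the raw material for an exception report: each entry says which document was not searched and why, which is exactly the information the Outlook interface never surfaces.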
5. Limited Search Syntax
Admittedly, the query syntax for Windows Search is quite detailed and should be sufficient for its intended use: casual, everyday searching and filtering. However, it lacks several e-Discovery search staples, such as proximity searches, granular control over wildcards, stemming, fuzzy searching and synonym searching. Consequently, most complex e-Discovery search queries cannot be translated directly into an Outlook query.
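To make the missing syntax concrete, here is a minimal Python sketch of a proximity search with trailing-wildcard terms, the kind of query many e-Discovery platforms express with syntax like `terminate w/2 contract*`. The `within` function, the simple tokenizer and the convention that "distance" means the difference in word positions are assumptions for illustration; platforms differ in how they tokenize text and count intervening words.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_matches(token: str, pattern: str) -> bool:
    """Exact match, or prefix match when the pattern ends with a '*' wildcard."""
    pattern = pattern.lower()
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern


def within(text: str, term_a: str, term_b: str, distance: int) -> bool:
    """True if term_a and term_b occur within `distance` word positions of each other, in either order."""
    tokens = tokenize(text)
    pos_a = [i for i, tok in enumerate(tokens) if term_matches(tok, term_a)]
    pos_b = [i for i, tok in enumerate(tokens) if term_matches(tok, term_b)]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)
```

For the sentence "The supplier agreed to terminate the contract early.", `within(text, "terminate", "contract*", 2)` is true, while `within(text, "supplier", "contract*", 3)` is false because the two words are five positions apart.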
For the purposes of this article, I searched a small PST containing 7,331 documents with both Outlook Instant Search and an e-Discovery tool. Each search used a simple three-keyword query, and attachment families were included in the results.
The search performed directly in Outlook using Outlook Instant Search returned 4,575 responsive documents, while the e-Discovery tool identified 5,585. In other words, Outlook found 1,010 fewer responsive documents than the e-Discovery tool, a shortfall of approximately 18% relative to the e-Discovery tool's result.
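The figures above can be checked with a few lines of Python. This treats the e-Discovery tool's count as the baseline and compares totals only; it does not account for any documents Outlook may have returned that the e-Discovery tool did not.

```python
outlook_hits = 4_575     # Outlook Instant Search
ediscovery_hits = 5_585  # e-Discovery tool (treated here as the baseline)

missed = ediscovery_hits - outlook_hits
missed_share = missed / ediscovery_hits

print(f"{missed:,} fewer documents ({missed_share:.1%})")  # prints "1,010 fewer documents (18.1%)"
```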
I believe the dramatic difference in this example was mainly due to the graphics and PDF files that needed to be OCRed, and to file types, such as WordPerfect files, that Outlook Instant Search could not search. That being said, the possibility of missing nearly one in five responsive documents (or even more, depending on the data set) when searching directly in Outlook is a scary thought.
The identification, preservation and collection stages of e-Discovery can be crucial to the outcome of your case and should be handled by experts. If you are unable to work with an expert, you should at least obtain the right tools and training. Make sure you are familiar with how the search engine works (e.g. search syntax, tokenization, foreign languages), what it can and cannot search, and how exceptions should be logged and handled. Test your tools against a known data set to confirm they perform as expected, and review exception reports carefully to make sure you are searching everything you should be.
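One way to test a tool against a known data set is to seed a collection with documents you know should hit, run the tool's search, and compare its results with that answer set. The Python sketch below computes recall, precision and the lists of missed and unexpected documents; the `evaluate_search` function and the document IDs are hypothetical.

```python
def evaluate_search(expected_hits: set[str], actual_hits: set[str]) -> dict:
    """Compare a tool's search results against a known answer set."""
    found = expected_hits & actual_hits
    recall = len(found) / len(expected_hits) if expected_hits else 1.0
    precision = len(found) / len(actual_hits) if actual_hits else 1.0
    return {
        "recall": recall,                                 # share of known hits the tool found
        "precision": precision,                           # share of tool hits that were expected
        "missed": sorted(expected_hits - actual_hits),    # investigate these first
        "unexpected": sorted(actual_hits - expected_hits),  # may be legitimate, unanticipated hits
    }


if __name__ == "__main__":
    seeded = {"SEED-01", "SEED-02", "SEED-03", "SEED-04"}  # hypothetical seeded documents
    returned = {"SEED-01", "SEED-02", "DOC-977"}            # hypothetical tool output
    result = evaluate_search(seeded, returned)
    print(result["recall"], result["missed"])  # prints 0.5 ['SEED-03', 'SEED-04']
```

Any document in `missed` is worth tracing back to the exception report: was it an unsupported file type, an unindexed item, or a document that needed OCR?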