<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Meridian Discovery &#124; e-Discovery, Computer Forensics, Hosting</title>
	<atom:link href="http://www.meridiandiscovery.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.meridiandiscovery.com</link>
	<description>e-Discovery, Computer Forensics, Hosting</description>
	<lastBuildDate>Mon, 22 Apr 2013 22:11:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>Concordance CPL to Populate Production Attachment Ranges</title>
		<link>http://www.meridiandiscovery.com/software/concordance-cpl-populate-production-attachment-ranges/</link>
		<comments>http://www.meridiandiscovery.com/software/concordance-cpl-populate-production-attachment-ranges/#comments</comments>
		<pubDate>Thu, 21 Mar 2013 20:42:37 +0000</pubDate>
		<dc:creator>Arman Gungor</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[Concordance]]></category>
		<category><![CDATA[CPL]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/?p=3759</guid>
		<description><![CDATA[Have you ever had to calculate production attachment ranges (e.g. PRODBEGATT and PRODENDATT fields) manually? Perhaps the production software you used did not calculate these fields for you, or the production specifications changed and you had to add these fields after the fact. While the calculation is usually straightforward, things can get a bit more ...]]></description>
			<content:encoded><![CDATA[<p>Have you ever had to calculate production attachment ranges (e.g. PRODBEGATT and PRODENDATT fields) manually? Perhaps the production software you used did not calculate these fields for you, or the production specifications changed and you had to add these fields after the fact. While the calculation is usually straightforward, things can get a bit more tricky if some of the attachment families were not produced entirely (i.e. you need to shrink the review attachment ranges to account for the documents that were not produced).</p>
<p>We have created a Concordance CPL called &#8220;Populate_Prod_Att&#8221; to help make things a bit easier. The CPL reads the existing review attachment ranges and production Bates numbers in your Concordance database, and calculates the production attachment ranges for you.</p>
<h4>Concordance CPL Features</h4>
<ul class="list2 list_color_blue">
<li>Calculates production attachment ranges based on review attachment ranges and production Bates numbers</li>
<li>Can automatically shrink attachment ranges when some of the documents from an attachment family are not produced</li>
<li>Can clear production attachment range fields for single documents (i.e. documents that are not part of an attachment family)</li>
<li>Can be used for both TIFF productions and native-only productions</li>
<li>Can work on the entire database or active query depending on your preference</li>
</ul>
<h4>Input</h4>
<p>As the input, the CPL takes the following 4 fields: BEGATTACH, ENDATTACH, PRODBEG, PRODEND (field names can be different depending on your database structure).</p>
<p><b>BEGATTACH:</b> Should be populated with the beginning review attachment range Bates number.</p>
<p><b>ENDATTACH:</b> Should be populated with the ending review attachment range Bates number.</p>
<p><b>PRODBEG:</b>  Should be populated with the beginning production Bates number.</p>
<p><b>PRODEND:</b>  Should be populated with the ending production Bates number.</p>
<p>Please see the following table for an example of how the input fields should look in your database. In this example, the legal team decided not to produce the document starting with REV00000013. Consequently, the production attachment range needs to shrink from 6 pages (REV00000011 &#8211; REV00000016) to 5 pages (PROD0000001 &#8211; PROD0000005).</p>
<p><font style="font-size:11px;"></p>
<div class="table_style">
<table>
<thead>
<tr>
<td>BEGDOC</td>
<td >ENDDOC</td>
<td >BEGATTACH</td>
<td >ENDATTACH</td>
<td>PRODBEG</td>
<td >PRODEND</td>
</tr>
</thead>
<tr>
<td >REV00000011</td>
<td>REV00000012</td>
<td>REV00000011</td>
<td>REV00000016</td>
<td>PROD0000001</td>
<td>PROD0000002</td>
</tr>
<tr>
<td>REV00000014</td>
<td>REV00000016</td>
<td>REV00000011</td>
<td>REV00000016</td>
<td>PROD0000003</td>
<td>PROD0000005</td>
</tr>
</table>
</div>
<p></font></p>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 1 &#8211; Sample Input Fields</p>
<h4>Output</h4>
<p>The program outputs the calculated values to two fields: PRODBEGATTACH and PRODENDATTACH. Since these fields are used to store the output, they will be overwritten. The output for the sample documents in Figure 1 would be as follows:</p>
<p><center><br />
<font style="font-size:11px;"></p>
<div style="width:250px" class="table_style">
<table>
<thead>
<tr>
<td>PRODBEGATTACH</td>
<td>PRODENDATTACH</td>
</tr>
</thead>
<tr>
<td>PROD0000001</td>
<td>PROD0000005</td>
</tr>
<tr>
<td>PROD0000001</td>
<td>PROD0000005</td>
</tr>
</table>
</div>
<p></font></p>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 2 &#8211; Sample Output</p>
<p></center></p>
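<p>The calculation described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the CPL itself: the function name <code>populate_prod_att</code> and the list-of-dictionaries record layout are assumptions made for this example, and Bates numbers are assumed to share one prefix and a fixed zero-padded width so that plain string comparison orders them correctly.</p>

```python
from collections import defaultdict


def populate_prod_att(records):
    """Return copies of the records with PRODBEGATTACH / PRODENDATTACH set.

    Hypothetical helper: records are dicts keyed by field name.
    """
    # Group the produced documents by their review attachment range.
    families = defaultdict(list)
    for rec in records:
        key = (rec.get("BEGATTACH"), rec.get("ENDATTACH"))
        if all(key) and rec.get("PRODBEG") and rec.get("PRODEND"):
            families[key].append(rec)

    result = []
    for rec in records:
        rec = dict(rec)  # do not modify the caller's data
        key = (rec.get("BEGATTACH"), rec.get("ENDATTACH"))
        members = families.get(key, [])
        produced = bool(rec.get("PRODBEG") and rec.get("PRODEND"))
        if members and produced:
            # The production range spans only the produced family members,
            # so it shrinks automatically when a member was withheld.
            rec["PRODBEGATTACH"] = min(m["PRODBEG"] for m in members)
            rec["PRODENDATTACH"] = max(m["PRODEND"] for m in members)
        else:
            rec["PRODBEGATTACH"] = rec["PRODENDATTACH"] = ""
        result.append(rec)
    return result


# The family from Figure 1, including the withheld document REV00000013.
sample = [
    {"BEGDOC": "REV00000011", "ENDDOC": "REV00000012",
     "BEGATTACH": "REV00000011", "ENDATTACH": "REV00000016",
     "PRODBEG": "PROD0000001", "PRODEND": "PROD0000002"},
    {"BEGDOC": "REV00000013", "ENDDOC": "REV00000013",
     "BEGATTACH": "REV00000011", "ENDATTACH": "REV00000016",
     "PRODBEG": "", "PRODEND": ""},
    {"BEGDOC": "REV00000014", "ENDDOC": "REV00000016",
     "BEGATTACH": "REV00000011", "ENDATTACH": "REV00000016",
     "PRODBEG": "PROD0000003", "PRODEND": "PROD0000005"},
]
populated = populate_prod_att(sample)
```

<p>For the two produced documents, the sketch yields the same PROD0000001 and PROD0000005 values shown in Figure 2, while the withheld document is left blank.</p>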
<h4>Usage for Native-only Productions</h4>
<p>If you are working with a native-only database (i.e. you wish to populate the production parent ID field instead of production attachment ranges), you can do so by pointing at the same field (the review PARENTID field) for both the BEGATTACH and ENDATTACH fields. When you do, the program will not prompt you for the PRODEND and PRODENDATTACH fields, and will not attempt to populate the ending production attachment range field. It will only populate the PRODBEGATTACH field, which will be your production parent ID.</p>
<div class="error">
<div class="message_box_content">WARNING</div>
<div class="clearboth"></div>
</div>
<div class="error_msg">
<div class="message_box_content">
<ul>
<li>This Concordance CPL will make changes to your Concordance database. Specifically, it will overwrite the contents of the PRODBEGATTACH and PRODENDATTACH fields that you choose.</li>
<li>Make a back-up of your Concordance database before running this program.</li>
</ul>
</div>
<div class="clearboth"></div>
</div>
<p>This program is available for download free of charge. Feel free to give it a try and let us know your thoughts.</p>
<h4>For Concordance 8:</h4>
<p><i><span class="icon_text icon_download"><a href="http://www.meridiandiscovery.com/downloads/Populate_Prod_Att+CPL+for+Concordance+v8">Populate_Prod_Att CPL for Concordance v8 (Version 1.04)</a></span></i></p>
<h4>For Concordance 9:</h4>
<p><i><span class="icon_text icon_download"><a href="http://www.meridiandiscovery.com/downloads/Populate_Prod_Att+CPL+for+Concordance+v9">Populate_Prod_Att CPL for Concordance v9 (Version 1.04)</a></span></i></p>
<h4>For Concordance 10:</h4>
<p><i><span class="icon_text icon_download"><a href="http://www.meridiandiscovery.com/downloads/Populate_Prod_Att+CPL+for+Concordance+v10">Populate_Prod_Att CPL for Concordance v10 (Version 1.04)</a></span></i><br />
<br/></p>
<p>If you like this CPL, you may also like our <a href="http://www.meridiandiscovery.com/software/concordance-cpl-to-create-database-dcb-from-load-file/" title="Concordance CPL to Create Database (DCB) from Load File" target="_blank">CPL to Create Database (DCB) from Load File</a>.</p>
<h4>Free Software Updates</h4>
<p>Would you like to receive e-mail updates when a new version of this CPL becomes available? Leave us your e-mail address below and we will keep you updated.<br />
<!-- Begin MailChimp Signup Form --></p>
<div id="mc_embed_signup">
<form action="http://meridiandiscovery.us4.list-manage.com/subscribe/post?u=79cf752bfd5282ec734c24456&amp;id=a1298a3819" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" class="validate" target="_blank">
<input type="email" value="" name="EMAIL" class="email" id="mce-EMAIL" placeholder="email address" required>
<div class="clear">
<input type="submit" value="Subscribe" name="subscribe" id="mc-embedded-subscribe" class="button"></div>
</form>
</div>
<p><!--End mc_embed_signup--></p>
<p style="font-size:10px;"><i>Concordance is a registered trademark of LexisNexis. Other products or services may be trademarks or registered trademarks of their respective companies.</i></p>
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/software/concordance-cpl-populate-production-attachment-ranges/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why You Shouldn&#8217;t Use Outlook Instant Search for e-Discovery</title>
		<link>http://www.meridiandiscovery.com/articles/why-you-shouldnt-use-outlook-instant-search-for-e-discovery/</link>
		<comments>http://www.meridiandiscovery.com/articles/why-you-shouldnt-use-outlook-instant-search-for-e-discovery/#comments</comments>
		<pubDate>Wed, 13 Mar 2013 17:24:07 +0000</pubDate>
		<dc:creator>Arman Gungor</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Computer Forensics]]></category>
		<category><![CDATA[e-Discovery]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/?p=3574</guid>
		<description><![CDATA[Facing litigation and having to produce company documents to third parties can be an unsettling experience. Some businesses react to this by attempting to do as much of the identification, preservation and collection work in-house, using either company staff or their trusted IT consultants. While this sounds like a good idea for keeping as much ...]]></description>
			<content:encoded><![CDATA[<p>Facing litigation and having to produce company documents to third parties can be an unsettling experience. Some businesses react to this by attempting to do as much of the identification, preservation and collection work in-house, using either company staff or their trusted IT consultants. While this sounds like a good idea for keeping as much of the irrelevant company data from the outside and cutting costs, it often backfires when done without the required expertise and tools. Furthermore, it can derail the entire e-Discovery process since subsequent steps such as processing, review and production depend on the proper identification, preservation and collection of relevant ESI.</p>
<p>A frequently (and improperly) used tool for e-Discovery searches is the built-in search functionality in Microsoft Outlook. Specifically, instead of engaging an expert to identify and collect electronic evidence, some businesses choose to have their IT staff sit in front of each custodian&#8217;s computer, run a search in Outlook and copy the responsive e-mails to a new PST.</p>
<p>Outlook Instant Search relies on the Windows Search Service, and helps users perform quick searches in their mailboxes. This is a very convenient way to locate an e-mail you received from someone last week, or to find that e-mail attachment with a specific name; but it is not, by any means, an e-Discovery search tool. Here are a few reasons why you should not use it to locate responsive e-mails for e-Discovery:</p>
<h4>1. Incomplete e-Discovery Search Strategy</h4>
<p>Relying on only the e-mails found on each custodian&#8217;s computer would be an incomplete strategy to begin with. In an Outlook/Exchange environment, additional key e-mails can be found on the server (e.g. e-mails that were not downloaded from the server, e-mails that were hard deleted within the deleted item retention period etc.) as well as on mobile devices, e-mail server back-ups, mobile device back-ups etc.</p>
<h4>2. Lack of OCR</h4>
<p>While most Microsoft Office documents are searchable, some file types such as non-searchable PDFs (e.g. an agreement scanned to PDF) or images (e.g. a fax message in TIFF format) do not contain extractable text. Since Windows Search does not perform optical character recognition (OCR) to make these documents searchable, many potential search hits could be missed. Computer forensics and e-Discovery tools have the ability to identify and OCR documents without extractable text before keyword searches are performed.</p>
<h4>3. Limited File Type Support</h4>
<p>Windows Search supports indexing the contents of most common file types such as e-mails, Microsoft Office files, PDFs etc. However, it does not support nearly as many file types as an e-Discovery search engine. For example, we have found that the contents of many business documents such as WordPerfect files, E-transcript files, RAR and 7-Zip archives, CAD files etc. were not indexed out of the box without installing third-party IFilters (interfaces that allow the Windows Search Service to extract text and properties from documents). Furthermore, supported file types were not indexed when given an unsupported extension (e.g. a plain text file with the .xyz extension).</p>
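<p>One common way to avoid depending on file extensions is to look at the first few bytes of each file instead. The Python sketch below illustrates the idea with a handful of well-known file signatures; the <code>identify</code> function and the short signature table are assumptions made for this example and are far from a complete file identification engine.</p>

```python
# A few well-known leading byte sequences ("magic numbers").
SIGNATURES = {
    b"%PDF-": "PDF",
    b"PK\x03\x04": "ZIP container (also DOCX/XLSX/PPTX)",
    b"Rar!\x1a\x07": "RAR archive",
    b"7z\xbc\xaf\x27\x1c": "7-Zip archive",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 compound file (legacy Office, MSG)",
    b"\xffWPC": "WordPerfect document",
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
}


def identify(data: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring the extension."""
    if not data:
        return "empty"
    for signature, name in SIGNATURES.items():
        if data.startswith(signature):
            return name
    # Plain text has no signature, so fall back to a rough heuristic:
    # the bytes decode as UTF-8 and contain no NUL characters.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "unknown" if b"\x00" in data else "plain text (heuristic)"


# A plain text file renamed to "notes.xyz" is still recognized as text.
renamed_text = identify(b"Meeting notes for the quarterly review")
scanned_pdf = identify(b"%PDF-1.4 sample")
```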
<h4>4. Lack of Accountability and Reporting</h4>
<p>When the end user attempts to search a mailbox that has not yet been fully indexed, Outlook displays a warning that indicates that the results may be incomplete. However, no warning is shown for documents that the Windows Search Service could not index or search. In other words, the user interface gives you the impression that everything is working well, even if the data set contains file types that Windows Search cannot handle. On the other hand, performing searches for e-Discovery requires documenting the search process, logging and reporting search exceptions and handling them when possible.</p>
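<p>To make the contrast concrete, here is a minimal Python sketch of a search routine that reports what it could not search alongside its hits. The <code>search_with_exceptions</code> function and the dictionary of document IDs mapped to text are hypothetical; a real e-Discovery platform would track many more exception types.</p>

```python
def search_with_exceptions(documents, keyword):
    """Return (hits, exceptions) for a simple case-insensitive search.

    documents maps a document ID to its text, or to None or an empty
    string when no text could be extracted from the file.
    """
    hits, exceptions = [], []
    needle = keyword.lower()
    for doc_id, text in documents.items():
        if not text:
            # Record the document instead of silently skipping it.
            exceptions.append((doc_id, "no searchable text"))
        elif needle in text.lower():
            hits.append(doc_id)
    return hits, exceptions


docs = {
    "DOC001": "Please review the attached agreement.",
    "DOC002": None,  # e.g. a scanned PDF that has not been OCRed
    "DOC003": "Lunch on Friday?",
}
hits, exceptions = search_with_exceptions(docs, "agreement")
```

<p>Here the scanned document ends up on the exception list, where it can be OCRed and searched again, rather than being treated as non-responsive.</p>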
<h4>5. Limited Search Syntax</h4>
<p>Admittedly, the <a href="http://office.microsoft.com/en-us/outlook-help/learn-to-narrow-your-search-criteria-for-better-searches-in-outlook-HA010238831.aspx" title="Learn to narrow your search criteria for better searches in Outlook" rel="nofollow" target="_blank">query syntax</a> for Windows Search is quite detailed, and should be sufficient for its intended usage &#8211; casual, everyday search and filtering. However, it is missing some of the e-Discovery search staples such as proximity searches, granular control over wildcards, stemming, fuzzy searching, synonym searching etc. Consequently, most complex e-Discovery search queries cannot be directly translated to a query in Outlook.</p>
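<p>As an illustration of what a proximity search does, the following Python sketch checks whether two terms occur within a given number of words of each other. The <code>within</code> function is a simplified assumption for this example; e-Discovery search engines work from an index and apply their own tokenization rules.</p>

```python
import re


def within(text, term_a, term_b, distance):
    """Return True if term_a and term_b occur within `distance` words."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(a - b) <= distance
               for a in positions_a for b in positions_b)


sentence = "The merger agreement was signed by the board"
close_match = within(sentence, "merger", "board", 7)
too_far = within(sentence, "merger", "board", 3)
```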
<h3>Example Scenario</h3>
<p>For the purposes of this article, I searched a small PST containing 7,331 documents using Outlook and an e-Discovery tool. Both searches consisted of a simple, three-keyword query and attachment families were included as part of the results. </p>
<p>The search performed directly in Outlook resulted in 4,575 responsive documents, while the search performed using the e-Discovery tool identified 5,585 documents as responsive. In other words, Outlook found 1,010 fewer responsive documents (approximately 18%) than the e-Discovery tool (see Figure 1).</p>
<p><a href="http://www.meridiandiscovery.com/base/wp-content/uploads/2013/03/Search_Results.png"><img width="500" height="150" alt="Example Search Results" src="http://www.meridiandiscovery.com/base/wp-content/themes/meridian/cache/images/Search_Results-500x150.png" /></a> </p>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 1 &#8211; Example Search Results</p>
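<p>The percentage quoted above follows directly from the two document counts, as this short Python calculation shows (the variable names are chosen for this example only):</p>

```python
outlook_hits = 4575
ediscovery_hits = 5585

# Documents found by the e-Discovery tool but not by Outlook.
missed = ediscovery_hits - outlook_hits

# Share of the e-Discovery tool's results that Outlook did not find.
missed_pct = 100 * missed / ediscovery_hits
```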
<h3>Conclusions</h3>
<p>I believe that the dramatic difference in search performance shown in the example scenario was mainly due to the graphics and PDF files that needed to be OCRed, and file types, such as WordPerfect files, which Outlook could not search. That being said, the possibility of missing almost 20% of the responsive documents (or even more, depending on the data set) when searching directly in Outlook is a scary thought. </p>
<p>Identification, preservation and collection stages of e-Discovery can be crucial for the outcome of your case, and should be handled expertly. If you are unable to work with an expert, you should at least obtain the right tools and training. Make sure that you are familiar with how the search engine works (e.g. search syntax, tokenization, foreign languages etc.), what it can and cannot search and <a href="http://www.meridiandiscovery.com/articles/exception-handling-and-reporting-in-e-discovery/" title="Exception Handling and Reporting in e-Discovery" target="_blank">how exceptions should be logged and handled</a>. Test your tools against a known data set to make sure they are performing as expected, and review exception reports carefully to make sure you are searching everything that you should be.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/articles/why-you-shouldnt-use-outlook-instant-search-for-e-discovery/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>8 Tips for Preparing a Proper TIFF Production</title>
		<link>http://www.meridiandiscovery.com/how-to/8-tips-for-preparing-a-proper-tiff-production/</link>
		<comments>http://www.meridiandiscovery.com/how-to/8-tips-for-preparing-a-proper-tiff-production/#comments</comments>
		<pubDate>Mon, 04 Mar 2013 21:50:39 +0000</pubDate>
		<dc:creator>Arman Gungor</dc:creator>
				<category><![CDATA[How-to]]></category>
		<category><![CDATA[e-Discovery]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/?p=3508</guid>
		<description><![CDATA[Legal teams often choose to prepare image productions accompanied by load files, and many of them make simple mistakes or bad choices that make it unnecessarily difficult for the recipient to utilize the produced information. While helping a firm sort out a disastrous incoming production, I was inspired to write this post with the hope ...]]></description>
			<content:encoded><![CDATA[<p>Legal teams often choose to prepare image productions accompanied by load files, and many of them make simple mistakes or bad choices that make it unnecessarily difficult for the recipient to utilize the produced information. While helping a firm sort out a disastrous incoming production, I was inspired to write this post with the hope that it may help someone avoid an unnecessary dispute. Assuming that the e-Discovery processing leading to the production was performed competently, here are a few quick tips for preparing a proper image production:</p>
<h4>1. Understand your discovery agreement</h4>
<p>Surprisingly, many law firms do not honor the agreed-upon production format, or change the production specifications unilaterally as they see fit. For example, if you agreed to produce processing exceptions as placeholders with the accompanying native file, do not remove them from the production. Or, if you agreed to produce in Concordance v10 format, do not send out Concordance v8 load files.</p>
<h4>2. Normalize the images before endorsement</h4>
<p><a href="http://www.meridiandiscovery.com/articles/image-normalization-in-e-discovery-processing/" title="Image Normalization in e-Discovery Processing" target="_blank">Normalize</a> produced images before endorsing them so that the output has consistent dimensions, resolution and compression. Furthermore, normalized images allow the endorsements to be applied consistently in terms of size and location. You should not send out a mixture of LZW-compressed TIFFs and JPGs, of 200, 300 and 600 DPI images, or of portrait and landscape pages.</p>
<p>If the data set contains oversize images (e.g. technical drawings) whose legibility could be affected during normalization, those images should be handled with care. Depending on the scenario, they could be normalized to different specifications or left as is.</p>
<h4>3. Use legible endorsements</h4>
<p>You should ensure that the endorsement process pads the images so that endorsements neither cover the contents of the documents nor disappear on dark backgrounds. Furthermore, endorsements should use an easy-to-read font face and size (Arial 12 Bold is usually a good choice). Some smart endorsement tools can alert you if some of your designations are too long and will run off the page or overlap with other endorsements.</p>
<h4>4. Do not leave Bates gaps</h4>
<p>Take the time to ensure that the documents are Bates numbered and endorsed sequentially, without Bates gaps or overlapping Bates ranges. If gaps are unavoidable (e.g. you had to make last minute changes to a very large production and the deadline does not allow renumbering the entire data set), provide both image and text placeholders for each excluded page as opposed to providing a single placeholder for a Bates range.</p>
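<p>A check for gaps and overlaps is easy to automate before the production goes out. The Python sketch below is one possible approach, assuming every Bates number consists of a single shared prefix followed by digits; the function names <code>split_bates</code> and <code>find_gaps_and_overlaps</code> are made up for this example.</p>

```python
import re


def split_bates(value):
    """Split a Bates number such as PROD0000001 into (prefix, number)."""
    match = re.fullmatch(r"(.*?)(\d+)", value)
    if not match:
        raise ValueError(f"not a Bates number: {value!r}")
    return match.group(1), int(match.group(2))


def find_gaps_and_overlaps(ranges):
    """ranges is a list of (begin, end) Bates strings, one per document.

    Returns two lists of (first, last) page numbers: the missing pages
    and the pages claimed by more than one document.
    """
    parsed = sorted((split_bates(b)[1], split_bates(e)[1]) for b, e in ranges)
    gaps, overlaps = [], []
    if not parsed:
        return gaps, overlaps
    max_end = parsed[0][1]
    for beg, end in parsed[1:]:
        if beg > max_end + 1:
            gaps.append((max_end + 1, beg - 1))
        elif beg <= max_end:
            overlaps.append((beg, min(end, max_end)))
        max_end = max(max_end, end)
    return gaps, overlaps


production = [
    ("PROD0000001", "PROD0000002"),
    ("PROD0000003", "PROD0000005"),
    ("PROD0000008", "PROD0000009"),
    ("PROD0000009", "PROD0000010"),
]
gaps, overlaps = find_gaps_and_overlaps(production)
```

<p>In this sample, pages 6 and 7 are reported as a gap and page 9 as an overlap, both of which should be resolved or covered by placeholders before delivery.</p>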
<h4>5. Organize and label your files clearly</h4>
<p>Each component of the production should be separated into clearly labeled folders such as &#8220;DATA&#8221;, &#8220;IMAGES&#8221;, &#8220;TEXT&#8221; etc. Furthermore, load files should be named so that they contain the volume ID as well as a description of what type of file each one is. For example, &#8220;ABC001 Concordance Loadfile.dat&#8221;.</p>
<h4>6. Review and load test the production</h4>
<p>Even though the documents may have been meticulously reviewed in the review database, produced documents should be loaded into a new production database and examined. This would serve as an additional quality control step to ensure that the load files work as intended, redactions and designations were applied correctly etc.</p>
<h4>7. Use sanitized delivery media</h4>
<p>If the production will be sent out on a hard drive, make sure that it is securely wiped before the production deliverable is copied onto it. Simply deleting the old data and copying a new deliverable usually means you will be sending your opponent unintended files which can often be recovered effortlessly.</p>
<p>For example, let&#8217;s assume that you had sent your e-Discovery service provider a hard drive with 20 PSTs for processing. They processed the PSTs, hosted the processed data, prepared the production deliverable after your review and delivered it to you on the same drive along with the source PSTs. If you simply delete the PSTs and send out the drive, you would essentially be sending your opponent the original native PSTs along with your production.</p>
<h4>8. Encrypt the data</h4>
<p>Whether you are sending the production deliverable via electronic file transfer, or on a physical medium, using strong encryption can go a long way towards making sure your sensitive data doesn&#8217;t fall into the wrong hands. Imagine that the courier company lost the package with the hard drive containing your production. Wouldn&#8217;t you be relieved if you knew that the drive was AES-256 encrypted and was virtually impossible to decrypt using today&#8217;s technology?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/how-to/8-tips-for-preparing-a-proper-tiff-production/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How OCRed PDF Productions Degrade Electronic Evidence</title>
		<link>http://www.meridiandiscovery.com/articles/how-ocred-pdf-productions-degrade-electronic-evidence/</link>
		<comments>http://www.meridiandiscovery.com/articles/how-ocred-pdf-productions-degrade-electronic-evidence/#comments</comments>
		<pubDate>Wed, 06 Feb 2013 22:36:17 +0000</pubDate>
		<dc:creator>Arman Gungor</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[e-Discovery]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/?p=3060</guid>
		<description><![CDATA[Many legal teams use endorsed searchable PDFs as their preferred format for producing electronic evidence. I suspect that two of the most common reasons for this may be that PDFs are a format attorneys are very familiar with, and that the productions can be prepared in-house using the tools the firm has. I am generally ...]]></description>
			<content:encoded><![CDATA[<p>Many legal teams use endorsed searchable PDFs as their preferred format for producing electronic evidence. I suspect that two of the most common reasons for this may be that PDFs are a format attorneys are very familiar with, and that the productions can be prepared in-house using the tools the firm has.</p>
<p>I am generally not a fan of PDF productions because I think they lack both the advantages of a native production (e.g. maintaining the metadata and functionality of complex electronic files) and the advantages of a TIFF production accompanied by load files (e.g. flexibility and ease of use with legal review platforms). In fact, our experience shows that upon receiving a searchable PDF production, most law firms hire an outside company, or engage their in-house litigation support team to have the documents converted to a TIFF production with load files so that they can be loaded into a legal review platform.</p>
<div class="one_half">
<p>More concerning to me, though, is the fact that searchable PDF productions are frequently prepared (unnecessarily) using OCR rather than extracted text.</p>
<p>Let&#8217;s take a look at a problematic, but commonly used production workflow:</p>
<ol>
<li>Electronic files are prepared for review using e-Discovery processing (TIFFs, load files, extracted text)</li>
<li>Documents are loaded into a review database (the database contains, among other things, metadata and extracted text)</li>
<li>Review is performed and a list of documents to be produced and their designations is determined</li>
<li>Images to be produced are OCRed, endorsed and converted to searchable PDFs for production</li>
</ol>
</div>
<div class="one_half last">
<div class="note">
<h4 class="note_title">&nbsp;</h4>
<div class="note_content">
<h4>Extracted Text</h4>
<p>Text that is captured from already searchable electronic documents (e.g. e-mails, Excel spreadsheets etc.) with great accuracy during e-Discovery processing. </p>
<h4>OCR Text</h4>
<p>Text that is obtained by recognizing the characters found in an image file and converting them to text, using a process called Optical Character Recognition (OCR). OCR is typically used in the absence of extractable text and is much less accurate.
</p></div>
</div>
</div>
<div class="clearboth"></div>
<p>The workflow above effectively discards the more accurate, electronically extracted text and replaces it with OCR text with much less accuracy. For example, let&#8217;s assume that the original electronic document was the following Excel spreadsheet:</p>
<p><a href="http://www.meridiandiscovery.com/base/wp-content/uploads/2013/02/ExcelScreenshot.png"><img width="500" height="50" alt="Original Electronic File" src="http://www.meridiandiscovery.com/base/wp-content/themes/meridian/cache/images/ExcelScreenshot-500x50.png" /></a> </p>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 1 &#8211; Original Native File</p>
<p>The text electronically extracted from this file is as follows. As expected, formatting is not maintained but the text is 100% accurate.</p>
<p><font style="font-size:12px;"></p>
<code class="code">Test Data Cell A1	(Sin[t]Sqrt[Abs[Cos[t]]])/(Sin[t]+7/5)-2Sin[t]+2, {t, 0, 10}
Test Data Cell A2	Test Data Cell B2</code>
<p></font></p>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 2 &#8211; Text Electronically Extracted from Original Native File</p>
<p>Once the native file is converted to TIFF, OCRed and exported as a searchable PDF, the text embedded in the PDF is degraded. Take a look at the following example prepared using one of the most accurate OCR engines:</p>
<p><font style="font-size:12px;"></p>
<code class="code">Test Data Cell A1	(Sin[t]Sqrt[Abs[Cos[t]]])/(Sin[t]+7/5)-2Sin[t]+2, {t, 0, l0}
                 	“7e&amp;t “Data &amp;ell 'SZ</code>
<p></font></p>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 3 &#8211; Text Extracted from Searchable PDF Created Using OCR</p>
<p><br/>
<p>As Figure 3 shows, even though the plain text in cell A1 and the formula in cell B1 were correctly recognized, the lack of contrast in cell A2 and the complex font in cell B2 prevented the OCR engine from correctly recognizing those characters. Needless to say, if you are on the receiving end of such a production and rely on keyword searches, the lack of accuracy in the provided text can be quite frustrating. Please note that the issue described above is not caused by a deficiency of the PDF format, but by the unnecessary OCR process in the workflow.</p>
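<p>If you receive such a production and also have access to reliable text for a sample of documents, a rough comparison can show how much the OCR text has drifted. The Python sketch below uses the standard library's <code>difflib.SequenceMatcher</code>; the <code>similarity</code> helper and the sample strings are assumptions made for this example, and the ratio is a crude indicator rather than a formal OCR accuracy metric.</p>

```python
from difflib import SequenceMatcher


def similarity(reference, candidate):
    """Return a 0.0 to 1.0 similarity ratio between two strings."""
    return SequenceMatcher(None, reference, candidate).ratio()


extracted = "Test Data Cell A2 Test Data Cell B2"
ocr_text = "7e&t Data &ell SZ"  # simplified stand-in for degraded OCR output

score_same = similarity(extracted, extracted)
score_ocr = similarity(extracted, ocr_text)

# A keyword that hits in the extracted text is missed in the OCR text.
hit_in_extracted = "Cell B2" in extracted
hit_in_ocr = "Cell B2" in ocr_text
```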
<h3>Conclusion</h3>
<p>During e-Discovery processing, text is typically extracted along with the coordinates of each character or word. This makes it possible to export searchable PDFs with embedded extracted text if desired. Instead of OCRing images, searchable PDFs should be created through the e-Discovery platform using the original, more accurate extracted text. OCR should be used on non-searchable documents such as image-only PDFs, scanned images etc. to complement the extracted text.</p>
<p>If some of the documents contain redactions, the redactions can be made in a platform that supports automatically mirroring the redactions in the extracted text. If this is not possible, OCR can be performed only on the redacted images.</p>
<p>While drafting discovery agreements, legal teams should consider the distinction between extracted text and OCR and request extracted text when available.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/articles/how-ocred-pdf-productions-degrade-electronic-evidence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>E-mail Conversation Index Analysis for Computer Forensics</title>
		<link>http://www.meridiandiscovery.com/how-to/e-mail-conversation-index-metadata-computer-forensics/</link>
		<comments>http://www.meridiandiscovery.com/how-to/e-mail-conversation-index-metadata-computer-forensics/#comments</comments>
		<pubDate>Tue, 08 Jan 2013 22:15:59 +0000</pubDate>
		<dc:creator>Arman Gungor</dc:creator>
				<category><![CDATA[How-to]]></category>
		<category><![CDATA[Computer Forensics]]></category>
		<category><![CDATA[e-Discovery]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/?p=2812</guid>
		<description><![CDATA[E-mail messages contain numerous metadata fields that are utilized by computer forensic examiners as well as legal teams. One key MAPI property that is frequently extracted by computer forensics and e-Discovery software, yet usually overlooked or underutilized, is PR_CONVERSATION_INDEX. This property indicates the relative position of a message within a conversation thread and is ...]]></description>
			<content:encoded><![CDATA[<p>E-mail messages contain numerous metadata fields that are utilized by computer forensic examiners as well as legal teams. One key MAPI property that is frequently extracted by computer forensics and e-Discovery software, yet usually overlooked or underutilized, is PR_CONVERSATION_INDEX.</p>
<p>This property indicates the relative position of a message within a conversation thread and is typically populated by the e-mail client for each outgoing message. Information extracted from the PR_CONVERSATION_INDEX property can help answer key questions such as:</p>
<ul class="list1 list_color_blue">
<li>Is the message in question a new message, or was it created by replying to or forwarding another message?</li>
<li>If the message is part of an e-mail thread, when was the thread started?</li>
<li>When were other messages in the e-mail thread created?</li>
</ul>
<h3>MSDN Documentation</h3>
<p>Microsoft has an article titled <a href="http://msdn.microsoft.com/en-us/library/office/cc765583.aspx" title="Tracking Conversations" target="_blank" rel="nofollow">Tracking Conversations</a> on how the PR_CONVERSATION_INDEX value is calculated. According to the article (quoted directly from their <a href="http://msdn.microsoft.com/en-us/library/office/cc765583.aspx" title="Tracking Conversations" target="_blank" rel="nofollow">website</a>):</p>
<div class="info">
<div class="message_box_content">
<ul class="list1 list_color_blue">
<li>ScCreateConversationIndex implements the index as a header block that is 22 bytes in length, followed by zero or more child blocks each 5 bytes in length.</li>
<li>The header block is composed of 22 bytes, divided into three parts:
<ul class="list1 list_color_blue">
<li>One reserved byte. Its value is 1.</li>
<li>Five bytes for the current system time converted to the FILETIME structure format.</li>
<li>Sixteen bytes holding a GUID, or globally unique identifier.</li>
</ul>
</li>
<li>Each child block is composed of 5 bytes, divided as follows:
<ul class="list1 list_color_blue">
<li>One bit containing a code representing the difference between the current time and the time stored in the header block. This bit will be 0 if the difference is less than .02 second and greater than two years and 1 if the difference is less than one second and greater than 56 years.</li>
<li>Thirty one bits containing the difference between the current time and the time in the header block expressed in FILETIME units. This part of the child block is produced using one of two strategies, depending on the value of the first bit. If this bit is zero, ScCreateConversationIndex discards the high 15 bits and the low 18 bits. If this bit is one, the function discards the high 10 bits and the low 23 bits.</li>
<li>Four bits containing a random number generated by calling the Win32 function GetTickCount.</li>
<li>Four bits containing a sequence count that is taken from part of the random number.</li>
</ul>
</li>
</ul>
</div>
<div class="clearboth"></div>
</div>
<p>However, our experience differs from Microsoft&#8217;s documentation in three areas:</p>
<ol>
<li>The first byte is not merely a fixed-value reserved byte, but is also a part of the FILETIME structure that follows it.</li>
<li>The 6-byte FILETIME value (including the reserved byte) is actually the 6 most significant bytes of the 8-byte FILETIME structure, and needs to be padded with 2 trailing zero bytes.</li>
<li>The time difference values found in the child blocks do not indicate the difference in time from the header date, but from the date of the previous child block.</li>
</ol>
<p>Further research on the discrepancies above revealed that Joachim Metz of Hoffmann Investigations noted the same issues as #1 and #3 above in his article titled &#8220;E-mail and appointment falsification analysis&#8221;.</p>
<h3>Example Calculation</h3>
<p>Let&#8217;s take a look at an example and try to make sense of the PR_CONVERSATION_INDEX value.</p>
<p><b>PR_CONVERSATION_INDEX</b><br/>01CDE90ABFE0D78F0E4280824120B2F1D0E3C07ED0070000CCBA300000114460 (32 bytes)</p>
<p>Based on the Microsoft documentation, the PR_CONVERSATION_INDEX is divided as follows:</p>
<table style="border: solid 1px #666666;">
<tr style="border-bottom: solid 1px #666666;font-weight:bold;">
<td style="background-color:#D1B5FF; padding:5px;">FILETIME Value</td>
<td style="background-color:orange; padding:5px;">GUID</td>
<td style="background-color:#89A7FF; padding:5px;">Child Block 1</td>
<td style="background-color:#B7FF93; padding:5px;">Child Block 2</td>
</tr>
<tr>
<td style="background-color:#D1B5FF; padding:5px;">01CDE90ABFE0</td>
<td style="background-color:orange; padding:5px;">D78F0E4280824120B2F1D0E3C07ED007</td>
<td style="background-color:#89A7FF; padding:5px;">0000CCBA30</td>
<td style="background-color:#B7FF93; padding:5px;">0000114460</td>
</tr>
</table>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 1 &#8211; Breakdown of Example PR_CONVERSATION_INDEX</p>
<p>After the zero padding, the header timestamp value becomes 01CDE90ABFE0<b>0000</b>. This value is expressed in FILETIME units, and therefore represents the number of 100-nanosecond units since the start of January 1, 1601. When 0x01CDE90ABFE00000 is converted to decimal, we find that the timestamp represents 130,016,196,641,685,504 100-nanosecond units since the start of January 1, 1601, which corresponds to January 2, 2013 17:01:04 (UTC). This matches the sent date (PR_CLIENT_SUBMIT_TIME) of the original e-mail in the e-mail thread.</p>
<p>The next 16 bytes represent the GUID d78f0e42-8082-4120-b2f1-d0e3c07ed007.</p>
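<p>The header steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code behind any particular tool. The function name <code>parse_header</code> is hypothetical, and the GUID is assumed to be read in straight byte order, as in the example.</p>

```python
from datetime import datetime, timedelta, timezone
import uuid

# FILETIME counts 100-nanosecond units from the start of January 1, 1601 (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def parse_header(index_hex):
    """Split the 22-byte header into a FILETIME value, a UTC timestamp and a GUID."""
    raw = bytes.fromhex(index_hex)
    # The first 6 bytes are the most significant bytes of an 8-byte FILETIME,
    # so two zero bytes are appended to restore the 64-bit value.
    filetime = int.from_bytes(raw[:6] + b"\x00\x00", "big")
    # 10 FILETIME units make one microsecond.
    timestamp = FILETIME_EPOCH + timedelta(microseconds=filetime // 10)
    # Assumption: the 16 GUID bytes are read in the order they appear.
    guid = str(uuid.UUID(bytes=raw[6:22]))
    return filetime, timestamp, guid
```

<p>Calling <code>parse_header</code> on the example value should reproduce the decimal FILETIME value, header date and GUID given above.</p>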
<p>The following 5 bytes, 0000CCBA30, contain the data for the first child block. When we convert 0x0000CCBA30 to binary, we find the following 40-bit value:</p>
<p>0000000000000000110011001011101000110000</p>
<p>According to Microsoft&#8217;s documentation, the bits are divided as follows:</p>
<table style="border: solid 1px #666666;">
<tr style="border-bottom: solid 1px #666666;font-weight:bold;">
<td style="background-color:#D1B5FF; padding:5px;">Code Bit (1&nbsp;bit)</td>
<td style="background-color:orange; padding:5px;">Time Difference (31 bits)</td>
<td style="background-color:#89A7FF; padding:5px;">Random Number (4 bits)</td>
<td style="background-color:#B7FF93; padding:5px;">Sequence Count (4 bits)</td>
</tr>
<tr>
<td style="background-color:#D1B5FF; padding:5px;">0</td>
<td style="background-color:orange; padding:5px;">0000000000000001100110010111010</td>
<td style="background-color:#89A7FF; padding:5px;">0011</td>
<td style="background-color:#B7FF93; padding:5px;">0000</td>
</tr>
</table>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 2 &#8211; Breakdown of Example Child Block</p>
<p>Since the first bit is 0, we assume that the high 15 bits and the low 18 bits were discarded when the time difference value was being calculated. Once the discarded bits are restored as zeros, the time difference becomes the following 64-bit binary string:<br />
<b>000000000000000</b>0000000000000001100110010111010<b>000000000000000000</b></p>
<p>When converted to decimal, this value represents 13,738,967,040 100-nanosecond units, which is approximately a 22 minute and 53.897 second time span. When added to the header date, we find that the date of Child Block 1 was January 2, 2013 17:23:58 (UTC). After converting the random number and sequence count bits to decimal, we find that the values are 3 and 0 respectively.</p>
<p>The same calculations on Child Block 2 indicate that the time difference for this message is 1 minute and 55.868 seconds, measured from the date of Child Block 1, and that the random number and sequence count values are 6 and 0 respectively.</p>
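<p>The child block arithmetic can be sketched the same way. The names <code>parse_child_block</code> and <code>child_dates</code> are hypothetical. The sketch follows our observation that each time difference is measured from the previous block rather than from the header, and it takes the 18-bit and 23-bit shifts from the Microsoft documentation quoted earlier.</p>

```python
from datetime import datetime, timedelta, timezone


def parse_child_block(block_hex):
    """Decode one 5-byte child block into (ticks, random_no, sequence_count)."""
    bits = format(int(block_hex, 16), "040b")  # 40-bit binary string
    code_bit = bits[0]
    delta = int(bits[1:32], 2)        # 31-bit time difference
    random_no = int(bits[32:36], 2)   # 4-bit random number
    sequence = int(bits[36:40], 2)    # 4-bit sequence count
    # Restore the discarded low bits: 18 if the code bit is 0, 23 if it is 1.
    low_bits = 18 if code_bit == "0" else 23
    ticks = delta << low_bits         # 100-nanosecond units
    return ticks, random_no, sequence


def child_dates(header_time, blocks_hex):
    """Add each time difference to the previous date, starting at the header date."""
    current = header_time
    dates = []
    for block in blocks_hex:
        ticks, _, _ = parse_child_block(block)
        current = current + timedelta(microseconds=ticks // 10)
        dates.append(current)
    return dates
```

<p>For the example above, the two child blocks would be passed as <code>["0000CCBA30", "0000114460"]</code> together with the header date.</p>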
<p>The final results are as follows:</p>
<table>
<tr>
<td><b>Header Date</b></td>
<td>01/02/2013 17:01:04 (UTC)</td>
</tr>
<tr>
<td><b>GUID</b></td>
<td>d78f0e42-8082-4120-b2f1-d0e3c07ed007</td>
</tr>
<tr>
<td colspan="2" style="padding-top:10px;padding-left:10px;"><b><u>Child Message 1</u></b></td>
</tr>
<tr>
<td style="padding-left:10px;"><b>Message Date</b></td>
<td style="padding-left:10px;">01/02/2013 17:23:58 (UTC)</td>
</tr>
<tr>
<td style="padding-left:10px;"><b>Random No</b></td>
<td style="padding-left:10px;">3</td>
</tr>
<tr>
<td style="padding-left:10px;"><b>Sequence Count</b></td>
<td style="padding-left:10px;">0</td>
</tr>
<tr>
<td colspan="2" style="padding-top:10px;padding-left:20px;"><b><u>Child Message 2</u></b></td>
</tr>
<tr>
<td style="padding-left:20px;"><b>Message Date</b></td>
<td style="padding-left:20px;">01/02/2013 17:25:53 (UTC)</td>
</tr>
<tr>
<td style="padding-left:20px;"><b>Random No</b></td>
<td style="padding-left:20px;">6</td>
</tr>
<tr>
<td style="padding-left:20px;"><b>Sequence Count</b></td>
<td style="padding-left:20px;">0</td>
</tr>
</table>
<h4>Download Conversation Index Parser</h4>
<p>We realize that performing these calculations on multiple messages can be quite tedious. We have created a free utility called Conversation Index Parser which extracts the information from the PR_CONVERSATION_INDEX value. If you would like to download a copy, please leave us your e-mail address below and we will get back to you with the download link.</p>
<p><a href="http://www.meridiandiscovery.com/base/wp-content/uploads/2013/01/ConversationIndexParser.png"><img width="220" height="150" alt="Conversation Index Parser v02" src="http://www.meridiandiscovery.com/base/wp-content/themes/meridian/cache/images/ConversationIndexParser-220x150.png" /></a></p>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 3 &#8211; Conversation Index Parser v02 Screenshot</p>
<p>[contact-form-7]</p>
<h3>Observations</h3>
<p>We have made the following observations while analyzing the PR_CONVERSATION_INDEX properties of numerous e-mails:</p>
<ul class="list1 list_color_blue">
<li>When creating a new message, Outlook populates the header date in PR_CONVERSATION_INDEX with the local timestamp when the message is sent (i.e. PR_CLIENT_SUBMIT_TIME).</li>
<li>When replying to or forwarding a message, Outlook updates the PR_CONVERSATION_INDEX property and sets the time difference in the child block based on when the new message is created, not when it is sent. For example, let&#8217;s assume that person A receives an e-mail from person B and hits the reply button in Outlook at 3:00:00 PM. She then takes 10 minutes to compose her answer and send the e-mail. The time difference value contained in PR_CONVERSATION_INDEX would reflect the difference in time between 3:00:00 PM (when person A created the new message) and the timestamp of the previous message.</li>
<li>Outlook calculates the time difference values based on the user&#8217;s local time. When multiple users with slightly different computer times participate in an e-mail conversation, the calculated time difference values in PR_CONVERSATION_INDEX would reflect this discrepancy. For example, let&#8217;s assume that person C creates and sends a message to person D at precisely 4:00:00 PM. Person D&#8217;s computer time is 5 minutes ahead, and shows 4:05:00 PM at this moment. Person D creates a response message at 4:20:00 PM according to his computer (4:15:00 PM according to Person C&#8217;s computer). This would cause Person D&#8217;s Outlook to calculate a 20 minute time difference value when, in reality, only 15 minutes passed between the two events.</li>
<li>The time difference values found in the child blocks are unsigned. While one would normally assume that the time differences in an e-mail conversation would be positive (i.e. a response would be created after the original message), it is possible that the original time difference might have been negative due to a large difference in local computer times or due to tampering. For example, let&#8217;s assume that person E sends person F a message at 5:00:00 PM, and person F&#8217;s computer time was set to 4:50:00 PM at this moment (10 minutes behind that of person E). If person F replies to this message at 4:54:00 PM according to his computer (5:04:00 PM according to person E&#8217;s computer), person F&#8217;s Outlook would record a time difference value of 6 minutes while the difference (according to the local computer times) was -6 minutes.</li>
</ul>
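<p>The clock skew examples in the last two observations come down to simple arithmetic. The sketch below is a simplified model of that arithmetic and not a description of Outlook&#8217;s internals. The function name <code>recorded_minutes</code> is hypothetical, and the calendar date used when calling it is arbitrary.</p>

```python
from datetime import datetime


def recorded_minutes(previous_message_time, reply_created_time):
    """Unsigned difference, in minutes, between two local clock readings."""
    delta = reply_created_time - previous_message_time
    # The child block holds no sign, so a negative difference
    # is indistinguishable from a positive one of the same size.
    return abs(delta.total_seconds()) / 60
```

<p>With person D&#8217;s readings (4:00 PM and 4:20 PM) the function returns 20 minutes even though 15 minutes passed. With person F&#8217;s readings (5:00 PM and 4:54 PM) it returns 6 minutes even though the signed difference was -6 minutes.</p>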
<h3>Conclusions</h3>
<p>Combined with additional evidence from the e-mail server or internal e-mail metadata, the information contained in the PR_CONVERSATION_INDEX MAPI property can be very helpful in the forensic analysis of e-mails. We believe that forensic examiners would benefit from understanding how and when this property is populated and which factors (such as the accuracy of the local computer time) affect the reliability of this information.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/how-to/e-mail-conversation-index-metadata-computer-forensics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why Windows Sorts Numeric File Names Differently</title>
		<link>http://www.meridiandiscovery.com/how-to/why-windows-sorts-numeric-file-names-differently/</link>
		<comments>http://www.meridiandiscovery.com/how-to/why-windows-sorts-numeric-file-names-differently/#comments</comments>
		<pubDate>Wed, 15 Aug 2012 21:40:35 +0000</pubDate>
		<dc:creator>Arman Gungor</dc:creator>
				<category><![CDATA[How-to]]></category>
		<category><![CDATA[Computer Forensics]]></category>
		<category><![CDATA[e-Discovery]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/?p=2681</guid>
		<description><![CDATA[File names are stored as strings in almost every operating system and database management system. While this works well in most cases, it causes files with names containing numerals to be sorted counterintuitively. For example, contents of a folder containing 7 files with numeric suffixes would ordinarily look as follows: Exhibit1.pdf Exhibit10.pdf Exhibit15.pdf Exhibit2.pdf ...]]></description>
			<content:encoded><![CDATA[<p>File names are stored as strings in almost every operating system and database management system. While this works well in most cases, it causes files with names containing numerals to be sorted counterintuitively. For example, contents of a folder containing 7 files with numeric suffixes would ordinarily look as follows:</p>
<center>
<div style="width: 100px; background: #000; padding:30px; font-size: 9pt; font-family: fixedsys, LucidaTerminal, monospace; color: #909090; text-align: left; overflow:auto; border: 5px solid #909090;">
Exhibit1.pdf<br />
Exhibit10.pdf<br />
Exhibit15.pdf<br />
Exhibit2.pdf<br />
Exhibit21.pdf<br />
Exhibit3.pdf<br />
Exhibit4.pdf
</div>
</center>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 1 &#8211; Files Sorted Alphabetically</p>
<p>In scenarios where the order of the files is crucial (e.g. in the legal industry), end users typically pad the file names with zeros so that they are ordered correctly when sorted alphabetically. For example:</p>
<center>
<div style="width: 100px; background: #000; padding:30px; font-size: 9pt; font-family: fixedsys, LucidaTerminal, monospace; color: #909090; text-align: left; overflow:auto; border: 5px solid #909090;">
Exhibit001.pdf<br />
Exhibit002.pdf<br />
Exhibit003.pdf<br />
Exhibit004.pdf<br />
Exhibit010.pdf<br />
Exhibit015.pdf<br />
Exhibit021.pdf
</div>
</center>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 2 &#8211; Files with Zero Padding</p>
<p>The Shell team at Microsoft at some point decided to improve things a bit and implemented a new way of comparing Unicode strings that contain numerals (see <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/bb759947%28v=vs.85%29.aspx" title="StrCmpLogicalW Function" target="_blank" rel="nofollow">StrCmpLogicalW</a>). The change took effect after Windows 2000, so operating systems such as Windows Server 2003, Windows XP, Windows Vista and Windows 7 sort numerals in folder and file names according to their numeric value. For example, our example folder would look as follows in Windows XP:</p>
<p><a href="http://www.meridiandiscovery.com/base/wp-content/uploads/2012/08/NaturalSort.png"><img width="370" height="139" alt="Windows Natural Numeric Sort" src="http://www.meridiandiscovery.com/base/wp-content/themes/meridian/cache/images/NaturalSort-370x139.png" /></a></p>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 3 &#8211; Windows XP Natural Numeric Sort</p>
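<p>The difference between the two orderings can be approximated in a short Python sketch. The <code>natural_key</code> helper is hypothetical and is only a rough stand-in for numeric-aware comparison. It does not reproduce the exact behavior of StrCmpLogicalW.</p>

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so digit runs compare numerically."""
    # re.split with a capturing group alternates text, digits, text, ...
    parts = re.split(r"(\d+)", name.lower())
    return [int(part) if part.isdigit() else part for part in parts]


files = ["Exhibit1.pdf", "Exhibit10.pdf", "Exhibit15.pdf", "Exhibit2.pdf",
         "Exhibit21.pdf", "Exhibit3.pdf", "Exhibit4.pdf"]

string_order = sorted(files)                    # plain string comparison
numeric_order = sorted(files, key=natural_key)  # numerals compared by value
```

<p>Here <code>string_order</code> keeps the order shown in Figure 1, while <code>numeric_order</code> places Exhibit2.pdf, Exhibit3.pdf and Exhibit4.pdf ahead of Exhibit10.pdf.</p>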
<h3>Associated Issues</h3>
<p>While this seems logical and may be helpful to most people, we believe that it brings new issues, especially in the legal industry. </p>
<h4>1. Compatibility with e-Discovery and Computer Forensics Software:</h4>
<p>Imagine a lawyer organizing exhibits to be processed to TIFF, endorsed and produced. Looking at the files in Windows Explorer, he would naturally assume that the files would be processed in the order in which he sees them on his computer. However, computer forensics and e-Discovery tools generally do not implement Microsoft&#8217;s sort algorithm, and treat the file and folder names as strings while sorting. Consequently, files would be processed and numbered in a different order than what the attorney had anticipated. Had Windows sorted the files without any special handling, the attorney or litigation support team would have noticed the incorrect sort order and compensated for it by correctly padding the file names or applying a custom sort order.</p>
<h4>2. Consistency within the Operating System:</h4>
<p>Even though Windows Explorer takes advantage of the StrCmpLogicalW API and sorts files and folders with names containing numerals in a logical manner, other areas of the operating system (such as the command line interface) still use the traditional sort method, causing inconsistencies in the way files are displayed in different parts of the same operating system. Please see Figure 4 below for a comparison of how Windows Explorer and the Command Line Interface (CLI) display the same set of files.</p>
<p><a href="http://www.meridiandiscovery.com/base/wp-content/uploads/2012/08/Windows_v_Dos.png"><img width="407" height="85" alt="Files As Displayed by Windows vs CLI on The Same Computer" src="http://www.meridiandiscovery.com/base/wp-content/themes/meridian/cache/images/Windows_v_Dos-407x85.png" /></a></p>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 4 &#8211; Files As Displayed by Windows vs. CLI on The Same Computer</p>
<h4>3. Consistency among Operating Systems:</h4>
<p>Microsoft&#8217;s proprietary sort algorithm does not match how files are displayed in other operating systems such as Linux and Mac OS. Furthermore, Microsoft has changed the StrCmpLogicalW API in different versions of its operating systems such as Windows XP, Windows Vista and Windows 7. Consequently, the way files are displayed in Windows Explorer varies slightly among Microsoft&#8217;s own operating systems.</p>
<h3>Potential Solution</h3>
<p>Luckily, starting with Windows XP SP1, Microsoft has made available a registry value that can suppress the use of the StrCmpLogicalW API and revert Windows Explorer to treating file names as strings. The full path to the registry value is as follows:</p>
<code class="code">HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\Explorer\NoStrCmpLogical</code>
<p>The NoStrCmpLogical (DWORD) value should be set to 1 to prevent Windows XP and later versions from sorting numerals in folder and file names according to their numeric value. The <a href="http://support.microsoft.com/kb/319827" rel="nofollow" target="_blank">Microsoft Support Website</a> provides additional details about this issue. Please note that the above change requires a restart or log off to take effect. Remember to back up your registry before making any changes.</p>
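<p>For teams that need to apply the setting on several workstations, the text of a registry file can be generated rather than typed by hand. The sketch below only builds the text. The function name <code>build_reg_text</code> is hypothetical, and the output should be reviewed and tested on a non-production machine before it is imported.</p>

```python
KEY_PATH = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\Explorer"


def build_reg_text(value=1):
    """Return .reg file text that sets NoStrCmpLogical to the given DWORD value."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + KEY_PATH + "]",
        # DWORD values are written as eight hexadecimal digits.
        '"NoStrCmpLogical"=dword:%08x' % value,
        "",
    ]
    return "\r\n".join(lines)
```

<p>The returned text can be saved with a .reg extension. As with a manual edit, a restart or log off is still required for the change to take effect.</p>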
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/how-to/why-windows-sorts-numeric-file-names-differently/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Exception Handling and Reporting in e-Discovery</title>
		<link>http://www.meridiandiscovery.com/articles/exception-handling-and-reporting-in-e-discovery/</link>
		<comments>http://www.meridiandiscovery.com/articles/exception-handling-and-reporting-in-e-discovery/#comments</comments>
		<pubDate>Tue, 07 Aug 2012 18:59:04 +0000</pubDate>
		<dc:creator>Arman Gungor</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[e-Discovery]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/?p=2500</guid>
		<description><![CDATA[In a perfect world, e-Discovery would be as simple as pointing your software at the data source, kicking back and waiting for all documents to be ingested and processed with 100% accuracy. However, in the real world, e-Discovery involves dealing with thousands of file types, some of which are very complex and cannot be automatically ...]]></description>
			<content:encoded><![CDATA[<p>In a perfect world, e-Discovery would be as simple as pointing your software at the data source, kicking back and waiting for all documents to be ingested and processed with 100% accuracy. However, in the real world, e-Discovery involves dealing with thousands of file types, some of which are very complex and cannot be automatically handled by even the most sophisticated e-Discovery platforms. Consequently, being able to perform defensible e-Discovery requires the close supervision of experienced e-Discovery experts and a well-thought-out exception handling policy.</p>
<h3>What are Exceptions?</h3>
<p>e-Discovery exceptions are documents that cannot be correctly processed by the e-Discovery platform. For example, the e-Discovery software may be unable to open, extract text or metadata from, or image a document. Exceptions can be encountered during various stages of the e-Discovery process such as collection, processing, review and production. For the purposes of this article, we will focus on exceptions in the processing stage of e-Discovery.</p>
<h3>Types of Exceptions</h3>
<h4>A. Corrupt Files</h4>
<p>Corrupt files are files that have structural problems which prevent them from being opened or manipulated even in their native application. File corruption can be caused by numerous factors such as network transmission errors, errors in the medium where files were stored (e.g. bad sectors on a hard drive) or unexpected termination of the software that was being used to edit the file (e.g. a power failure).</p>
<p>When handling corrupt file exceptions, the first course of action usually is to investigate the possibility of obtaining a replacement. If a replacement copy is not available, depending on the nature of the case and how critical the corrupt file is, attempting to repair the file may be a viable option (e.g. recovering a corrupt mailbox). Alternatively, the corrupt file can be excluded from processing and delivered in native format. In any case, the exception should be logged and all steps taken should be thoroughly documented.</p>
<h4>B. Unprocessable Files</h4>
<p>Unprocessable files are files that do not support the common e-Discovery actions such as text and metadata extraction or conversion to image format. For example, system files such as executables and dynamic link libraries are typically unprocessable file types.</p>
<h4>C. Unsupported Files (Processable Files That Are Not Supported by The e-Discovery Platform)</h4>
<p>No e-Discovery software supports all processable file types. An e-Discovery project may contain unsupported files that could be searched or converted to an image format using external software. Examples include advanced CAD/CAM files such as Unigraphics, less popular archive container formats such as ARJ and ACE, some database formats such as SQLite, project management files such as Primavera files, and BlackBerry backup files.</p>
<p>Depending on the type and amount of unsupported files, these files may be manually processed or the e-Discovery service provider can develop a custom solution to handle the files in an automated manner. Some complex file types such as technical drawings and databases can be processed and produced in native format if the discovery agreement allows.</p>
<h4>D. Audio &#038; Video Files</h4>
<p>Audio and video files cannot be directly processed by most e-Discovery software, but depending on the case, they can be a very good source of electronic evidence. For example, voicemail messages sent as e-mail attachments by the VoIP phone system in a corporation may contain information not found anywhere else in the organization.</p>
<p>In some cases, the legal team may choose to perform audio discovery on the audio and video files to make them searchable. Alternatively, the files can be delivered in native format so that the reviewers can listen to the audio recordings or watch the videos during review.</p>
<h4>E. Encrypted Files</h4>
<p>Encrypted files are files that were protected by a password, via digital rights management (DRM) or other encryption schemes. Encrypted files can be single documents such as Microsoft Office files or PDFs, or encrypted containers such as TrueCrypt volumes.</p>
<p>The legal team should be informed of any encrypted files before further action is taken. Depending on the case, attorneys may choose to exclude the encrypted files from processing due to privacy concerns. On the other hand, in some cases, encrypted files may be of particular interest.</p>
<p>Attorneys may occasionally be able to obtain the passwords for the encrypted files in the data set. If passwords are not available, they can often be discovered by strategically reviewing neighbor documents or by attempting to crack the passwords.</p>
<h3>How Should Exceptions be Tracked, Handled and Reported?</h3>
<h4>A. e-Discovery Software</h4>
<p>Well-designed e-Discovery software should provide the following mechanisms for exception tracking, handling and reporting:</p>
<ul class="list2 list_color_blue">
<li>All encountered exceptions should be logged and displayed to the technician conspicuously. The log files should contain detailed information about the exceptions such as the full file path, file name, hash value and a description of the exception.</li>
<li>The e-Discovery technician should be able to manually process documents that cannot be automatically processed. It should also be possible to batch process large volumes of unsupported files using their native application (e.g. Shell Print).</li>
<li>The e-Discovery software should be able to complete the processing of exceptions after the fact. For example, if the password of an encrypted container is discovered after the processing job was completed, or a replacement for a corrupt mailbox was obtained, the e-Discovery software should be able to add the new extracted documents to the data set where they belong, without having to re-process the documents.</li>
<li>The e-Discovery software should have built-in mechanisms that facilitate the tracking and review of the exceptions during quality control. The technician should be able to add his comments for each file. This information can then be exported as part of the exceptions report.</li>
<li>The e-Discovery software should have built-in password cracking functionality and should allow the technician to input a list of known passwords to be used for opening and processing encrypted files.</li>
</ul>
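<p>As an illustration of the first item in the list above, an exception log entry could be modeled as a small record. This is a sketch under assumptions rather than the format of any specific platform. The <code>ExceptionRecord</code> class, the <code>log_exception</code> function and the choice of SHA-256 as the hash are all hypothetical.</p>

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class ExceptionRecord:
    full_path: str
    file_name: str
    sha256: str
    description: str
    technician_comment: str = ""  # filled in later during quality control


def log_exception(path, description):
    """Hash the file in chunks and return a log record for the exception."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read 1 MB at a time so large containers do not exhaust memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return ExceptionRecord(
        full_path=os.path.abspath(path),
        file_name=os.path.basename(path),
        sha256=digest.hexdigest(),
        description=description,
    )
```

<p>Keeping the technician&#8217;s comment on the same record makes it straightforward to export the log as part of the exceptions report later.</p>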
<h4>B. The e-Discovery Deliverable</h4>
<p>In our experience, most legal teams prefer to leave processing exceptions in the review database as placeholders with native hyperlinks. The placeholder usually contains a few fields of metadata for the exception such as the file name, file path and hash value. This helps them see the exceptions in context, and be able to look at them in more detail if they wish to do so. If the exceptions are left in the database as placeholders, those records should be identified (e.g. with a tag in the database or by populating a field) so that they can easily be excluded from a subsequent production if required.</p>
<p>On the other hand, if the legal team would like to have the exceptions excluded from the review database, the exceptions should be exported separately and delivered in native format (e.g. in a folder called &#8220;\EXCEPTIONS&#8221;) along with the exceptions report as part of the deliverable. This would ensure that someone looking at the deliverable in the future can easily locate and review the exceptions associated with that deliverable.</p>
<h4>C. The Exceptions Report</h4>
<p>The exceptions report is an important part of the documentation of the ESI lifecycle. At a minimum, an exceptions report should contain the following elements:</p>
<ul class="list2 list_color_blue">
<li>Relevant information about the case and project so that, if the report is reviewed separately from the deliverable, the viewer can still identify which project/batch the report pertains to.</li>
<li>Identifying information (e.g. BegDoc #, internal Doc ID) about each exception so that the entries on the exception report can be linked to the placeholders in the database or native files in the &#8220;\EXCEPTIONS&#8221; folder.</li>
<li>Technician&#8217;s comments and description of why the file is an exception.</li>
<li>Crucial file metadata such as file name, file path, file size, file extension, hash value etc.</li>
</ul>
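<p>The minimum elements listed above map naturally onto a delimited report. The sketch below assumes a CSV layout with the case and batch details at the top. The column names and the <code>build_exceptions_report</code> function are hypothetical and would need to be adapted to the fields agreed for the project.</p>

```python
import csv
import io

REPORT_COLUMNS = ["doc_id", "file_name", "file_path", "file_size",
                  "file_extension", "hash_value", "reason", "technician_comment"]


def build_exceptions_report(case_name, batch_name, exceptions):
    """Return CSV text: case details first, then one row per exception."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Case details let the report be identified apart from the deliverable.
    writer.writerow(["Case", case_name])
    writer.writerow(["Batch", batch_name])
    writer.writerow([])
    writer.writerow(REPORT_COLUMNS)
    for item in exceptions:
        # Missing fields are left blank rather than dropped.
        writer.writerow([item.get(column, "") for column in REPORT_COLUMNS])
    return buffer.getvalue()
```

<p>Each entry in <code>exceptions</code> is a plain dictionary keyed by the column names, so the identifying value in <code>doc_id</code> can be matched to the placeholder in the database or to the native file in the exceptions folder.</p>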
<h3>Conclusion</h3>
<p>Processing exceptions are unavoidable in e-Discovery. We believe that accurately identifying exceptions, thoroughly documenting every step taken, and having a well-thought-out exception handling plan are essential components of an effective and defensible e-Discovery process. Legal teams should discuss exception handling policies not only with their in-house litigation support team and outside service providers, but also with opposing counsel during &#8220;Meet and Confer&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/articles/exception-handling-and-reporting-in-e-discovery/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Image Normalization in e-Discovery Processing</title>
		<link>http://www.meridiandiscovery.com/articles/image-normalization-in-e-discovery-processing/</link>
		<comments>http://www.meridiandiscovery.com/articles/image-normalization-in-e-discovery-processing/#comments</comments>
		<pubDate>Tue, 26 Jun 2012 22:18:56 +0000</pubDate>
		<dc:creator>Arman Gungor</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[e-Discovery]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/?p=2389</guid>
		<description><![CDATA[Even though most e-Discovery projects involve image output (TIFF, JPG, PDF etc.), we find that the specifications of the output images are rarely discussed thoroughly. An important detail, which is usually omitted from e-Discovery processing specifications, is whether or not output images should be normalized. Image normalization (in the e-Discovery sense) is the process of ...]]></description>
			<content:encoded><![CDATA[<p>Even though most e-Discovery projects involve image output (TIFF, JPG, PDF etc.), we find that the specifications of the output images are rarely discussed thoroughly. An important detail, which is usually omitted from e-Discovery processing specifications, is whether or not output images should be normalized.</p>
<p>Image normalization (in the e-Discovery sense) is the process of transforming images to make them consistent in terms of dimensions, resolution, color depth and orientation. For example, larger images can be resized to 8.5&#8243;x11&#8243;, landscape pages can be rotated to portrait, images with different resolutions can be converted to 300 DPI etc.</p>
<p>In our experience, the most frequently used image specifications in e-Discovery processing today are 8.5&#8243;x11&#8243; portrait, monochrome images with 300 DPI resolution and CCITT Group 4 compression. Landscape images are typically rotated 90&deg; counter-clockwise to portrait. When color-for-color processing is performed, color images are usually exported as 300 DPI JPGs. See Figure 1 below for an example landscape page that was normalized and endorsed.</p>
<p><a href="http://www.meridiandiscovery.com/base/wp-content/uploads/2012/06/LandscapeEndorsedImage.png"><img width="220" height="150" alt="Normalized and Endorsed Landscape Image" src="http://www.meridiandiscovery.com/base/wp-content/themes/meridian/cache/images/LandscapeEndorsedImage-220x150.png" /></a></p>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 1 &#8211; Normalized and Endorsed Landscape Image</p>
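<p>The geometry behind these specifications can be sketched without any imaging library. The <code>normalization_plan</code> function below is hypothetical. It assumes that oversized pages are scaled down to fit the target page while smaller pages are left at their original size, which is one possible policy rather than a standard.</p>

```python
TARGET_WIDTH_IN = 8.5
TARGET_HEIGHT_IN = 11.0
TARGET_DPI = 300


def normalization_plan(width_px, height_px, dpi):
    """Return (rotation_ccw_degrees, scale_factor, target_size_px) for one page."""
    width_in = width_px / dpi
    height_in = height_px / dpi
    rotation = 0
    if width_in > height_in:
        # Landscape page: rotate 90 degrees counter-clockwise to portrait.
        rotation = 90
        width_in, height_in = height_in, width_in
    # Shrink to fit the target page, keeping the aspect ratio; never enlarge.
    scale = min(1.0, TARGET_WIDTH_IN / width_in, TARGET_HEIGHT_IN / height_in)
    target_px = (round(TARGET_WIDTH_IN * TARGET_DPI),
                 round(TARGET_HEIGHT_IN * TARGET_DPI))
    return rotation, scale, target_px
```

<p>A letter-size landscape page scanned at 300 DPI (3300 by 2550 pixels) would only be rotated, while an 11&#8243;x17&#8243; page would also be scaled down to fit the 2550 by 3300 pixel canvas.</p>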
<h3>Advantages and Disadvantages of Image Normalization</h3>
<p>Whether or not image normalization is required should be decided on a case-by-case basis. Some of the advantages and disadvantages of normalization are as follows:</p>
<h4>Advantages:</h4>
<ul class="list1 list_color_blue">
<li>If images are normalized before endorsement, the size and location of the endorsements would be consistent among different pages in the data set. See Figure 2 for an example document processed using different page sizes and endorsed without normalization.</li>
<p><a href="http://www.meridiandiscovery.com/base/wp-content/uploads/2012/06/EndorsementSizeDifference.png"><img width="220" height="150" alt="Endorsement Size Difference without Normalization" src="http://www.meridiandiscovery.com/base/wp-content/themes/meridian/cache/images/EndorsementSizeDifference-220x150.png" /></a></p>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 2 &#8211; Endorsement Size Difference without Normalization</p>
<li>Endorsing normalized images ensures that each page has the same amount of space for the Bates endorsement and confidentiality legend. Having images of various sizes can result in unexpected overlaps in endorsements on the smaller pages. See Figure 3 below for an example.</li>
<p><a href="http://www.meridiandiscovery.com/base/wp-content/uploads/2012/06/OverlappingEndorsements.png"><img width="220" height="150" alt="Overlapping Endorsements" src="http://www.meridiandiscovery.com/base/wp-content/themes/meridian/cache/images/OverlappingEndorsements-220x150.png" /></a></p>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 3 &#8211; Overlapping Endorsements</p>
<li>If images are printed, using normalized images prevents printing problems caused by changes in page size and orientation. For example, some printing applications do not automatically rotate landscape pages; instead, they scale them down to fit on a portrait page. See Figure 4 below for an example landscape page printed on a portrait canvas.</li>
<p><a href="http://www.meridiandiscovery.com/base/wp-content/uploads/2012/06/LandscapePage.png"><img width="220" height="150" alt="Landscape Page Printed on Portrait Canvas" src="http://www.meridiandiscovery.com/base/wp-content/themes/meridian/cache/images/LandscapePage-220x150.png" /></a> </p>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 4 &#8211; Landscape Page Printed on Portrait Canvas</p>
</ul>
<h4>Disadvantages:</h4>
<ul class="list1 list_color_blue">
<li>Image normalization without proper quality control can result in legibility problems. For example, a 36&#8243;x24&#8243;, 600 DPI technical drawing would most likely become illegible if normalized to 8.5&#8243;x11&#8243; 300 DPI.</li>
<li>In some review platforms, landscape pages rotated to portrait may require the reviewers to either rotate the page or turn their heads sideways. This can be prevented by exporting a rotation flag along with the image reference file so that the review platform (if it supports this) automatically rotates the pages to the correct orientation for review.</li>
<li>Using poorly designed normalization software can result in degradation of overall image quality.</li>
<li>Image normalization can be a time-consuming process and can add a significant amount of time to the e-Discovery export process in large cases.</li>
</ul>
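<p>To put numbers on the technical drawing example above, the following minimal Python sketch works out the pixel dimensions before and after normalization. It assumes the drawing is fitted to an 11&#8243;x8.5&#8243; area with its aspect ratio preserved; the function and variable names are ours and purely illustrative.</p>

```python
def fit_scale(src_w_in, src_h_in, dst_w_in, dst_h_in):
    """Largest uniform scale that fits the source page inside the target area."""
    return min(dst_w_in / src_w_in, dst_h_in / src_h_in)


# The drawing from the example: 36 x 24 inches at 600 DPI.
src_w_in, src_h_in, src_dpi = 36.0, 24.0, 600

# Target: letter-size area at 300 DPI. The drawing is landscape, so it is
# fitted to 11 x 8.5 inches before any rotation to portrait (assumption).
dst_w_in, dst_h_in, dst_dpi = 11.0, 8.5, 300

scale = fit_scale(src_w_in, src_h_in, dst_w_in, dst_h_in)

# Pixel dimensions before and after normalization.
src_px = (round(src_w_in * src_dpi), round(src_h_in * src_dpi))
out_px = (round(src_w_in * scale * dst_dpi), round(src_h_in * scale * dst_dpi))

# How many source pixels collapse into one output pixel along each axis.
reduction = src_px[0] / out_px[0]

print("source pixels:", src_px)
print("output pixels:", out_px)
print("linear reduction factor:", round(reduction, 2))
```

<p>Under these assumptions, the 21,600 x 14,400 pixel source shrinks to 3,300 x 2,200 pixels, so each output pixel has to stand in for roughly 6.5 x 6.5 source pixels. Fine print and thin lines are likely to suffer first at that scale.</p>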
<h3>How Normalization Should Be Performed</h3>
<p>Image normalization should be performed with minimal impact to the original image quality. A few things to watch out for:</p>
<ul class="list1 list_color_blue">
<li>Images smaller than what the normalization specifications require should typically be placed on a white canvas instead of being enlarged. Enlarging small images usually causes pixelation. See Figure 5 below for an example small image resized two different ways.</li>
<p><a href="http://www.meridiandiscovery.com/base/wp-content/uploads/2012/06/ResizedSmallImage.jpg"><img width="220" height="150" alt="Resized Small Image" src="http://www.meridiandiscovery.com/base/wp-content/themes/meridian/cache/images/ResizedSmallImage-220x150.jpg" /></a> </p>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 5 &#8211; Resized Small Image</p>
<li>The aspect ratio of the images should be preserved during normalization. For example, a 14&#8243;x8.5&#8243; page can be resized to an 11&#8243;x6.68&#8243; image, placed on an 11&#8243;x8.5&#8243; canvas and then rotated, instead of being stretched to fill the canvas. Resizing without preserving the aspect ratio can result in stretched, distorted images.</li>
<li>If color images are being converted to B&#038;W, proper dithering should be employed (e.g. the Floyd–Steinberg algorithm) to ensure that the B&#038;W image is an acceptable representation of the original color image. See Figure 6 below for a color image converted to B&#038;W with and without proper dithering.</li>
<p><a href="http://www.meridiandiscovery.com/base/wp-content/uploads/2012/06/BW_Sample.jpg"><img width="220" height="150" alt="B&#038;W Conversion and Dithering" src="http://www.meridiandiscovery.com/base/wp-content/themes/meridian/cache/images/BW_Sample-220x150.jpg" /></a></p>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Figure 6 &#8211; B&#038;W Conversion and Dithering</p>
</ul>
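<p>The guidelines above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of any particular product: <code>fit_on_canvas</code> and <code>floyd_steinberg</code> are hypothetical helper names, centering the image on the canvas is our choice, and the dithering routine works on a plain list of 0-255 gray values with a fixed threshold of 128.</p>

```python
def fit_on_canvas(w_in, h_in, canvas_w_in=11.0, canvas_h_in=8.5):
    """Return (new_w, new_h, x_offset, y_offset) in inches.

    The image is scaled uniformly so the aspect ratio is preserved, and the
    scale is capped at 1.0 so small images are never enlarged. The rest of
    the canvas stays white. Centering is an assumption.
    """
    scale = min(canvas_w_in / w_in, canvas_h_in / h_in, 1.0)
    new_w, new_h = w_in * scale, h_in * scale
    return new_w, new_h, (canvas_w_in - new_w) / 2, (canvas_h_in - new_h) / 2


def floyd_steinberg(gray):
    """Dither rows of 0-255 gray values to pure black (0) and white (255)."""
    height, width = len(gray), len(gray[0])
    work = [[float(v) for v in row] for row in gray]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255.0 if old >= 128 else 0.0
            work[y][x] = new
            err = old - new
            # Spread the quantization error over the neighbors that have not
            # been processed yet, using the 7/16, 3/16, 5/16, 1/16 weights.
            if width > x + 1:
                work[y][x + 1] += err * 7 / 16
            if height > y + 1:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if width > x + 1:
                    work[y + 1][x + 1] += err * 1 / 16
    return [[int(v) for v in row] for row in work]


# Legal-size landscape page from the example: fits as roughly 11 x 6.68 inches.
print([round(v, 2) for v in fit_on_canvas(14.0, 8.5)])

# A small image keeps its size and is only positioned on the canvas.
print(fit_on_canvas(4.0, 3.0))

# A flat mid-gray patch becomes a mix of black and white pixels.
flat_gray = [[128] * 8 for _ in range(8)]
for row in floyd_steinberg(flat_gray):
    print("".join("#" if v == 0 else "." for v in row))
```

<p>A plain threshold would turn the mid-gray patch into a single solid tone. Error diffusion instead carries each pixel's rounding error into its neighbors, which is what lets a mix of black and white pixels approximate the original shade.</p>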
<h3>Conclusion</h3>
<p>We believe that complete output image specifications should be included with every e-Discovery processing project. Legal teams should be aware of the advantages and disadvantages of image normalization and should make informed decisions on a case-by-case basis. In some cases, requesting the review database without normalization and normalizing images later, before endorsement &#038; production or blowbacks, may be a good approach.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/articles/image-normalization-in-e-discovery-processing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Handling Deleted E-mail Messages during e-Discovery Processing</title>
		<link>http://www.meridiandiscovery.com/articles/handling-deleted-e-mail-messages-during-e-discovery-processing/</link>
		<comments>http://www.meridiandiscovery.com/articles/handling-deleted-e-mail-messages-during-e-discovery-processing/#comments</comments>
		<pubDate>Tue, 05 Jun 2012 22:02:35 +0000</pubDate>
		<dc:creator>Arman Gungor</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[e-Discovery]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/?p=2320</guid>
		<description><![CDATA[Most mailboxes contain both active and deleted e-mail messages. By &#8220;deleted e-mail messages&#8221;, I am referring to messages that were permanently deleted. For example, a message that was deleted using SHIFT+Delete in Outlook or a message that was deleted from the &#8220;Deleted Items&#8221; folder. In some e-mail platforms, deleted messages are not immediately purged and ...]]></description>
			<content:encoded><![CDATA[<p>Most mailboxes contain both active and deleted e-mail messages. By &#8220;deleted e-mail messages&#8221;, I am referring to messages that were permanently deleted: for example, a message that was deleted using SHIFT+Delete in Outlook or a message that was deleted from the &#8220;Deleted Items&#8221; folder. In some e-mail platforms, deleted messages are not immediately purged and can easily be recovered. For example, Microsoft Outlook does not purge deleted e-mail messages from a Personal Storage Table (PST) file until the PST is compacted.</p>
<p>When it comes to handling deleted e-mail messages, e-Discovery processing software products typically fall into two camps:</p>
<ul class="list1 list_color_blue">
<li>
<h4>Software Solutions That Only Process Active E-mails:</h4>
<p>Some e-Discovery processing software solutions only extract and process active e-mails. For example, products that use the Messaging Application Programming Interface (MAPI) to access Microsoft Outlook e-mails would typically only retrieve and process active e-mails and disregard permanently deleted (but not yet purged) messages. Please note that these software tools still process the contents of the &#8220;Deleted Items&#8221; folder since e-mails in this folder are not yet permanently deleted.</p>
</li>
<li>
<h4>Software Solutions That Process Both Active and Deleted E-mails:</h4>
<p>Some e-Discovery processing products process both active and permanently deleted (but not yet purged) e-mail messages by default. These tools usually employ their own processes to parse mailboxes.</p>
</li>
</ul>
<p>As you can imagine, it is crucial to have a good understanding of how the e-Discovery software or service provider that you use handles deleted e-mails. Depending on the case, it may or may not be desirable to process deleted items. Consider the following example scenarios:</p>
<ul class="list1 list_color_blue">
<li>Opposing counsel produces a PST containing responsive e-mails. As it turns out, they conducted a manual review of the mailbox within Outlook (bad idea), deleted e-mails that were privileged or non-responsive and produced the PST file without compacting it. In this scenario, whether or not deleted e-mails are included in e-Discovery processing would determine if the inadvertently produced privileged and non-responsive e-mails would make their way into your review database.
</li>
<li>Similar to the first example above, an attorney from your firm opened and reviewed a PST file within Outlook without consulting with the litigation support department (again, bad idea). He deleted the e-mails that were privileged and gave you the PST for processing and production. Unless the processed data set is reviewed one more time by the attorney before production, processing deleted e-mails in this scenario can result in inadvertent production of privileged documents.
</li>
<li>Mailboxes of multiple custodians were forensically collected from each custodian&#8217;s local computer. Some of the mailboxes contain deleted e-mails that are relevant to the case. In this scenario, you could miss critical information unless deleted e-mails are included during e-Discovery processing.</li>
</ul>
<h3>Conclusion</h3>
<ul class="list2 list_color_blue">
<li>It is critical to know how your e-Discovery software or service provider handles deleted e-mails. Lack of this information can result in missing critical documents or inadvertent production of privileged information.</li>
<li>If possible, use software that provides the flexibility to specify how you would like to handle deleted e-mails on a case-by-case basis. Otherwise, you may find yourself looking for a different solution depending on case requirements.</li>
<li>In-house litigation support teams as well as outside service providers should pay attention to mailboxes that produce much less content than their size implies. A big discrepancy usually points to a large number of deleted e-mails that were not purged.</li>
<li>When working with e-Discovery service providers, make sure to indicate your preference regarding how deleted e-mails should be handled.</li>
</ul>
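<p>The size check mentioned above can be automated in a rough way. The Python sketch below compares the size of each mailbox file with the total size of the content extracted from it and flags the ones that look sparse. The function name, the sample figures and the 25% cut-off are all hypothetical; a low ratio is only a hint that a closer look is warranted, not proof that deleted items are present.</p>

```python
def flag_sparse_mailboxes(mailboxes, min_ratio=0.25):
    """Return names of mailboxes whose extracted content is small for their size.

    mailboxes: iterable of (name, container_bytes, extracted_bytes) tuples.
    min_ratio: hypothetical cut-off; calibrate it against your own data.
    """
    flagged = []
    for name, container_bytes, extracted_bytes in mailboxes:
        if container_bytes == 0:
            continue  # nothing to compare against
        ratio = extracted_bytes / container_bytes
        if min_ratio > ratio:
            flagged.append(name)
    return flagged


# Made-up figures for illustration only.
sample = [
    ("custodian_a.pst", 2_000_000_000, 1_600_000_000),
    ("custodian_b.pst", 2_000_000_000, 100_000_000),
]
flagged = flag_sparse_mailboxes(sample)
print(flagged)
```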
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/articles/handling-deleted-e-mail-messages-during-e-discovery-processing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fields You Should Have in Your e-Discovery Database</title>
		<link>http://www.meridiandiscovery.com/articles/fields-you-should-have-in-your-e-discovery-database/</link>
		<comments>http://www.meridiandiscovery.com/articles/fields-you-should-have-in-your-e-discovery-database/#comments</comments>
		<pubDate>Thu, 24 May 2012 20:25:08 +0000</pubDate>
		<dc:creator>Arman Gungor</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[e-Discovery]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/?p=2214</guid>
		<description><![CDATA[Modern e-Discovery software can extract hundreds of metadata fields from documents. Extracted metadata is typically stored in a back-end database and a subset of it is exported and included in the e-Discovery production or review database. We often receive questions regarding which metadata fields should be included in an e-Discovery review database or which metadata ...]]></description>
			<content:encoded><![CDATA[<p>Modern e-Discovery software can extract hundreds of metadata fields from documents. Extracted metadata is typically stored in a back-end database and a subset of it is exported and included in the e-Discovery production or review database. We often receive questions regarding which metadata fields should be included in an e-Discovery review database or which metadata fields should be requested during an electronic document production. </p>
<p>The answers to these questions depend on the requirements of each case and should ultimately be determined by the legal team. That said, we have prepared the following field list as an example, with the hope that it will serve as a good starting point. Please note the following:</p>
<ul class="list1 list_color_blue">
<li>The list below is for electronic documents only, and does not include fields for scanned paper documents. If you keep scanned paper documents and electronic documents together, you should add paper-specific fields such as BOXNO, BOXNAME, FOLDERNAME etc. to the list.</li>
<li>Depending on the legal review platform, e-Discovery databases can also contain several additional administrative and/or user-populated fields.</li>
<li>We recommend that the time fields be populated using 24-hour format with second precision (e.g. 23:14:05).</li>
</ul>
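<p>As a small illustration of the time format recommended above, the Python sketch below splits a timestamp into separate date and time values, as the paired fields in the list (DATE_SENT and TIME_SENT, for example) would require. The helper name is hypothetical, and the YYYY/MM/DD date layout is only an assumption on our part; use whatever date format your review platform expects.</p>

```python
from datetime import datetime


def split_date_time(dt):
    """Return (date, time) strings; the time is 24-hour with second precision."""
    # %H is the zero-padded 24-hour clock, so 11 PM is written as 23.
    return dt.strftime("%Y/%m/%d"), dt.strftime("%H:%M:%S")


date_sent, time_sent = split_date_time(datetime(2012, 5, 24, 23, 14, 5))
print(date_sent, time_sent)
```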
<p>Feel free to contact us with any questions and share your thoughts and suggestions in the comments.</p>
<h4>Download:</h4>
<p>Download the Concordance-style header row below to create a Concordance database using our <a href="http://www.meridiandiscovery.com/software/concordance-cpl-to-create-database-dcb-from-load-file/" title="Concordance CPL to Create Database (DCB) from Load File" target="_blank">Create_DCB CPL</a>:</p>
<p><i><span class="icon_text icon_download blue"><a href="http://www.meridiandiscovery.com/downloads/Meridian+Discovery+Fields+Header+Row+v1">Meridian Discovery Fields Header Row v1 (Version 1)</a></span></i></p>
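<p>For readers who have not worked with one, a Concordance-style header row is a single delimited line of field names. The Python sketch below builds such a row for a few of the fields in the table that follows. It assumes the commonly used Concordance default delimiters (ASCII 20 between fields and ASCII 254 around each value); the downloadable file may be laid out differently, and the helper name is hypothetical.</p>

```python
FIELD_SEP = chr(20)   # assumed default field delimiter
QUOTE = chr(254)      # assumed default text qualifier (displays as a thorn)


def header_row(field_names):
    """Wrap each field name in the qualifier and join with the delimiter."""
    return FIELD_SEP.join(QUOTE + name + QUOTE for name in field_names)


row = header_row(["BEGDOC", "ENDDOC", "CUSTODIAN"])
print(repr(row))
```

<p>When saving such a row to disk, a single-byte encoding such as Latin-1 keeps each delimiter as one byte.</p>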
<table id="newspaper-b">
<thead>
<tr>
<th scope="col"><b>Field Name</b></th>
<th scope="col">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><b>INPUTID</b></td>
<td>Internal document identifier</td>
</tr>
<tr>
<td><b>DOCID</b></td>
<td>Document ID for native-only databases</td>
</tr>
<tr>
<td><b>PARENTID</b></td>
<td>Parent document ID for native-only databases</td>
</tr>
<tr>
<td><b>BEGDOC</b></td>
<td>Beginning Bates number</td>
</tr>
<tr>
<td><b>ENDDOC</b></td>
<td>Ending Bates number</td>
</tr>
<tr>
<td><b>BEGATTACH</b></td>
<td>Beginning Bates number of the attachment family</td>
</tr>
<tr>
<td><b>ENDATTACH</b></td>
<td>Ending Bates number of the attachment family</td>
</tr>
<tr>
<td><b>PRODBEG</b></td>
<td>Beginning production Bates number</td>
</tr>
<tr>
<td><b>PRODEND</b></td>
<td>Ending production Bates number</td>
</tr>
<tr>
<td><b>PRODDATE</b></td>
<td>Date document was produced</td>
</tr>
<tr>
<td><b>PRODVOL</b></td>
<td>Production volume ID</td>
</tr>
<tr>
<td><b>BATES_RANGE</b></td>
<td>Bates range of the document</td>
</tr>
<tr>
<td><b>ATTACH_RANGE</b></td>
<td>Attachment range of the document family</td>
</tr>
<tr>
<td><b>ATTACH_IDS</b></td>
<td>IDs of documents attached to this document</td>
</tr>
<tr>
<td><b>ATTACH_CNT</b></td>
<td>Number of attachments indicated in document metadata</td>
</tr>
<tr>
<td><b>EXT_ATT_CNT</b></td>
<td>Number of attachments actually extracted</td>
</tr>
<tr>
<td><b>PGCOUNT</b></td>
<td>Page count</td>
</tr>
<tr>
<td><b>VOLUME_ID</b></td>
<td>Deliverable volume ID</td>
</tr>
<tr>
<td><b>CUSTODIAN</b></td>
<td>Custodian name</td>
</tr>
<tr>
<td><b>SOURCE</b></td>
<td>Source of processed ESI</td>
</tr>
<tr>
<td><b>DOC_TYPE</b></td>
<td>Document type</td>
</tr>
<tr>
<td><b>EMAIL_HEADER</b></td>
<td>Header information for e-mail message</td>
</tr>
<tr>
<td><b>EMAIL_BODY</b></td>
<td>The message body of e-mail message</td>
</tr>
<tr>
<td><b>EMAIL_FROM</b></td>
<td>E-mail author</td>
</tr>
<tr>
<td><b>EMAIL_RECIP</b></td>
<td>E-mail recipients</td>
</tr>
<tr>
<td><b>EMAIL_CC</b></td>
<td>E-mail carbon copy recipients</td>
</tr>
<tr>
<td><b>EMAIL_BCC</b></td>
<td>E-mail blind carbon copy recipients</td>
</tr>
<tr>
<td><b>EMAIL_SUBJ</b></td>
<td>E-mail subject</td>
</tr>
<tr>
<td><b>MAIL_FOLDER</b></td>
<td>Mail folder inside a mailbox</td>
</tr>
<tr>
<td><b>MAIL_STORE</b></td>
<td>Name of the mail store</td>
</tr>
<tr>
<td><b>ATTACHMENTS</b></td>
<td>File names of attached documents</td>
</tr>
<tr>
<td><b>IMPORTANCE</b></td>
<td>Importance of e-mail message</td>
</tr>
<tr>
<td><b>MESSAGE_ID</b></td>
<td>Message ID for e-mail</td>
</tr>
<tr>
<td><b>CONV_INDEX</b></td>
<td>E-mail conversation index (PR_CONVERSATION_INDEX)</td>
</tr>
<tr>
<td><b>UNREAD</b></td>
<td>Whether or not the e-mail was read</td>
</tr>
<tr>
<td><b>READ_RECEIPT</b></td>
<td>Whether or not read receipt was requested</td>
</tr>
<tr>
<td><b>INT_MSG_ID</b></td>
<td>Internet Message ID (PR_INTERNET_MESSAGE_ID)</td>
</tr>
<tr>
<td><b>DATE_RECD</b></td>
<td>Date received</td>
</tr>
<tr>
<td><b>TIME_RECD</b></td>
<td>Time received</td>
</tr>
<tr>
<td><b>DATE_SENT</b></td>
<td>Date sent</td>
</tr>
<tr>
<td><b>TIME_SENT</b></td>
<td>Time sent</td>
</tr>
<tr>
<td><b>MASTER_DATE</b></td>
<td>Parent date pushed down to child documents</td>
</tr>
<tr>
<td><b>MASTER_TIME</b></td>
<td>Parent time pushed down to child documents</td>
</tr>
<tr>
<td><b>START_DATE</b></td>
<td>Appointment start date</td>
</tr>
<tr>
<td><b>START_TIME</b></td>
<td>Appointment start time</td>
</tr>
<tr>
<td><b>END_DATE</b></td>
<td>Appointment end date</td>
</tr>
<tr>
<td><b>END_TIME</b></td>
<td>Appointment end time</td>
</tr>
<tr>
<td><b>TIME_ZONE</b></td>
<td>Time zone used during processing</td>
</tr>
<tr>
<td><b>FILE_NAME</b></td>
<td>File name</td>
</tr>
<tr>
<td><b>FILE_PATH</b></td>
<td>File path</td>
</tr>
<tr>
<td><b>FILE_TYPE</b></td>
<td>File type</td>
</tr>
<tr>
<td><b>FILE_EXT</b></td>
<td>File extension</td>
</tr>
<tr>
<td><b>FILE_SIZE</b></td>
<td>File size</td>
</tr>
<tr>
<td><b>NATIVE_LINK</b></td>
<td>Hyperlink to native file</td>
</tr>
<tr>
<td><b>MD5_HASH</b></td>
<td>MD5 Hash Value</td>
</tr>
<tr>
<td><b>SHA_HASH</b></td>
<td>SHA Hash Value</td>
</tr>
<tr>
<td><b>AUTHOR</b></td>
<td>Document author</td>
</tr>
<tr>
<td><b>SUBJECT</b></td>
<td>Document subject</td>
</tr>
<tr>
<td><b>TITLE</b></td>
<td>Document title</td>
</tr>
<tr>
<td><b>CATEGORIES</b></td>
<td>Categories metadata field extracted from document</td>
</tr>
<tr>
<td><b>COMMENTS</b></td>
<td>Comments metadata field extracted from document</td>
</tr>
<tr>
<td><b>KEYWORDS</b></td>
<td>Keywords metadata field extracted from document</td>
</tr>
<tr>
<td><b>COMPANY</b></td>
<td>Company metadata field extracted from document</td>
</tr>
<tr>
<td><b>REVISION</b></td>
<td>Revision # of document</td>
</tr>
<tr>
<td><b>SENSITIVITY</b></td>
<td>Sensitivity metadata field extracted from document</td>
</tr>
<tr>
<td><b>MODIFIED_BY</b></td>
<td>Name of person who last modified the document</td>
</tr>
<tr>
<td><b>SEARCH_HITS</b></td>
<td>Search keywords that made the document responsive (semicolon-delimited)</td>
</tr>
<tr>
<td><b>DOC_DATE</b></td>
<td>Best available date for document (combination of several date fields)</td>
</tr>
<tr>
<td><b>DOC_TIME</b></td>
<td>Best available time for document (combination of several time fields)</td>
</tr>
<tr>
<td><b>DATE_CRTD</b></td>
<td>Document creation date (internal document metadata)</td>
</tr>
<tr>
<td><b>TIME_CRTD</b></td>
<td>Document creation time (internal document metadata)</td>
</tr>
<tr>
<td><b>DATE_LST_PRN</b></td>
<td>Date document was last printed (internal document metadata)</td>
</tr>
<tr>
<td><b>TIME_LST_PRN</b></td>
<td>Time document was last printed (internal document metadata)</td>
</tr>
<tr>
<td><b>DATE_LST_MOD</b></td>
<td>Date document was last modified (internal document metadata)</td>
</tr>
<tr>
<td><b>TIME_LST_MOD</b></td>
<td>Time document was last modified (internal document metadata)</td>
</tr>
<tr>
<td><b>DATE_LST_SVD</b></td>
<td>Date document was last saved (internal document metadata)</td>
</tr>
<tr>
<td><b>TIME_LST_SVD</b></td>
<td>Time document was last saved (internal document metadata)</td>
</tr>
<tr>
<td><b>FS_DATE_CRTD</b></td>
<td>Document creation date (file system metadata)</td>
</tr>
<tr>
<td><b>FS_TIME_CRTD</b></td>
<td>Document creation time (file system metadata)</td>
</tr>
<tr>
<td><b>FS_DATE_MOD</b></td>
<td>Date document was last modified (file system metadata)</td>
</tr>
<tr>
<td><b>FS_TIME_MOD</b></td>
<td>Time document was last modified (file system metadata)</td>
</tr>
<tr>
<td><b>FS_DATE_ACC</b></td>
<td>Date document was last accessed (file system metadata)</td>
</tr>
<tr>
<td><b>FS_TIME_ACC</b></td>
<td>Time document was last accessed (file system metadata)</td>
</tr>
<tr>
<td><b>ENCRYPTED</b></td>
<td>Populated if document was encrypted</td>
</tr>
<tr>
<td><b>EXCEPTION</b></td>
<td>Populated if document was a processing exception</td>
</tr>
<tr>
<td><b>OCRED</b></td>
<td>Populated if document was OCR&#8217;ed</td>
</tr>
<tr>
<td><b>TEXT01</b></td>
<td>Document text/OCR field #1</td>
</tr>
<tr>
<td><b>TEXT02</b></td>
<td>Document text/OCR field #2</td>
</tr>
<tr>
<td><b>TEXT03</b></td>
<td>Document text/OCR field #3</td>
</tr>
<tr>
<td><b>TEXT04</b></td>
<td>Document text/OCR field #4</td>
</tr>
<tr>
<td><b>REDACTED_TXT</b></td>
<td>Populated with redacted text for redacted documents</td>
</tr>
</tbody>
</table>
<p style="text-align:center; font-size:10px; margin-top:-10px;">Table 1 &#8211; Example e-Discovery Database Field List v1</p>
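<p>Two of the fields in Table 1 are derived rather than extracted: DOC_DATE combines several date fields, and MASTER_DATE pushes the parent's date down to its children. The Python sketch below shows one way this could work. The priority order of the date fields, the single-level families and the use of DOCID and PARENTID to link family members are all assumptions made for illustration; your processing software may apply different rules.</p>

```python
def best_date(record, priority=("DATE_SENT", "DATE_RECD", "DATE_LST_MOD", "FS_DATE_MOD")):
    """Return the first populated date field, or "" if none is populated.

    The priority order is an assumption, not a rule taken from the field list.
    """
    for field in priority:
        value = record.get(field, "")
        if value:
            return value
    return ""


def populate_dates(records):
    """Fill DOC_DATE per document and MASTER_DATE per family (one level deep)."""
    by_id = {r["DOCID"]: r for r in records}
    for r in records:
        r["DOC_DATE"] = best_date(r)
        # A document without a PARENTID is treated as its own parent.
        parent = by_id.get(r.get("PARENTID") or r["DOCID"], r)
        r["MASTER_DATE"] = best_date(parent)
    return records


# Made-up family: an e-mail and one attachment.
records = populate_dates([
    {"DOCID": "DOC0001", "PARENTID": "", "DATE_SENT": "2012/05/24"},
    {"DOCID": "DOC0002", "PARENTID": "DOC0001", "DATE_LST_MOD": "2012/03/01"},
])
for r in records:
    print(r["DOCID"], r["DOC_DATE"], r["MASTER_DATE"])
```

<p>With MASTER_DATE populated this way, sorting on it keeps an attachment next to its parent e-mail even when the attachment itself was last modified months earlier.</p>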
<div class="toggle">
<h4 class="toggle_title"> View Change Log</h4>
<div class="toggle_content">
<h4>v1 &#8211; 2012/05/24</h4>
<p>Initial release</p></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/articles/fields-you-should-have-in-your-e-discovery-database/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
