<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Meridian Discovery &#124; e-Discovery, Computer Forensics, Hosting</title>
	<atom:link href="http://www.meridiandiscovery.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.meridiandiscovery.com</link>
	<description>e-Discovery, Computer Forensics, Hosting</description>
	<lastBuildDate>Tue, 31 Jan 2012 23:14:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>Meridian Discovery Launches New Website</title>
		<link>http://www.meridiandiscovery.com/announcements/meridian-discovery-launches-new-website/</link>
		<comments>http://www.meridiandiscovery.com/announcements/meridian-discovery-launches-new-website/#comments</comments>
		<pubDate>Sat, 21 Jan 2012 00:54:09 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Announcements]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/?p=904</guid>
		<description><![CDATA[We decided to refresh our website a few months ago and the new version is finally live. We hope that you will find our new design not only aesthetically appealing, but also easy to navigate. Please feel free to contact us with any comments.]]></description>
			<content:encoded><![CDATA[<p>We decided to refresh our website a few months ago and the new version is finally live. We hope that you will find our new design not only aesthetically appealing, but also easy to navigate. Please feel free to contact us with any comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/announcements/meridian-discovery-launches-new-website/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Meridian Discovery Increased Its Internet Bandwidth</title>
		<link>http://www.meridiandiscovery.com/announcements/meridian-discovery-increased-its-internet-bandwidth/</link>
		<comments>http://www.meridiandiscovery.com/announcements/meridian-discovery-increased-its-internet-bandwidth/#comments</comments>
		<pubDate>Wed, 26 Jan 2011 19:41:15 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Internet]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/base/?p=138</guid>
		<description><![CDATA[We are excited to announce that we have added a new 100Mbps dedicated fiber optic line to our Los Angeles data center. We can now provide lightning fast FTP access and hosting. Do you need to get 10 GB of data processed really quickly? How about sending it to us via FTP within 15 minutes ...]]></description>
			<content:encoded><![CDATA[<p>We are excited to announce that we have added a new 100Mbps dedicated fiber optic line to our Los Angeles data center. We can now provide lightning fast FTP access and hosting. Do you need to get 10 GB of data processed really quickly? How about sending it to us via FTP within 15 minutes instead of a physical pickup?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/announcements/meridian-discovery-increased-its-internet-bandwidth/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to take a quick look at a hard drive without modifying its contents?</title>
		<link>http://www.meridiandiscovery.com/articles/how-to-take-a-quick-look-at-a-hard-drive-without-modifying-its-contents/</link>
		<comments>http://www.meridiandiscovery.com/articles/how-to-take-a-quick-look-at-a-hard-drive-without-modifying-its-contents/#comments</comments>
		<pubDate>Thu, 20 Jan 2011 19:36:25 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Computer Forensics]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/base/?p=135</guid>
		<description><![CDATA[Hard drives are used throughout the e-Discovery process both as a potential source of electronically stored information (ESI) and as a medium to transport data. Even a simple e-Discovery project may involve one or more hard drives changing custody a few times. Let’s assume that you received an external hard drive from a forensic examiner ...]]></description>
			<content:encoded><![CDATA[<p>Hard drives are used throughout the e-Discovery process both as a potential source of electronically stored information (ESI) and as a medium to transport data. Even a simple e-Discovery project may involve one or more hard drives changing custody a few times.</p>
<p>Let’s assume that you received an external hard drive from a forensic examiner in connection with ongoing litigation. Naturally, the first thing you would want to do would be to plug it in, take a look at its contents and gather information such as the amount and type of data contained on the hard drive before you plan your next steps. You are well aware that you must not modify the contents of the hard drive as this would cause spoliation of electronic evidence. Did you know that the mere act of plugging a hard drive into your computer to view its contents is usually enough to modify its contents?</p>
<p>Consider the following sample scenarios:</p>
<ul>
<li>Browsing a folder that contains images using Windows Explorer in thumbnail view could cause Microsoft Windows to create a thumbnail cache file (Thumbs.db) in that folder, unless the &#8220;Do not cache thumbnails&#8221; option was chosen.</li>
<li>Similarly, last access times of files that were previewed using Windows Explorer would be updated.</li>
<li>Opening a file to view its contents (e.g. opening an Excel spreadsheet in Microsoft Excel) would change the file system last access time metadata.</li>
<li>Opening certain file types could cause additional temporary files to be created in a folder. For example, opening a Microsoft Word document called &#8220;Sample.doc&#8221; would cause an additional hidden temporary file called &#8220;~$Sample.doc&#8221; to be created in the same folder. This additional file would normally be deleted when Word is closed; however, it could be left behind if Word terminates abnormally.<sup>[1]</sup></li>
<li>Opening certain file types (e.g. mounting a Personal Storage Table (PST) file in Microsoft Outlook) would change the binary contents, and consequently the hash values, of the files.</li>
<li>If the host computer was infected with computer viruses or malware, the attached hard drive could also get infected.</li>
<li>Some anti-spyware/anti-malware software could cause the last access times of files that they scan to be updated.</li>
</ul>
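<p>One way to confirm whether scenarios such as these have altered a file is to compare a cryptographic hash of its contents taken before and after the file was handled. The following Python sketch is illustrative only: the function names are hypothetical and SHA-256 is an assumed choice of algorithm.</p>

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 digest of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    # Open in binary read-only mode and read in chunks so that
    # large files do not have to fit in memory.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def contents_unchanged(path, baseline_hash):
    """Compare the current hash of a file with a previously recorded one."""
    return file_sha256(path) == baseline_hash
```

<p>A hash covers only the binary contents of a file, so changes to file system metadata such as last access times would not be detected this way. Reading a file in order to hash it can itself update its last access time if the drive is not write-protected.</p>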
<p>There are numerous other ways that drive contents and/or file metadata can be altered inadvertently. The simplest and yet most powerful defense strategy is to write-protect the hard drive so that it can be read from but cannot be written to. There are two main methods of write blocking a hard drive:</p>
<ul>
<li><strong>Hardware Write Blockers: </strong>A hardware write blocker (also referred to as a forensic bridge) is a device that sits between the host computer and the hard drive being examined. Most hardware write blockers support multiple interfaces and allow the end user to connect IDE and SATA internal hard drives or USB and FireWire external hard drives to a host system.</li>
</ul>
<p>The device allows the host computer to read from the target drive but blocks all write requests. Two popular forensic bridge manufacturers are <a target="_blank" href="http://www.tableau.com/" title="Tableau">Tableau</a> and <a target="_blank" href="http://www.wiebetech.com" title="WiebeTech">WiebeTech</a>. These devices are fairly inexpensive and easy to use, and are well worth the investment considering the stakes.</p>
<p>The NIST has developed a test plan for evaluating hardware write blockers. The test plan and various test reports can be found on <a href="http://www.cftt.nist.gov/hardware_write_block.htm">http://www.cftt.nist.gov/hardware_write_block.htm</a></p>
<ul>
<li><strong>Software Write Blockers:</strong> There are also various software applications that provide write blocking functionality. While software write blocking sounds more practical and affordable, it comes with associated risks. Most software write blockers are not 100% forensically sound and have limitations. For example, Windows XP Service Pack 2 and later allow USB storage devices to be write blocked using a <a href="http://www.armangungor.com/wp/2009/12/08/usb-write-protect/">registry hack</a>. While this simple method may work in most cases, it is effective only on USB devices that are connected after the change was made. In other words, a USB device that was connected before the registry hack will remain writeable until it is removed and reinserted.</li>
</ul>
<p>More advanced software write blockers that come with their own kernel mode device drivers are also available. The NIST provides evaluation criteria and results for such software on their website <a href="http://www.cftt.nist.gov/software_write_block.htm">http://www.cftt.nist.gov/software_write_block.htm</a>.</p>
<p><br clear="all" /><br />
<hr />
<p><span style="font-size: 8pt;"><sup>[1]</sup> Description of how Word creates temporary files &#8211; <a target="_blank" href="http://support.microsoft.com/kb/211632">http://support.microsoft.com/kb/211632</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/articles/how-to-take-a-quick-look-at-a-hard-drive-without-modifying-its-contents/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Meridian Discovery Releases New Near De-Duplication Software: Proksimiti</title>
		<link>http://www.meridiandiscovery.com/services/meridian-discovery-releases-new-near-de-duplication-software-proksimiti/</link>
		<comments>http://www.meridiandiscovery.com/services/meridian-discovery-releases-new-near-de-duplication-software-proksimiti/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 19:31:35 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[near de-duplication]]></category>
		<category><![CDATA[Proksimiti]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/base/?p=129</guid>
		<description><![CDATA[Meridian Discovery has announced the release of its new near-duplicate identification technology Proksimiti. Proksimiti is a scalable, distributed near duplicate identification system that can process large volumes of documents extremely quickly and accurately with the ability to visualize the results in an intuitive and appealing manner. Proksimiti provides significant time and cost savings during document ...]]></description>
			<content:encoded><![CDATA[<p>Meridian Discovery has announced the release of its new near-duplicate identification technology Proksimiti. Proksimiti is a scalable, distributed near-duplicate identification system that can process large volumes of documents extremely quickly and accurately with the ability to visualize the results in an intuitive and appealing manner. Proksimiti provides significant time and cost savings during document review.</p>
<p>Meridian Discovery offers Proksimiti as a service, as a self-contained desktop application, or as a scalable, high-throughput enterprise solution that can be deployed in-house with unlimited use at a fixed price. Proksimiti can also be integrated into other software solutions using the Proksimiti Software Development Kit (SDK).</p>
<p>“Proksimiti provides our clients with a cost-effective, defensible and easy-to-use system for accurately identifying near-duplicate documents regardless of text or paragraph formatting, document type or language,” said Arman Gungor, litigation support director at Meridian Discovery. “We have always encouraged our clients to utilize near-duplicate identification technology. It is exciting to be able to offer this technology at a much more affordable price.”</p>
<p>Reviewing near-duplicate documents in groups allows legal teams to focus only on the differences in each document within a document cluster. This eliminates the need to review redundant information over and over again and reduces review time dramatically. Assigning an entire cluster of near-duplicate documents to the same reviewer allows more consistent document review. Proksimiti also helps legal teams manage their case more efficiently and reduces the risk of overlooking critical information.</p>
<p>Discover Proksimiti at the Meridian Discovery website.</p>
<p><a href="/files/Meridian_Discovery-Near_Deduplication.pdf" title="Proksimiti Near de-Duplication Software"><img height="20" width="20" src="/base/wp-content/uploads/2011/09/pdf_icon_xsm.png" alt="pdf_icon_xsm" style="vertical-align: middle;" /> Download PDF version of our near de-duplication brochure</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/services/meridian-discovery-releases-new-near-de-duplication-software-proksimiti/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Meridian Discovery Complies with EDRM XML</title>
		<link>http://www.meridiandiscovery.com/services/meridian-discovery-complies-with-edrm-xml/</link>
		<comments>http://www.meridiandiscovery.com/services/meridian-discovery-complies-with-edrm-xml/#comments</comments>
		<pubDate>Fri, 05 Feb 2010 19:27:36 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[EDRM]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/base/?p=127</guid>
		<description><![CDATA[Meridian Discovery&#8217;s electronic discovery and hosting systems have been certified as EDRM XML import and export compliant by the EDRM XML project team. This means that we can officially import EDRM XML load files into our system and provide load files in EDRM XML format in our deliverables. The EDRM XML project&#8217;s mission is to ...]]></description>
			<content:encoded><![CDATA[<p>Meridian Discovery&#8217;s electronic discovery and hosting systems have been certified as <a target="_blank" href="http://edrm.net/activities/projects/xml" title="EDRM XML Project">EDRM XML</a> import and export compliant by the EDRM XML project team. This means that we can officially import EDRM XML load files into our system and provide load files in EDRM XML format in our deliverables.</p>
<p>The EDRM XML project&#8217;s mission is to have EDRM XML be the primary format for electronic data exchange between parties and systems, reducing the time and risk involved with data exchange.  You can visit the <a target="_blank" href="http://www.edrm.net/" title="EDRM Website">EDRM website</a> to get more information about the project.</p>
<p>As an e-Discovery service provider and software developer, we frequently move electronically stored information (ESI) between software platforms, projects and organizations. Proprietary file formats cost valuable time and resources to work with and increase errors and risk associated with electronic data transfers. By participating in the EDRM XML project, we are hoping to expand our integration options with different platforms as well as get involved in developing future versions of this data exchange format.</p>
<p>A list of EDRM XML compliant software and organizations can be found on the <a target="_blank" href="http://edrm.net/resources/standards/edrm-xml-schema/edrm-xml-compliance" title="EDRM XML Compliant Organizations">project website</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/services/meridian-discovery-complies-with-edrm-xml/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unicode and Character Encodings</title>
		<link>http://www.meridiandiscovery.com/articles/unicode-and-character-encodings/</link>
		<comments>http://www.meridiandiscovery.com/articles/unicode-and-character-encodings/#comments</comments>
		<pubDate>Wed, 14 Oct 2009 19:00:52 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[e-Discovery]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/base/?p=98</guid>
		<description><![CDATA[Introduction One area of litigation support that continues to grow rapidly is electronic discovery. Identifying, collecting, processing and searching in foreign languages have been major shortcomings in most commercial off-the-shelf e-discovery platforms and review tools until very recently. While several platforms have finally made the switch to support multiple languages, a number of tools in ...]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>One area of litigation support that continues to grow rapidly is electronic discovery. Identifying, collecting, processing and searching in foreign languages have been major shortcomings in most commercial off-the-shelf e-discovery platforms and review tools until very recently. While several platforms have finally made the switch to support multiple languages, a number of tools in the e-discovery technician’s arsenal are still either completely ignorant of character encoding issues or provide only limited support.</p>
<p>With such weak links in the e-discovery workflow, it is not uncommon to run into documents in an electronic production filled with dreaded question marks or boxes instead of foreign language symbols. Consequently, a deep understanding of Unicode and character encodings remains critical in order to handle electronic documents accurately.</p>
<h3>ASCII</h3>
<p>Back in the day when computers were first invented, the only characters used were the unaccented English letters. They were coded using the American Standard Code for Information Interchange (ASCII), which represents each printable character using a code between 32 and 126. ASCII is a character encoding scheme based on the ordering of the English alphabet. It includes definitions for 33 non-printing control characters (codes 0 through 31 and 127) that are used to control devices such as printers, as well as 94 printable characters and the space character. For example, code 7 makes your computer beep and code 10 represents the line feed function. ASCII was the most commonly used character encoding on the Internet until 2008. It has since been surpassed by UTF-8. A simple way of reproducing a character from its ASCII code in Microsoft Windows is to open a text editor, hold down the ALT key, type the code on the numeric keypad and then release the ALT key. Certain control characters such as a line feed (ALT+010) or backspace (ALT+008) can also be produced using this method.</p>
<pre id="line75"><center>
    0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
0  NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI
1  DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
2   SP  !   "   #   $   %   &amp;   '   (   )   *   +   ,   -   .   /
3   0   1   2   3   4   5   6   7   8   9   :   ;   &lt;   =   &gt;   ?
4   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
5   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
6   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
7   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~ DEL
</center></pre>
<p align="center"><span style="font-size: 8pt;"><strong>Figure 1</strong> Hexadecimal ASCII Table</span></p>
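<p>The relationship between characters and their ASCII codes can be explored with a few lines of Python. This is a minimal sketch; the helper names <code>ascii_code</code> and <code>is_printable_ascii</code> are hypothetical and were chosen for illustration.</p>

```python
def ascii_code(character):
    """Return the ASCII code of a single character, or raise ValueError."""
    code = ord(character)
    if code not in range(128):
        raise ValueError("not an ASCII character")
    return code


def is_printable_ascii(character):
    """True for the space character and the 94 other printable characters."""
    return ascii_code(character) in range(32, 127)


# Row 5, column 4 of the hexadecimal table holds the letter T.
print(format(ascii_code("T"), "02X"))   # prints 54
print(is_printable_ascii("\n"))         # line feed (code 10): prints False
```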
<h3>Extensions to ASCII, ISO-8859-1 and Windows-1252</h3>
<p>Since the number of symbols used in common natural languages exceeded the limited range of ASCII code, many extensions were proposed. As ASCII was a seven-bit code and most computers manipulated data in eight-bit bytes, many extensions used the additional 128 codes available by using all eight bits of each byte. While this helped include many languages otherwise not easily presentable in ASCII, it still was not sufficient to cover all spoken languages. Consequently, even these eight-bit extensions had to have local variants.</p>
<p>ISO/IEC 8859 was one of these extensions. The ISO/IEC 8859 standard consists of 15 numbered parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2 etc., each of which covers a different alphabet or group of languages. For example, Part 1 is Latin-1 Western European and Part 9 is Latin-5 Turkish. Windows-1252, also known as “ANSI Character Encoding”, was a similar 8-bit character encoding of the Latin alphabet used by default in Microsoft Windows. It is a superset of ISO 8859-1, but differs from it by using displayable characters rather than control characters in the 0x80 to 0x9F range.</p>
<p>One of the limitations of ANSI was the lack of ability to represent multiple languages on the same computer. For example, displaying Hebrew and Turkish on the same computer at the same time was not possible since different code pages with different interpretations of high numbers were required.</p>
<p>Another critical limitation was the number of characters that could be supported. While ANSI was able to encode up to 256 characters, some Asian languages contained thousands of symbols.</p>
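<p>The code page problem can be demonstrated by decoding the same byte value under different 8-bit encodings. The sketch below uses the codec names that ship with Python; the byte values 0xF0 and 0x80 were picked purely for illustration.</p>

```python
# The same byte value decoded under three 8-bit code pages:
# Western European, Hebrew and Turkish.
raw = bytes([0xF0])

for codec in ("iso8859_1", "iso8859_8", "iso8859_9"):
    character = raw.decode(codec)
    print(codec, "U+{:04X}".format(ord(character)))

# Windows-1252 assigns a printable character (the euro sign) to 0x80,
# where ISO-8859-1 has a control character.
euro = bytes([0x80]).decode("cp1252")
control = bytes([0x80]).decode("iso8859_1")
print("U+{:04X}".format(ord(euro)), "U+{:04X}".format(ord(control)))
```

<p>A single byte yields three different characters depending on the code page assumed, which is why text in such encodings cannot be interpreted without knowing which code page produced it.</p>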
<h3>Unicode</h3>
<p>Unicode is a computing standard that incorporates all reasonable writing systems in the world into a single character set. Unicode’s development is coordinated by a non-profit organization called the Unicode Consortium. Unicode is a standard; it is not a character encoding. It can be implemented by different character encodings such as UTF-8, UTF-16 or UTF-32. It is widely assumed that Unicode is simply a 16-bit (double byte) code with a maximum capacity of 65,536 characters. This is simply incorrect. The latest version of Unicode, as of this writing, consists of more than 107,000 characters covering 90 scripts.</p>
<p>Unicode represents each letter with a code point. Code points are usually written in the form of U+262E where the numeric part is hexadecimal. For each character, a fairly typical sample rendition as well as information on how to display it (such as line breaking, hyphenation and sorting) are included.</p>
<p>A simple way of typing Unicode characters on your computer is to use Microsoft Global Input Method Editor (IME). For example, to type せんせい in Japanese using English letters, you can activate the IME with its keyboard shortcut, switch the mode to Japanese and type “sennsei”. Another simple method for converting Unicode code points to characters in Microsoft Word is to type the code point and press ALT+X while the cursor is at the end of the code point. For example, U+262E that was mentioned above is translated to ☮ by Word.</p>
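<p>The same conversion between code point notation and characters can be sketched in Python. The function names below are hypothetical; <code>unicodedata</code> is part of the Python standard library.</p>

```python
import unicodedata


def code_point_to_char(notation):
    """Convert a string such as 'U+262E' to the character it names."""
    if not notation.upper().startswith("U+"):
        raise ValueError("expected the form U+XXXX")
    return chr(int(notation[2:], 16))


def char_to_code_point(character):
    """Convert a single character back to U+XXXX notation."""
    return "U+{:04X}".format(ord(character))


symbol = code_point_to_char("U+262E")
print(char_to_code_point(symbol), unicodedata.name(symbol))
```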
<p>Unicode code points for certain characters can be found using the Character Map utility in Ms Windows (charmap.exe). Selecting a Unicode font such as “Arial Unicode MS” , choosing “Unicode” as the character set and using the “Group by” drop down menu allows users to locate groups of symbols conveniently.</p>
<p>For example, the Pilcrow sign is listed under “Unicode Subrange\General Punctuation” as U+00B6. A comprehensive list can be obtained from the Unicode Consortium website at www.unicode.org. The full Unicode code space supports over a million code points. The majority of characters used in today’s modern languages are allocated within the first 65,536 code points, which is called the Basic Multilingual Plane (BMP). The first 128 code points within the BMP correspond to the Basic Latin ASCII characters.</p>
<p style="text-align: center;"><img src="/base/wp-content/uploads/2011/09/UnicodeAllocation_sm.png" alt="UnicodeAllocation_sm" width="559" height="349" /><br />
<span style="font-size: 8pt;"><strong>Figure 2</strong> Unicode Allocation</span></p>
<h3>Character Encodings</h3>
<p>Unicode Transformation Format (UTF) and Universal Character Set (UCS) encodings are the two mapping methods used in Unicode implementation. These encodings map the Unicode code points to bytes so that they can be stored in memory. Some of the common UTF encodings are UTF-1, UTF-7, UTF-8, UTF-EBCDIC, UTF-16 and UTF-32.</p>
<p>UTF-8 is a Unicode encoding that uses 8-bit bytes to store code points in memory (similarly, UTF-7 contains 7 bits in one code value). In UTF-8, every code point from 0-127 is stored in a single byte, making the encoding very compact for standard Latin characters. Code points 128 and above are stored using 2 to 4 bytes (the original design allowed sequences of up to 6 bytes, but RFC 3629 limits UTF-8 to 4). In other words, encoded English text looks the same in UTF-8 as it would in ASCII. For example, the string “TEXT” is represented as U+0054 U+0045 U+0058 U+0054 using Unicode code points. The UTF-8 encoded version of the same string would be 54 45 58 54, which are the ASCII representations of these characters (see Figure 1).</p>
<p>UTF-8 has quickly become the de facto encoding standard for interchange of Unicode text over the internet.</p>
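<p>The variable width of UTF-8 is easy to observe in Python. In this sketch the helper <code>utf8_hex</code> is a hypothetical name, and the sample characters were chosen to need one, two, three and four bytes respectively.</p>

```python
def utf8_hex(text):
    """Return the UTF-8 bytes of a string as space-separated hex pairs."""
    return " ".join("{:02X}".format(byte) for byte in text.encode("utf-8"))


print(utf8_hex("TEXT"))        # 54 45 58 54, identical to ASCII
print(utf8_hex("\u00b6"))      # pilcrow sign, 2 bytes: C2 B6
print(utf8_hex("\u262e"))      # peace symbol, 3 bytes: E2 98 AE
print(utf8_hex("\U0001F600"))  # a code point outside the BMP, 4 bytes
```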
<p>UTF-16 is another variable-width encoding that uses 16 bits in each code value. It can be considered a compromise as it uses 2 bytes most of the time and expands to 4 bytes per character as necessary in order to represent characters outside of the Basic Multilingual Plane (BMP). Unfortunately, even though Unicode is not an encoding, the term is often incorrectly used interchangeably with UTF-16 encoding. For example, the encoding classes in the .NET System.Text namespace, which derive from System.Text.Encoding, include the following:</p>
<p>System.Text.ASCIIEncoding<br />
System.Text.UnicodeEncoding<br />
System.Text.UTF32Encoding<br />
System.Text.UTF7Encoding<br />
System.Text.UTF8Encoding</p>
<p>What is implied by System.Text.UnicodeEncoding is actually what should have been System.Text.UTF16Encoding. Similarly, Microsoft Notepad allows users to specify the encoding while saving a text document. The options provided are ANSI, Unicode and UTF-8. What is meant by Unicode is actually UTF-16 encoding.</p>
<h3>Unicode Byte Order Mark (BOM) and Endianness</h3>
<p>In computer architecture, endianness is the byte or bit order used in representing data while storing it in memory or transferring it over a network. Endianness is generally dictated by hardware. For example, architectures such as x86, Z80 and VAX use the little-endian convention, which is least significant byte (LSB) first, while some others such as PowerPC or SPARC use the opposite order, big-endian (most significant byte first). This also applies to the way Unicode data is encoded. For example, the UTF-16 encoding of the string “TEXT” mentioned above would be 00 54 00 45 00 58 00 54 in big-endian format. However, it is possible to reverse the byte order within each 16-bit code value and represent the same Unicode data as 54 00 45 00 58 00 54 00 using UTF-16 little-endian.</p>
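<p>The two byte orders can be compared directly in Python, where the codecs <code>utf-16-be</code> and <code>utf-16-le</code> encode without writing a BOM. The helper name <code>utf16_hex</code> is hypothetical. In big-endian order the most significant byte of each 16-bit code value comes first.</p>

```python
def utf16_hex(text, byte_order):
    """Encode text as UTF-16 without a BOM; byte_order is 'be' or 'le'."""
    encoded = text.encode("utf-16-" + byte_order)
    return " ".join("{:02X}".format(byte) for byte in encoded)


print(utf16_hex("TEXT", "be"))  # 00 54 00 45 00 58 00 54
print(utf16_hex("TEXT", "le"))  # 54 00 45 00 58 00 54 00

# A character outside the BMP takes two 16-bit code values (4 bytes).
print(utf16_hex("\U0001F600", "be"))
```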
<p>Since encoded Unicode text can be ordered two different ways, a Byte Order Mark (BOM) is commonly placed at the beginning of encoded Unicode text whenever the byte order is not specified by other means. The order in which the bytes of the BOM appear (FE FF or FF FE for UTF-16) indicates whether big-endian or little-endian order is used. Since UTF-8 is byte oriented, it does not require using a BOM. However, an initial BOM might be useful in identifying the data stream as UTF-8. Typical BOM values for different encodings are as follows:</p>
<p><center></p>
<div style="width:400px" class="table_style">
<table>
<thead>
<tr bgcolor="#0094ff">
<th><strong>Bytes</strong></th>
<th><strong>Encoding Form</strong></th>
</tr>
</thead>
<tfoot>
<tr>
<td colspan="2"><em><strong>Figure 3</strong> Byte Order Marks for Some Unicode Encodings</em></td>
</tr>
</tfoot>
<tbody align="center">
<tr>
<td>00 00 FE FF</td>
<td>UTF-32, big-endian</td>
</tr>
<tr>
<td>FF FE 00 00</td>
<td>UTF-32, little-endian</td>
</tr>
<tr>
<td>FE FF</td>
<td>UTF-16, big-endian</td>
</tr>
<tr>
<td>FF FE</td>
<td>UTF-16, little-endian</td>
</tr>
<tr>
<td>EF BB BF</td>
<td>UTF-8</td>
</tr>
</tbody>
</table>
</div>
<p></center><br />
The UTF-8 BOM is frequently displayed as “ï»¿” in ISO-8859-1 by tools that are not prepared to handle UTF-8.</p>
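<p>A tool can use the values in Figure 3 to guess the encoding of a file from its first few bytes. The following Python sketch is one possible approach, not a complete detector; <code>detect_bom</code> and <code>KNOWN_BOMS</code> are hypothetical names, while the BOM constants come from the standard <code>codecs</code> module.</p>

```python
import codecs

# Longer marks are listed first because the UTF-32 little-endian BOM
# begins with the same two bytes as the UTF-16 little-endian BOM.
KNOWN_BOMS = (
    (codecs.BOM_UTF32_BE, "UTF-32, big-endian"),
    (codecs.BOM_UTF32_LE, "UTF-32, little-endian"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16, big-endian"),
    (codecs.BOM_UTF16_LE, "UTF-16, little-endian"),
)


def detect_bom(data):
    """Return the encoding form named by a leading BOM, or None."""
    for mark, name in KNOWN_BOMS:
        if data.startswith(mark):
            return name
    return None
```

<p>For example, <code>detect_bom(b"\xff\xfeT\x00")</code> reports UTF-16, little-endian. A result of <code>None</code> only means that no BOM is present; the data may still be Unicode text. UTF-16 little-endian text that begins with a BOM followed by a null character would be reported as UTF-32 by this sketch, an ambiguity inherent in the byte values themselves.</p>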
<p>Strings are meaningful when accompanied by their encoding information. When working with a string in memory, it is important to know how it is encoded in order to correctly interpret and display it. Most problems regarding documents that look like gibberish are caused by programmers or technicians who fail to preserve the encoding information a string uses. When transforming byte strings to Unicode, we are in effect decoding our data. If we fail to provide the correct character encoding to decode from, we will inevitably end up with garbled data.</p>
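<p>The effect of decoding with the wrong encoding can be reproduced in a few lines of Python. This is an illustrative sketch with an arbitrary sample word; the output is printed with <code>ascii()</code> so that the result is visible regardless of the console encoding.</p>

```python
original = "caf\u00e9"              # "cafe" with an acute accent on the e
stored = original.encode("utf-8")   # bytes as written to disk

# Decoding with the encoding that was actually used restores the text.
print(stored.decode("utf-8") == original)

# Decoding the same bytes as Windows-1252 silently produces garbage:
# the two UTF-8 bytes of the accented letter become two characters.
garbled = stored.decode("cp1252")
print(ascii(garbled))

# An ASCII-only tool replaces what it cannot interpret.
print(ascii(stored.decode("ascii", errors="replace")))

# If the wrong guess is known, the damage can sometimes be reversed.
print(garbled.encode("cp1252").decode("utf-8") == original)
```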
<h3>Conclusion</h3>
<p>Understanding the key concepts underlying Unicode and character encodings is a fundamental step for the litigation support specialist. Almost every law firm or vendor has to deal with data originating from foreign countries. If your processes are unable to collect, store and interpret Unicode data, there is a good chance that this will become a major problem very quickly. Spending some time on the Unicode Consortium Website to get familiar with the code charts, getting a copy of the printed Unicode Standard and attending one of the semiannual Unicode conferences are good starting points.<br />
<br/><br />
<a title="Unicode and Character Encodings" href="/files/Meridian_Discovery-Unicode_and_Character_Encodings.pdf"><img style="vertical-align: middle;" src="/base/wp-content/uploads/2011/09/pdf_icon_xsm.png" alt="pdf_icon_xsm" width="20" height="20" /> Download PDF version of this white paper</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/articles/unicode-and-character-encodings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Audio Discovery Service</title>
		<link>http://www.meridiandiscovery.com/services/audio-discovery-service/</link>
		<comments>http://www.meridiandiscovery.com/services/audio-discovery-service/#comments</comments>
		<pubDate>Sat, 25 Jul 2009 18:57:08 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Audio Discovery]]></category>

		<guid isPermaLink="false">http://www.meridiandiscovery.com/base/?p=96</guid>
		<description><![CDATA[Meridian Discovery is now offering audio discovery service as part of our e-Discovery workflow. Most electronic document collections contain audio and video files. These files are typically categorized as processing exceptions since ordinary e-Discovery software cannot extract text from them or convert them to images. Having reviewers locate, open and listen to each audio file ...]]></description>
			<content:encoded><![CDATA[<p>Meridian Discovery is now offering audio discovery service as part of our e-Discovery workflow. Most electronic document collections contain audio and video files. These files are typically categorized as processing exceptions since ordinary e-Discovery software cannot extract text from them or convert them to images. Having reviewers locate, open and listen to each audio file in full is usually not a viable option due to the amount of time and cost involved in such a process.</p>
<p>We have developed a system that allows us to include audio and video files in our e-Discovery process. Our e-Discovery system is now able to identify such files by their binary headers and transcribe them on the fly as they are processed. The resulting text is imported into our back-end database as with any other file type so that audio and video files can be included during a full-text search.</p>
<p>Our clients have experienced significant time and cost savings with our new audio discovery service. Imagine reviewing a corporate mailbox with thousands of voicemail attachments. Listening to each voicemail manually could turn into a nightmare. Instead, we provide a fully searchable database that contains text transcribed from the voicemail messages as well as optional placeholder images with more information (i.e. metadata) about each file. The reviewer sees each voicemail attachment as a record following its parent e-mail message, and can search the text and listen to the audio as needed.</p>
<p><a href="/files/Meridian_Discovery-Audio_Discovery.pdf" title="Audio Discovery Service"><img height="20" width="20" src="/base/wp-content/uploads/2011/09/pdf_icon_xsm.png" alt="pdf_icon_xsm" style="vertical-align: middle;" /> Download PDF version of our audio discovery brochure</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.meridiandiscovery.com/services/audio-discovery-service/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

