Windows Numerical Sort: Why Numeric File Names are Sorted Differently

By Arman Gungor, August 15, 2012 | How-to

File names are stored as strings in almost every operating system and database management system. While this works well in most cases, it causes files whose names contain numerals to be sorted counterintuitively. For example, a folder containing seven files with numeric suffixes would ordinarily be listed as follows:

Exhibit1.pdf
Exhibit10.pdf
Exhibit15.pdf
Exhibit2.pdf
Exhibit21.pdf
Exhibit3.pdf
Exhibit4.pdf

Figure 1 – Files Sorted Alphabetically
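The ordering in Figure 1 follows directly from character-by-character string comparison. As a minimal sketch (the variable names are only illustrative), sorting the same seven names as plain strings in Python reproduces it:

```python
# Plain string sorting compares names one character at a time, so
# "Exhibit10.pdf" sorts before "Exhibit2.pdf" because '1' comes before '2'.
names = [
    "Exhibit1.pdf",
    "Exhibit2.pdf",
    "Exhibit3.pdf",
    "Exhibit4.pdf",
    "Exhibit10.pdf",
    "Exhibit15.pdf",
    "Exhibit21.pdf",
]

alphabetical = sorted(names)

for name in alphabetical:
    print(name)
```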

In scenarios where the order of the files is crucial (e.g. in the legal industry), end users typically pad the numbers in the file names with leading zeros so that the files are ordered correctly when sorted alphabetically. For example:

Exhibit001.pdf
Exhibit002.pdf
Exhibit003.pdf
Exhibit004.pdf
Exhibit010.pdf
Exhibit015.pdf
Exhibit021.pdf

Figure 2 – Files with Zero Padding
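Padding by hand is error-prone for large sets, so it is often scripted. The sketch below shows one possible approach; the helper name pad_trailing_number and the default width of three digits are assumptions made for illustration. It pads only the run of digits at the end of the name, before the extension.

```python
import os
import re


def pad_trailing_number(file_name, width=3):
    """Zero-pad the digits at the end of the name, just before the extension."""
    stem, extension = os.path.splitext(file_name)
    # Only a run of digits at the very end of the stem is padded.
    padded_stem = re.sub(r"[0-9]+$", lambda m: m.group().zfill(width), stem)
    return padded_stem + extension


original = ["Exhibit1.pdf", "Exhibit10.pdf", "Exhibit2.pdf", "Exhibit21.pdf"]
padded = sorted(pad_trailing_number(name) for name in original)

for name in padded:
    print(name)
```

The function only computes the new names. Actually renaming files on disk (for example with os.rename) is left out deliberately, and should be tried on a copy of the data first.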

What is Windows Numerical Sort?

The Shell team at Microsoft later introduced a new way of comparing Unicode strings that treats the numerals they contain as numbers rather than as individual characters (see StrCmpLogicalW). The change took effect after Windows 2000, so operating systems such as Windows Server 2003, Windows XP, Windows Vista and Windows 7 sort numerals in folder and file names according to their numeric value. Our example folder would look as follows in Windows XP using Windows numerical sort:

Exhibit1.pdf
Exhibit2.pdf
Exhibit3.pdf
Exhibit4.pdf
Exhibit10.pdf
Exhibit15.pdf
Exhibit21.pdf

Figure 3 – Windows Numerical Sort in Windows XP
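The idea behind a numeric-aware comparison can be approximated in a few lines. The following is only a rough sketch and not Microsoft's implementation: the helper name numerical_sort_key is hypothetical, and the real StrCmpLogicalW has additional rules that are not reproduced here. It splits each name into runs of text and runs of digits, and compares the digit runs by their numeric value.

```python
import re


def numerical_sort_key(file_name):
    """Build a sort key that compares digit runs as numbers (approximation)."""
    key = []
    # re.split with a capturing group alternates text and digit chunks;
    # the chunks at odd positions are the digit runs.
    for index, chunk in enumerate(re.split(r"([0-9]+)", file_name)):
        if index % 2 == 1:
            key.append((0, int(chunk)))      # digit run, compared as an integer
        elif chunk:
            key.append((1, chunk.lower()))   # text run, compared ignoring case
    return key


names = ["Exhibit1.pdf", "Exhibit10.pdf", "Exhibit15.pdf", "Exhibit2.pdf",
         "Exhibit21.pdf", "Exhibit3.pdf", "Exhibit4.pdf"]
numerical = sorted(names, key=numerical_sort_key)

for name in numerical:
    print(name)
```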

Issues Associated with Windows Numerical Sort

While this seems logical and may be helpful to most people, we believe that it brings new issues, especially in the legal industry.

1. Compatibility with e-Discovery and Computer Forensics Software:

Imagine a lawyer organizing exhibits to be processed to TIFF, endorsed and produced. Looking at the files in Windows Explorer, he would naturally assume that they will be processed in the order in which he sees them on his computer. However, computer forensics and e-Discovery tools generally do not implement Microsoft’s sort algorithm; they treat file and folder names as plain strings when sorting. Consequently, the files would be processed and numbered in a different order than the attorney had anticipated. Had Windows sorted the files without any special handling, the attorney or litigation support team would have noticed the incorrect sort order and compensated for it by padding the file names correctly or applying a custom sort order.
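One way to catch this kind of mismatch before processing is to check whether the two orderings disagree for a given set of names. The sketch below is illustrative only: sort_orders_disagree and its helper are hypothetical names, and the numeric-aware key is a simplified approximation of what Windows Explorer does, not the actual StrCmpLogicalW logic.

```python
import re


def _numeric_aware_key(name):
    # Chunks at odd positions are digit runs and are compared as integers;
    # the remaining chunks are text and are compared ignoring case.
    chunks = re.split(r"([0-9]+)", name)
    return [(0, int(c)) if i % 2 else (1, c.lower()) for i, c in enumerate(chunks)]


def sort_orders_disagree(names):
    """Return True if plain string order differs from numeric-aware order."""
    return sorted(names) != sorted(names, key=_numeric_aware_key)


unpadded = ["Exhibit1.pdf", "Exhibit2.pdf", "Exhibit10.pdf"]
padded = ["Exhibit001.pdf", "Exhibit002.pdf", "Exhibit010.pdf"]

print(sort_orders_disagree(unpadded))  # the two orderings differ
print(sort_orders_disagree(padded))    # zero padding makes them agree
```

If the check reports a disagreement, the file names are candidates for zero padding or an explicit custom sort order before they are handed to a processing tool.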

2. Consistency within the Operating System:

Even though Windows Explorer takes advantage of the StrCmpLogicalW API and sorts files and folders with names containing numerals in a logical manner, other areas of the operating system (such as the command line interface) still use the traditional sort method. This causes inconsistencies in the way files are displayed in different parts of the same operating system. Figure 4 below compares how Windows Explorer and the Command Line Interface (CLI) display the same set of files.

Windows v. Dos

Figure 4 – Files As Displayed by Windows vs. CLI on The Same Computer

3. Consistency among Operating Systems:

Microsoft’s proprietary sort algorithm does not match how files are displayed in other operating systems such as Linux and Mac OS. Furthermore, Microsoft has changed the behavior of the StrCmpLogicalW API between versions of its operating systems such as Windows XP, Windows Vista and Windows 7. Consequently, the way files are displayed in Windows Explorer varies slightly among Microsoft’s own operating systems.

How to Disable Windows Numerical Sort

Luckily, starting with Windows XP SP1, Microsoft has made available a registry value that suppresses the use of the StrCmpLogicalW API, turning off Windows numerical sort and reverting Windows Explorer to treating file names as strings. The registry value is as follows:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\Explorer\NoStrCmpLogical

The NoStrCmpLogical (DWORD) value should be set to 1 to prevent Windows XP and later versions from using Windows numerical sort. The Microsoft Support Website provides additional details about this issue. Please note that the change requires a restart or log off to take effect. Remember to back up your registry before making any changes.
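For administrators who prefer to script the change, the following is a minimal sketch using Python's standard winreg module. The function name set_numerical_sort_disabled is hypothetical, the script must run on Windows with administrative rights because it writes under HKEY_LOCAL_MACHINE, and it has not been validated on every Windows version mentioned above.

```python
import sys

# Location and name of the policy value described above.
POLICY_PATH = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\Explorer"
VALUE_NAME = "NoStrCmpLogical"


def set_numerical_sort_disabled(disabled=True):
    """Write NoStrCmpLogical = 1 (disabled) or 0 (enabled) under HKLM."""
    if sys.platform != "win32":
        raise OSError("The Windows registry is only available on Windows")
    import winreg  # imported here because the module exists only on Windows

    # CreateKeyEx opens the key, creating it first if it does not exist.
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, POLICY_PATH, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD,
                          1 if disabled else 0)


# Example (run from an elevated prompt, after backing up the registry):
# set_numerical_sort_disabled(True)
```

As with a manual edit, the new setting takes effect only after a restart or log off.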

Arman Gungor

About Arman Gungor

Arman Gungor is a certified computer forensic examiner (CCE) and an adept e-Discovery expert with over 21 years of computer and technology experience. Arman has been appointed by courts as a neutral computer forensics expert as well as a neutral e-Discovery consultant. His electrical engineering background gives him a deep understanding of how computer systems are designed and how they work.