Microsoft Office documents typically contain a great deal of metadata, some of which can be instrumental in computer forensics. While e-Discovery and computer forensics software can extract and display most of this metadata, I found that one crucial piece of information is usually not extracted: Microsoft Word last 10 authors, also known as Word save history.
What is Word Last 10 Authors?
Certain versions of Microsoft Word such as Word 8.0 (Word 97) through Word 10.0 (Word 2002) store the names of the last 10 people who edited the document as well as the file locations. This information is not displayed to the end user through the Microsoft Word user interface, and according to the Microsoft Support website, this is an automatic feature that cannot be disabled (see WD97: How to Minimize Metadata in Microsoft Word Documents [KB 223790]). The following is an example of what may be found in the Word last 10 authors metadata (labels and numbers added for clarity, test data used for demonstrative purposes):
2 – Author: johnd Path: D:\DOCUME~1\mdd.LAB\LOCALS~1\Temp\AutoRecovery save of Sample_v2.asd
3 – Author: johnd Path: D:\Documents and Settings\mdd.LAB\Desktop\Sample_v2.doc
4 – Author: johnd Path: D:\Documents and Settings\mdd.LAB\Desktop\Sample_v2.doc
5 – Author: jdoe Path: C:\WINDOWS\DESKTOP\Sample_v2.doc
6 – Author: jdoe Path: C:\WINDOWS\DESKTOP\Sample_v3.doc
7 – Author: jdoe Path: C:\WINDOWS\DESKTOP\Sample_v3.doc
8 – Author: jdoe Path: C:\WINDOWS\DESKTOP\Sample_v3.doc
9 – Author: jdoe Path: C:\WINDOWS\DESKTOP\Sample_v3.doc
10 – Author: jwhite Path: C:\WINDOWS\DESKTOP\Sample_v3.doc
As you can imagine, sending out a document with such a revision log can sometimes be problematic (see Richard M. Smith’s posts on the Blair Document and Microsoft’s 1999 Annual Report—original links appear to be dead at this point). On the other hand, such information can be a gold mine for a computer forensics expert.
Extracting Word Last 10 Authors Metadata
Word documents containing Word last 10 authors metadata are Object Linking and Embedding (OLE) compound files, as specified by the Microsoft Compound File Binary File Format (CFB). Two Microsoft specifications, [MS-CFB] and [MS-DOC], describe the Compound File Binary File Format and the Word Binary File Format, respectively.
Briefly, extracting the Word last 10 authors metadata requires locating the File Information Block (FIB), which sits at the start of the WordDocument stream, and reading the fcSttbSavedBy, lcbSttbSavedBy and fWhichTblStm values. The fcSttbSavedBy and lcbSttbSavedBy values specify, respectively, the offset in the Table Stream of the SttbSavedBy structure, which contains the save history of the file, and the size of that structure in bytes. The fWhichTblStm bit indicates which Table Stream the FIB refers to: 0Table when the bit is 0, 1Table when it is 1. The appropriate stream is then read and the SttbSavedBy structure is extracted using the fcSttbSavedBy and lcbSttbSavedBy values.
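To make the FIB lookup concrete, here is a minimal sketch in Python. It is not the script mentioned in this post. The function names are hypothetical, and the byte offsets (0x0A for the flag word, 0x2D2 and 0x2D6 for fcSttbSavedBy and lcbSttbSavedBy) are my reading of the FIB layout for Word 97 and later in the Word Binary File Format specification, so verify them before relying on the output.

```python
import struct

# Assumed offsets within the WordDocument stream (FIB starts at offset 0).
WIDENT_WORD = 0xA5EC          # wIdent value identifying a Word binary file
FLAGS_OFFSET = 0x0A           # 16-bit flag word that holds fWhichTblStm
FWHICHTBLSTM_BIT = 9          # bit position of fWhichTblStm in the flag word
FC_STTB_SAVED_BY = 0x2D2      # 4-byte offset into the Table Stream
LCB_STTB_SAVED_BY = 0x2D6     # 4-byte size of SttbSavedBy in bytes


def read_fib_save_history_fields(word_stream: bytes):
    """Return (table_stream_name, fcSttbSavedBy, lcbSttbSavedBy)."""
    if len(word_stream) < LCB_STTB_SAVED_BY + 4:
        raise ValueError("WordDocument stream too short to hold these FIB fields")

    (w_ident,) = struct.unpack_from("<H", word_stream, 0)
    if w_ident != WIDENT_WORD:
        raise ValueError("unexpected wIdent; not a Word binary document")

    # fWhichTblStm selects between the 0Table and 1Table streams.
    (flags,) = struct.unpack_from("<H", word_stream, FLAGS_OFFSET)
    which = (flags >> FWHICHTBLSTM_BIT) & 1
    table_name = "1Table" if which else "0Table"

    # fc and lcb are adjacent little-endian 32-bit unsigned integers.
    fc, lcb = struct.unpack_from("<II", word_stream, FC_STTB_SAVED_BY)
    return table_name, fc, lcb


def slice_sttb_saved_by(table_stream: bytes, fc: int, lcb: int) -> bytes:
    """Cut the raw SttbSavedBy bytes out of the chosen Table Stream."""
    if fc + lcb > len(table_stream):
        raise ValueError("SttbSavedBy range falls outside the Table Stream")
    return table_stream[fc:fc + lcb]
```

With the olefile package, the two streams could be obtained with something like `ole = olefile.OleFileIO(path)` followed by `ole.openstream("WordDocument").read()` and `ole.openstream(table_name).read()`. An lcbSttbSavedBy of 0 would mean the document carries no save history.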
The SttbSavedBy structure is a string table (STTB structure) which contains string pairs indicating the name of the author who saved the document and the path and name of the saved file. Parsing the SttbSavedBy structure reveals the save history of the document, also known as Word last 10 authors metadata.
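As a rough illustration of the parsing step, the sketch below walks an SttbSavedBy blob and pairs up the strings. Again, this is not the script from this post. The function name is hypothetical, and the layout it assumes (a 0xFFFF marker, a 2-byte string count, a 2-byte extra-data size, then length-prefixed UTF-16LE strings alternating author and path) is my understanding of the STTB definition in the Word Binary File Format specification.

```python
import struct


def parse_sttb_saved_by(blob: bytes):
    """Return a list of (author, path) tuples from raw SttbSavedBy bytes."""
    # Header: fExtend (0xFFFF means 2-byte characters), cData, cbExtra.
    f_extend, c_data, cb_extra = struct.unpack_from("<HHH", blob, 0)
    if f_extend != 0xFFFF:
        raise ValueError("expected an extended (Unicode) STTB")

    pos = 6
    strings = []
    for _ in range(c_data):
        # Each entry: 2-byte character count, then that many UTF-16LE chars.
        (cch,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        raw = blob[pos:pos + cch * 2]
        if len(raw) != cch * 2:
            raise ValueError("string runs past the end of the structure")
        strings.append(raw.decode("utf-16-le"))
        # Skip any per-string extra data (assumed to be 0 for SttbSavedBy).
        pos += cch * 2 + cb_extra

    # Strings alternate author, path; an unpaired trailing string is dropped.
    return list(zip(strings[0::2], strings[1::2]))
```

Each tuple corresponds to one save of the document, so printing them in order reproduces a listing like the example shown earlier in this post.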
In order to do the parsing, I wrote a Python script which utilizes the olefile Python package to read the Table Stream. I also added an optional ‘-m’ switch which outputs olefile’s metadata dump. Feel free to give it a try and let me know your thoughts.
If you prefer Perl, check out Harlan Carvey’s related post on MetaData and eDiscovery.
Some of the file types we regularly deal with can be very complex, and may contain hidden metadata. It is important for computer forensics experts to understand the underlying structure of the electronic evidence they are working with so that they can validate the results of the tools they are using as well as go beyond what the tools can accomplish.