e-Discovery Database Fields You Should Have in Your Database

By Arman Gungor, May 24, 2012

Modern e-Discovery software can extract hundreds of metadata fields from documents. Extracted metadata is typically stored in a back-end database, and a subset of it is exported and included in the e-Discovery production or review database. We often receive questions about which metadata fields should be included in an e-Discovery review database and which should be requested during an electronic document production.

The answers to these questions depend on the requirements of each case and should ultimately be determined by the legal team. That said, we have prepared the following list of e-Discovery database fields as an example, with the hope that it will serve as a good starting point.

Please note the following:

  • The list below is for electronic documents only and does not include fields for scanned paper documents. If you keep scanned paper documents and electronic documents together, you should add paper-specific fields such as BOXNO, BOXNAME, and FOLDERNAME to the list.
  • Depending on the legal review platform, e-Discovery databases can also contain several additional administrative and/or user-populated fields.
  • We recommend that the time fields be populated in 24-hour format with one-second precision (e.g., 23:14:05).
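
As an illustration of the time format recommended above, the following is a minimal Python sketch that normalizes a few common time strings to 24-hour format with one-second precision. The function name `to_24h` and the list of accepted input formats are assumptions made for this example, and parsing of AM/PM markers assumes an English locale; a real processing tool would handle whatever formats its source data contains.

```python
from datetime import datetime

# Input formats this sketch accepts (an assumption; extend for real data).
_TIME_FORMATS = ("%H:%M:%S", "%H:%M", "%I:%M:%S %p", "%I:%M %p")


def to_24h(value):
    """Return `value` as HH:MM:SS in 24-hour format, or raise ValueError."""
    text = value.strip()
    for fmt in _TIME_FORMATS:
        try:
            # strptime rejects the string unless the whole value matches fmt.
            return datetime.strptime(text, fmt).strftime("%H:%M:%S")
        except ValueError:
            continue
    raise ValueError(f"unrecognized time value: {value!r}")


print(to_24h("11:14:05 PM"))  # 23:14:05
print(to_24h("9:05 am"))      # 09:05:00
```

Values without a seconds component are padded with `:00`, so every populated time field ends up with the same width and sorts correctly as text.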

| Field Name | Description |
| --- | --- |
| INPUTID | Internal document identifier |
| DOCID | Document ID for native-only databases |
| PARENTID | Parent document ID for native-only databases |
| BEGDOC | Beginning Bates number |
| ENDDOC | Ending Bates number |
| BEGATTACH | Beginning Bates number of the attachment family |
| ENDATTACH | Ending Bates number of the attachment family |
| PRODBEG | Beginning production Bates number |
| PRODEND | Ending production Bates number |
| PRODDATE | Date document was produced |
| PRODVOL | Production volume ID |
| BATES_RANGE | Bates range of the document |
| ATTACH_RANGE | Attachment range of the document family |
| ATTACH_IDS | IDs of documents attached to this document |
| ATTACH_CNT | Number of attachments indicated in document metadata |
| EXT_ATT_CNT | Number of attachments actually extracted |
| PGCOUNT | Page count |
| VOLUME_ID | Deliverable volume ID |
| CUSTODIAN | Custodian name |
| SOURCE | Source of processed ESI |
| DOC_TYPE | Document type |
| EMAIL_HEADER | Header information for e-mail message |
| EMAIL_BODY | The message body of e-mail message |
| EMAIL_FROM | E-mail author |
| EMAIL_RECIP | E-mail recipients |
| EMAIL_CC | E-mail carbon copy recipients |
| EMAIL_BCC | E-mail blind carbon copy recipients |
| EMAIL_SUBJ | E-mail subject |
| MAIL_FOLDER | Mail folder inside a mailbox |
| MAIL_STORE | Name of the mail store |
| ATTACHMENTS | File names of attached documents |
| IMPORTANCE | Importance of e-mail message |
| MESSAGE_ID | Message ID for e-mail |
| CONV_INDEX | E-mail conversation index (PR_CONVERSATION_INDEX) |
| UNREAD | Whether or not the e-mail was read |
| READ_RECEIPT | Whether or not read receipt was requested |
| INT_MSG_ID | Internet Message ID (PR_INTERNET_MESSAGE_ID) |
| DATE_RECD | Date received |
| TIME_RECD | Time received |
| DATE_SENT | Date sent |
| TIME_SENT | Time sent |
| MASTER_DATE | Parent date pushed down to child documents |
| MASTER_TIME | Parent time pushed down to child documents |
| START_DATE | Appointment start date |
| START_TIME | Appointment start time |
| END_DATE | Appointment end date |
| END_TIME | Appointment end time |
| TIME_ZONE | Time zone used during processing |
| FILE_NAME | File name |
| FILE_PATH | File path |
| FILE_TYPE | File type |
| FILE_EXT | File extension |
| FILE_SIZE | File size |
| NATIVE_LINK | Hyperlink to native file |
| MD5_HASH | MD5 Hash Value |
| SHA_HASH | SHA Hash Value |
| AUTHOR | Document author |
| SUBJECT | Document subject |
| TITLE | Document title |
| CATEGORIES | Categories metadata field extracted from document |
| COMMENTS | Comments metadata field extracted from document |
| KEYWORDS | Keywords metadata field extracted from document |
| COMPANY | Company metadata field extracted from document |
| REVISION | Revision # of document |
| SENSITIVITY | Sensitivity metadata field extracted from document |
| MODIFIED_BY | Name of person who last modified the document |
| SEARCH_HITS | Search keywords that made the document responsive (semi-colon delimited) |
| DOC_DATE | Best available date for document (combination of several date fields) |
| DOC_TIME | Best available time for document (combination of several time fields) |
| DATE_CRTD | Document creation date (internal document metadata) |
| TIME_CRTD | Document creation time (internal document metadata) |
| DATE_LST_PRN | Date document was last printed (internal document metadata) |
| TIME_LST_PRN | Time document was last printed (internal document metadata) |
| DATE_LST_MOD | Date document was last modified (internal document metadata) |
| TIME_LST_MOD | Time document was last modified (internal document metadata) |
| DATE_LST_SVD | Date document was last saved (internal document metadata) |
| TIME_LST_SVD | Time document was last saved (internal document metadata) |
| FS_DATE_CRTD | Document creation date (file system metadata) |
| FS_TIME_CRTD | Document creation time (file system metadata) |
| FS_DATE_MOD | Date document was last modified (file system metadata) |
| FS_TIME_MOD | Time document was last modified (file system metadata) |
| FS_DATE_ACC | Date document was last accessed (file system metadata) |
| FS_TIME_ACC | Time document was last accessed (file system metadata) |
| ENCRYPTED | Populated if document was encrypted |
| EXCEPTION | Populated if document was a processing exception |
| OCRED | Populated if document was OCR’ed |
| TEXT01 | Document text/OCR field #1 |
| TEXT02 | Document text/OCR field #2 |
| TEXT03 | Document text/OCR field #3 |
| TEXT04 | Document text/OCR field #4 |
| REDACTED_TXT | Populated with redacted text for redacted documents |

Table 1 – Example e-Discovery Database Field List v1
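
To make a few of the file-level fields more concrete, here is a rough Python sketch of how values for FILE_NAME, FILE_PATH, FILE_EXT, FILE_SIZE, MD5_HASH, SHA_HASH, FS_DATE_MOD, FS_TIME_MOD, and TIME_ZONE might be derived for a single file. It is not how any particular e-Discovery tool works. The function name `file_fields`, the use of SHA-1 for SHA_HASH, the yyyy/mm/dd date format, and the choice of UTC as the processing time zone are all assumptions made for this example.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def file_fields(path):
    """Return a dict of file-level fields for the file at `path`."""
    md5 = hashlib.md5()
    sha = hashlib.sha1()  # assumption: SHA_HASH holds a SHA-1 digest
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large native files are not loaded whole.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            md5.update(chunk)
            sha.update(chunk)

    info = os.stat(path)
    # Assumption: UTC is the time zone used during processing.
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    name = os.path.basename(path)
    return {
        "FILE_NAME": name,
        "FILE_PATH": os.path.abspath(path),
        "FILE_EXT": os.path.splitext(name)[1].lstrip(".").lower(),
        "FILE_SIZE": info.st_size,  # size in bytes
        "MD5_HASH": md5.hexdigest().upper(),
        "SHA_HASH": sha.hexdigest().upper(),
        "FS_DATE_MOD": modified.strftime("%Y/%m/%d"),  # assumed date format
        "FS_TIME_MOD": modified.strftime("%H:%M:%S"),  # 24-hour, with seconds
        "TIME_ZONE": "UTC",
    }


# Demonstrate on a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(suffix=".TXT", delete=False) as tmp:
    tmp.write(b"hello")
fields = file_fields(tmp.name)
os.remove(tmp.name)

for key, value in fields.items():
    print(f"{key}: {value}")
```

Recording the processing time zone alongside the date and time values matters because the same file system timestamp renders as different dates and times in different zones.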

Change Log

v1.0 – 2012/05/24

Initial release

Arman Gungor

About Arman Gungor

Arman Gungor is a certified computer forensic examiner (CCE) and an adept e-Discovery expert with over 21 years of computer and technology experience. Arman has been appointed by courts as a neutral computer forensics expert as well as a neutral e-Discovery consultant. His electrical engineering background gives him a deep understanding of how computer systems are designed and how they work.