Modern e-Discovery software can extract hundreds of metadata fields from documents. Extracted metadata is typically stored in a back-end database, and a subset of it is exported and included in the e-Discovery production or review database. We often receive questions about which metadata fields should be included in an e-Discovery review database or requested during an electronic document production.
The answers to these questions depend on the requirements of each case and should ultimately be determined by the legal team. That said, we have prepared the following list of e-Discovery database fields as an example, with the hope that it will serve as a good starting point.
Please note the following:
- The list below is for electronic documents only and does not include fields for scanned paper documents. If you keep scanned paper documents and electronic documents together, you should add paper-specific fields such as BOXNO, BOXNAME and FOLDERNAME to the list.
- Depending on the legal review platform, e-Discovery databases can also contain several additional administrative and/or user-populated fields.
- We recommend that the time fields be populated using 24-hour format with second precision (e.g. 23:14:05).
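As an illustration of the time format recommended above, the following minimal Python sketch renders a timestamp as a 24-hour string with second precision. The function name `format_time_field` is hypothetical and not part of any e-Discovery product; it only shows the target format.

```python
from datetime import datetime


def format_time_field(ts: datetime) -> str:
    """Return the time portion of ts as HH:MM:SS on a 24-hour clock."""
    # %H is the zero-padded 24-hour hour, so 11 PM becomes "23".
    return ts.strftime("%H:%M:%S")


# Hypothetical value for a field such as TIME_SENT.
time_sent = format_time_field(datetime(2012, 5, 24, 23, 14, 5))
print(time_sent)  # 23:14:05
```

Storing times this way keeps them sortable as plain text and avoids AM/PM ambiguity when load files are exchanged between platforms.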
Field Name | Description |
---|---|
INPUTID | Internal document identifier |
DOCID | Document ID for native-only databases |
PARENTID | Parent document ID for native-only databases |
BEGDOC | Beginning Bates number |
ENDDOC | Ending Bates number |
BEGATTACH | Beginning Bates number of the attachment family |
ENDATTACH | Ending Bates number of the attachment family |
PRODBEG | Beginning production Bates number |
PRODEND | Ending production Bates number |
PRODDATE | Date document was produced |
PRODVOL | Production volume ID |
BATES_RANGE | Bates range of the document |
ATTACH_RANGE | Attachment range of the document family |
ATTACH_IDS | IDs of documents attached to this document |
ATTACH_CNT | Number of attachments indicated in document metadata |
EXT_ATT_CNT | Number of attachments actually extracted |
PGCOUNT | Page count |
VOLUME_ID | Deliverable volume ID |
CUSTODIAN | Custodian name |
SOURCE | Source of processed ESI |
DOC_TYPE | Document type |
EMAIL_HEADER | Header information for e-mail message |
EMAIL_BODY | The message body of e-mail message |
EMAIL_FROM | E-mail author |
EMAIL_RECIP | E-mail recipients |
EMAIL_CC | E-mail carbon copy recipients |
EMAIL_BCC | E-mail blind carbon copy recipients |
EMAIL_SUBJ | E-mail subject |
MAIL_FOLDER | Mail folder inside a mailbox |
MAIL_STORE | Name of the mail store |
ATTACHMENTS | File names of attached documents |
IMPORTANCE | Importance of e-mail message |
MESSAGE_ID | Message ID for e-mail |
CONV_INDEX | E-mail conversation index (PR_CONVERSATION_INDEX) |
UNREAD | Whether or not the e-mail was read |
READ_RECEIPT | Whether or not read receipt was requested |
INT_MSG_ID | Internet Message ID (PR_INTERNET_MESSAGE_ID) |
DATE_RECD | Date received |
TIME_RECD | Time received |
DATE_SENT | Date sent |
TIME_SENT | Time sent |
MASTER_DATE | Parent date pushed down to child documents |
MASTER_TIME | Parent time pushed down to child documents |
START_DATE | Appointment start date |
START_TIME | Appointment start time |
END_DATE | Appointment end date |
END_TIME | Appointment end time |
TIME_ZONE | Time zone used during processing |
FILE_NAME | File name |
FILE_PATH | File path |
FILE_TYPE | File type |
FILE_EXT | File extension |
FILE_SIZE | File size |
NATIVE_LINK | Hyperlink to native file |
MD5_HASH | MD5 hash value |
SHA_HASH | SHA hash value |
AUTHOR | Document author |
SUBJECT | Document subject |
TITLE | Document title |
CATEGORIES | Categories metadata field extracted from document |
COMMENTS | Comments metadata field extracted from document |
KEYWORDS | Keywords metadata field extracted from document |
COMPANY | Company metadata field extracted from document |
REVISION | Revision number of document |
SENSITIVITY | Sensitivity metadata field extracted from document |
MODIFIED_BY | Name of person who last modified the document |
SEARCH_HITS | Search keywords that made the document responsive (semicolon-delimited) |
DOC_DATE | Best available date for document (combination of several date fields) |
DOC_TIME | Best available time for document (combination of several time fields) |
DATE_CRTD | Document creation date (internal document metadata) |
TIME_CRTD | Document creation time (internal document metadata) |
DATE_LST_PRN | Date document was last printed (internal document metadata) |
TIME_LST_PRN | Time document was last printed (internal document metadata) |
DATE_LST_MOD | Date document was last modified (internal document metadata) |
TIME_LST_MOD | Time document was last modified (internal document metadata) |
DATE_LST_SVD | Date document was last saved (internal document metadata) |
TIME_LST_SVD | Time document was last saved (internal document metadata) |
FS_DATE_CRTD | Document creation date (file system metadata) |
FS_TIME_CRTD | Document creation time (file system metadata) |
FS_DATE_MOD | Date document was last modified (file system metadata) |
FS_TIME_MOD | Time document was last modified (file system metadata) |
FS_DATE_ACC | Date document was last accessed (file system metadata) |
FS_TIME_ACC | Time document was last accessed (file system metadata) |
ENCRYPTED | Populated if document was encrypted |
EXCEPTION | Populated if document was a processing exception |
OCRED | Populated if document was OCR’ed |
TEXT01 | Document text/OCR field #1 |
TEXT02 | Document text/OCR field #2 |
TEXT03 | Document text/OCR field #3 |
TEXT04 | Document text/OCR field #4 |
REDACTED_TXT | Populated with redacted text for redacted documents |
Table 1 – Example e-Discovery Database Field List v1
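To show how the derived date fields in the table relate to the extracted ones, here is a minimal Python sketch that fills DOC_DATE from the first populated date field and then pushes the top-level parent's date down to its family as MASTER_DATE. The fallback order in `DOC_DATE_PRIORITY`, the use of the parent's DOC_DATE as the master date, and both function names are assumptions made for illustration; the actual rules should be agreed with the legal team and may differ by processing tool.

```python
from typing import Optional

# Assumed fallback order for DOC_DATE; not prescribed by the field list.
DOC_DATE_PRIORITY = [
    "DATE_SENT", "DATE_RECD", "DATE_LST_MOD", "DATE_CRTD", "FS_DATE_MOD",
]


def best_available_date(record: dict) -> Optional[str]:
    """Return the first non-empty date field in the assumed priority order."""
    for field in DOC_DATE_PRIORITY:
        value = record.get(field)
        if value:
            return value
    return None


def populate_dates(records: list[dict]) -> None:
    """Set DOC_DATE on every record, then MASTER_DATE from the top-level parent."""
    by_id = {r["DOCID"]: r for r in records}
    for r in records:
        r["DOC_DATE"] = best_available_date(r)
    for r in records:
        top, seen = r, {r["DOCID"]}
        # Walk up PARENTID links to the top of the family, guarding against cycles.
        while top.get("PARENTID") in by_id and top["PARENTID"] not in seen:
            top = by_id[top["PARENTID"]]
            seen.add(top["DOCID"])
        r["MASTER_DATE"] = top["DOC_DATE"]


# Hypothetical records: an e-mail and its attachment.
records = [
    {"DOCID": "D001", "PARENTID": "", "DATE_SENT": "2012/05/24"},
    {"DOCID": "D002", "PARENTID": "D001", "DATE_LST_MOD": "2012/05/20"},
]
populate_dates(records)
for r in records:
    print(r["DOCID"], r["DOC_DATE"], r["MASTER_DATE"])
```

With MASTER_DATE populated this way, sorting or filtering by that field keeps an attachment together with its parent e-mail, while DOC_DATE still reflects each document's own best date.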
Change Log
v1.0 – 2012/05/24
Initial release