e-Discovery Database Fields You Should Have in Your Database

Modern e-Discovery software can extract hundreds of metadata fields from documents. Extracted metadata is typically stored in a back-end database, and a subset of it is exported to the e-Discovery production or review database. We often receive questions about which metadata fields to include in an e-Discovery review database and which to request during an electronic document production.

The answers to these questions depend on the requirements of each case and should ultimately be determined by the legal team. That said, we have prepared the following list of e-Discovery database fields as an example, with the hope that it will serve as a good starting point.

Please note the following:

The list below covers electronic documents only and does not include fields for scanned paper documents. If you keep scanned paper documents and electronic documents together, add paper-specific fields such as BOXNO, BOXNAME, and FOLDERNAME to the list.
Depending on the legal review platform, e-Discovery databases can also contain several additional administrative and/or user-populated fields.
We recommend populating time fields in 24-hour format with second precision (e.g. 23:14:05).
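As an illustration of the time-format recommendation, here is a minimal Python sketch that normalizes time values to 24-hour format with second precision before they are loaded into fields such as TIME_SENT or TIME_RECD. The function name `normalize_time` and the list of accepted input formats are assumptions made for this example; they do not come from any particular processing tool, and AM/PM parsing assumes an English locale.

```python
from datetime import datetime

# Input formats this sketch accepts. The list is an assumption; extend it
# to match whatever your processing software actually emits.
_INPUT_FORMATS = ("%H:%M:%S", "%H:%M", "%I:%M:%S %p", "%I:%M %p")


def normalize_time(value: str) -> str:
    """Return the time as HH:MM:SS in 24-hour format, or raise ValueError."""
    text = value.strip()
    for fmt in _INPUT_FORMATS:
        try:
            # strptime rejects the string unless the whole value matches fmt.
            return datetime.strptime(text, fmt).strftime("%H:%M:%S")
        except ValueError:
            continue
    raise ValueError(f"unrecognized time value: {value!r}")


if __name__ == "__main__":
    for raw in ("11:14:05 PM", "23:14", "9:05 am"):
        print(raw, "->", normalize_time(raw))
```

Raising an error on unrecognized values, rather than passing them through, keeps inconsistently formatted times from reaching the review database unnoticed.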

Field Name	Description
INPUTID	Internal document identifier
DOCID	Document ID for native-only databases
PARENTID	Parent document ID for native-only databases
BEGDOC	Beginning Bates number
ENDDOC	Ending Bates number
BEGATTACH	Beginning Bates number of the attachment family
ENDATTACH	Ending Bates number of the attachment family
PRODBEG	Beginning production Bates number
PRODEND	Ending production Bates number
PRODDATE	Date document was produced
PRODVOL	Production volume ID
BATES_RANGE	Bates range of the document
ATTACH_RANGE	Attachment range of the document family
ATTACH_IDS	IDs of documents attached to this document
ATTACH_CNT	Number of attachments indicated in document metadata
EXT_ATT_CNT	Number of attachments actually extracted
PGCOUNT	Page count
VOLUME_ID	Deliverable volume ID
CUSTODIAN	Custodian name
SOURCE	Source of processed ESI
DOC_TYPE	Document type
EMAIL_HEADER	Header information for e-mail message
EMAIL_BODY	The message body of e-mail message
EMAIL_FROM	E-mail author
EMAIL_RECIP	E-mail recipients
EMAIL_CC	E-mail carbon copy recipients
EMAIL_BCC	E-mail blind carbon copy recipients
EMAIL_SUBJ	E-mail subject
MAIL_FOLDER	Mail folder inside a mailbox
MAIL_STORE	Name of the mail store
ATTACHMENTS	File names of attached documents
IMPORTANCE	Importance of e-mail message
MESSAGE_ID	Message ID for e-mail
CONV_INDEX	E-mail conversation index (PR_CONVERSATION_INDEX)
UNREAD	Whether or not the e-mail was read
READ_RECEIPT	Whether or not read receipt was requested
INT_MSG_ID	Internet Message ID (PR_INTERNET_MESSAGE_ID)
DATE_RECD	Date received
TIME_RECD	Time received
DATE_SENT	Date sent
TIME_SENT	Time sent
MASTER_DATE	Parent date pushed down to child documents
MASTER_TIME	Parent time pushed down to child documents
START_DATE	Appointment start date
START_TIME	Appointment start time
END_DATE	Appointment end date
END_TIME	Appointment end time
TIME_ZONE	Time zone used during processing
FILE_NAME	File name
FILE_PATH	File path
FILE_TYPE	File type
FILE_EXT	File extension
FILE_SIZE	File size
NATIVE_LINK	Hyperlink to native file
MD5_HASH	MD5 hash value
SHA_HASH	SHA hash value
AUTHOR	Document author
SUBJECT	Document subject
TITLE	Document title
CATEGORIES	Categories metadata field extracted from document
COMMENTS	Comments metadata field extracted from document
KEYWORDS	Keywords metadata field extracted from document
COMPANY	Company metadata field extracted from document
REVISION	Revision number of document
SENSITIVITY	Sensitivity metadata field extracted from document
MODIFIED_BY	Name of person who last modified the document
SEARCH_HITS	Search keywords that made the document responsive (semicolon-delimited)
DOC_DATE	Best available date for document (combination of several date fields)
DOC_TIME	Best available time for document (combination of several time fields)
DATE_CRTD	Document creation date (internal document metadata)
TIME_CRTD	Document creation time (internal document metadata)
DATE_LST_PRN	Date document was last printed (internal document metadata)
TIME_LST_PRN	Time document was last printed (internal document metadata)
DATE_LST_MOD	Date document was last modified (internal document metadata)
TIME_LST_MOD	Time document was last modified (internal document metadata)
DATE_LST_SVD	Date document was last saved (internal document metadata)
TIME_LST_SVD	Time document was last saved (internal document metadata)
FS_DATE_CRTD	Document creation date (file system metadata)
FS_TIME_CRTD	Document creation time (file system metadata)
FS_DATE_MOD	Date document was last modified (file system metadata)
FS_TIME_MOD	Time document was last modified (file system metadata)
FS_DATE_ACC	Date document was last accessed (file system metadata)
FS_TIME_ACC	Time document was last accessed (file system metadata)
ENCRYPTED	Populated if document was encrypted
EXCEPTION	Populated if document was a processing exception
OCRED	Populated if document was OCR’ed
TEXT01	Document text/OCR field #1
TEXT02	Document text/OCR field #2
TEXT03	Document text/OCR field #3
TEXT04	Document text/OCR field #4
REDACTED_TXT	Populated with redacted text for redacted documents

Table 1 – Example e-Discovery Database Field List v1
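Two of the fields in Table 1 are derived rather than extracted: DOC_DATE combines several date fields, and MASTER_DATE carries the parent's date down to its child documents. The Python sketch below shows one way such values could be computed. The priority order in `DOC_DATE_PRIORITY`, the helper names, and the use of DOCID/PARENTID to link families are all assumptions made for illustration; the actual order of precedence should be set by the legal team and the processing vendor.

```python
from typing import Optional

# Assumed priority order for DOC_DATE. The field list above only says that
# DOC_DATE is a combination of several date fields; this order is hypothetical.
DOC_DATE_PRIORITY = (
    "DATE_SENT", "DATE_RECD", "DATE_LST_MOD", "DATE_CRTD", "FS_DATE_MOD",
)


def best_available_date(record: dict) -> Optional[str]:
    """Return the first populated date field in priority order, else None."""
    for field in DOC_DATE_PRIORITY:
        value = record.get(field)
        if value:
            return value
    return None


def family_root(record: dict, by_id: dict) -> dict:
    """Follow PARENTID links up to the top-level parent of the family."""
    seen = set()
    while record.get("PARENTID") in by_id and record["DOCID"] not in seen:
        seen.add(record["DOCID"])  # guard against circular parent links
        record = by_id[record["PARENTID"]]
    return record


def populate_dates(records: list) -> None:
    """Fill DOC_DATE per document and push the parent's date into MASTER_DATE."""
    by_id = {r["DOCID"]: r for r in records}
    for r in records:
        r["DOC_DATE"] = best_available_date(r)
        r["MASTER_DATE"] = best_available_date(family_root(r, by_id))


if __name__ == "__main__":
    docs = [
        {"DOCID": "A1", "PARENTID": "", "DATE_SENT": "2012/05/24"},
        {"DOCID": "A2", "PARENTID": "A1", "DATE_LST_MOD": "2012/01/03"},
    ]
    populate_dates(docs)
    for d in docs:
        print(d["DOCID"], d["DOC_DATE"], d["MASTER_DATE"])
```

With MASTER_DATE populated this way, sorting a review set by MASTER_DATE keeps each e-mail and its attachments together, while DOC_DATE still reflects each attachment's own best available date.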

View Change Log

v1.0 – 2012/05/24

Initial release


About Arman Gungor
