Describe Your Data

Funding agencies encourage the use of standards to document and describe your research data. 

The Tufts Data Management Team can help you choose the general or discipline-specific standard, metadata schema, ontology, or vocabulary best suited to your project.

Discipline-specific research data repositories often employ metadata standards and provide an array of fields you can use to describe your data.

Please see Tufts Dataverse for more information about metadata on that platform.

Readme Files

Whether you use an existing metadata standard or create your own to describe your research data, creating and maintaining a Readme file helps ensure that your data can be correctly interpreted.

Cornell University maintains a guide and downloadable template for Readme files. 
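
As a rough illustration of what a Readme file might record, the sketch below writes out the kinds of details named in the glossary definition of metadata: who created the data, when and how it was collected, and what tools were used. The field labels, the function name build_readme, and the sample values are all hypothetical and are not taken from the Cornell template.

```python
from datetime import date


def build_readme(title, creator, collected_on, methods, tools):
    """Return plain-text Readme content describing a dataset.

    The field labels are illustrative assumptions, not a published template.
    """
    lines = [
        f"Title: {title}",
        f"Creator: {creator}",
        f"Date collected: {collected_on.isoformat()}",
        f"Collection methods: {methods}",
        f"Tools used: {tools}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical sample values, for illustration only.
readme_text = build_readme(
    title="Example stream temperature readings",
    creator="A. Researcher",
    collected_on=date(2024, 5, 1),
    methods="Hourly readings from fixed sensors",
    tools="Hypothetical logger model X",
)
print(readme_text)
```

Saving the returned text as a plain-text file next to the data keeps the description readable without special software.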

Discipline-Specific Metadata Standards 

  • Dublin Core (DC): a descriptive metadata scheme used across disciplines
  • Data Documentation Initiative (DDI): a standard for describing data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences
  • Darwin Core: a standard for documenting data about biological diversity and taxa
  • Gene Ontology: describes knowledge of the biological domain with respect to molecular function, cellular component, and biological process
  • mz Markup Language (mzML): an XML format for encoding mass spectrometer data
  • Text Encoding Initiative (TEI): an XML standard for the representation of texts in digital form
  • Getty Thesaurus of Geographic Names (TGN): a controlled vocabulary of places relevant to art, architecture, and related disciplines
  • FAIRsharing: a curated resource of thousands of metadata standards and schemas across academic disciplines
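
To make the idea of a descriptive metadata scheme more concrete, here is a minimal sketch of a Dublin Core record built with Python's standard library. The element names (title, creator, date, description) and the namespace URI come from the Dublin Core element set; the wrapping element named metadata, the function name dublin_core_record, and the sample values are assumptions made for illustration. A repository will usually define its own wrapper and required fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names and values as XML.

    The root element name "metadata" is a hypothetical wrapper.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample values, for illustration only.
record_xml = dublin_core_record(
    {
        "title": "Example stream temperature readings",
        "creator": "A. Researcher",
        "date": "2024-05-01",
        "description": "Hourly readings from fixed sensors",
    }
)
print(record_xml)
```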

Glossary of Terms

  • Metadata: data about data; information describing when and how the data was collected, who created the data, and what tools were used
  • Metadata schema: a standard defining data elements and rules for how to use them
  • Controlled vocabulary: a standardized list of terms used to consistently describe data
  • Ontology: a formal representation of a body of knowledge within a given domain
  • Markup language: a text encoding system used to structure components of a document
  • Common Data Element (CDE): a standardized, precisely defined question, paired with a set of allowable responses, used systematically across different sites, studies, or clinical trials to ensure consistent data collection. (source: https://cde.nlm.nih.gov/)
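
The definitions of controlled vocabulary and Common Data Element above can be sketched in a few lines: a question is paired with a fixed set of allowable responses, and any recorded value is checked against that set. The class name CommonDataElement, the sample question, and its responses are hypothetical and are not drawn from the NIH CDE repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """A question paired with its set of allowable responses."""

    question: str
    allowed_responses: frozenset

    def is_valid(self, response):
        """Return True only if the response is one of the allowed terms."""
        return response in self.allowed_responses


# Hypothetical element, for illustration only.
smoking_status = CommonDataElement(
    question="What is the participant's smoking status?",
    allowed_responses=frozenset({"never", "former", "current"}),
)

print(smoking_status.is_valid("former"))
print(smoking_status.is_valid("sometimes"))
```

Checking values at the point of entry is one way to keep responses consistent across sites or studies; free-text answers would need to be mapped to the allowed terms first.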