Preserve Data

Storing vs. Archiving

There is a difference between storing and sharing your data in a repository and archiving and preserving it for the long term. Archiving means that your data is actively cared for so that it remains maintained and usable into the future. Some data repositories also act as archives, so check whether your repository of choice offers these services. An archive will:

  • Save multiple copies of your files in multiple locations
  • Perform regular fixity checks to ensure your data does not degrade or become corrupted over time
  • Migrate your data to the most current standard file formats
  • Collect and preserve your documentation
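
To make the idea of a fixity check concrete, here is a minimal sketch in Python. It assumes SHA-256 as the checksum algorithm and a single folder of files; the function names (sha256_of, build_manifest, check_fixity) are hypothetical and do not describe how any particular archive implements its checks.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (as a relative POSIX-style path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root, manifest):
    """Compare current checksums with a stored manifest.

    Returns a dict of problems: "missing" for files that are gone,
    "changed" for files whose checksum no longer matches.
    """
    current = build_manifest(root)
    problems = {}
    for name, expected in manifest.items():
        actual = current.get(name)
        if actual is None:
            problems[name] = "missing"
        elif actual != expected:
            problems[name] = "changed"
    return problems
```

A typical use would be to call build_manifest once when the data is deposited, store the result alongside the data, and run check_fixity on a schedule. An empty result means every recorded file is still present and unchanged.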

What can you do to ensure your data is usable over time?

  • Make sure data collection, analysis, and variables are well-documented
  • Use an open file format that is unencrypted and uncompressed to ensure interoperability between software applications and sustainable access for the long term
  • Consider using a format that supports embedded metadata
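
As one possible illustration of the first two points, the sketch below writes a dataset as an uncompressed, unencrypted UTF-8 CSV file and saves a data dictionary describing each variable next to it. The function name write_dataset, the "_dictionary.csv" suffix, and the dictionary columns (name, description, units) are assumptions chosen for this example, not a required convention.

```python
import csv
from pathlib import Path


def write_dataset(rows, variables, data_path):
    """Write rows to a CSV file and a data dictionary to a sidecar CSV file.

    rows:      list of dicts, one per record, keyed by variable name
    variables: list of dicts with the keys "name", "description", "units"
    Returns the paths of the data file and the dictionary file.
    """
    data_path = Path(data_path)
    # Hypothetical naming convention: survey.csv -> survey_dictionary.csv
    dict_path = data_path.with_name(data_path.stem + "_dictionary.csv")

    # The column order of the data file follows the order of the dictionary.
    names = [v["name"] for v in variables]
    with open(data_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=names)
        writer.writeheader()
        writer.writerows(rows)

    with open(dict_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "description", "units"])
        writer.writeheader()
        writer.writerows(variables)

    return data_path, dict_path
```

For example, write_dataset([{"site": "A", "temp_c": 21.5}], [{"name": "site", "description": "Sampling site code", "units": ""}, {"name": "temp_c", "description": "Water temperature", "units": "degrees Celsius"}], "survey.csv") would produce survey.csv and survey_dictionary.csv. Both are plain text, so they can be opened without any specific software.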

Preservation File Formats

Type of Content: Preferred File Formats

Audio: WAV, BWF (Broadcast WAVE), FLAC
Databases: SQLite (.db, .db3, .sqlite, .sqlite3), TSV, CSV
Documents and text: PDF/A, PDF, ODF (Open Document Format), DOCX (Microsoft Word), RTF, TXT
Images:
  • Raster: TIFF, JPEG2000, PNG, JPEG
  • Vector: SVG
Maps, GIS, and Geospatial Data: SHP (Shapefile), GDB (ESRI Arc Geodatabase), GPKG (OGC GeoPackage), GeoTIFF, GML (Geography Markup Language)
Software: Uncompiled source code and compiled runtime (executable) file, with any release notes, readme files, or other technical documentation
Spreadsheets: TSV, CSV, ODS (Open Document Spreadsheet), XLSX (Microsoft Excel)
Video: MKV (Matroska), MOV (QuickTime), MPEG-2, MPEG-4 (.mp4)
3D Models and Design: DXF (AutoCAD Drawing Interchange File), X3D (Extensible 3D), 3MF (3D Manufacturing Format), OBJ (Wavefront), STL, PLY

For additional information about preservation file formats, see: