Storing vs. Archiving
There is a difference between storing and sharing your data in a repository and archiving and preserving it for the long term. Archiving means that your data is actively cared for so that it stays maintained and usable into the future. Some data repositories also act as archives, so check whether your repository of choice offers these services. An archive will:
- Save multiple copies of your files in multiple locations
- Perform regular fixity checks (such as checksum comparisons) to detect whether your data has degraded or become corrupted over time
- Migrate your data to the most current standard file formats
- Collect and preserve your documentation
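To make the idea of a fixity check concrete, here is a minimal sketch in Python. It assumes SHA-256 checksums and a simple in-memory manifest. The function names (`sha256_of`, `build_manifest`, `check_fixity`) are hypothetical and chosen for illustration; an actual archive may use different algorithms and tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (as a relative POSIX path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root, manifest):
    """Compare the files under root against a previously stored manifest.

    Returns the files whose content changed, that went missing,
    or that were added since the manifest was built.
    """
    current = build_manifest(root)
    return {
        "changed": sorted(
            k for k in manifest if k in current and current[k] != manifest[k]
        ),
        "missing": sorted(k for k in manifest if k not in current),
        "new": sorted(k for k in current if k not in manifest),
    }
```

In practice, the manifest would be saved alongside the data (for example as a text file) when the data is deposited, and `check_fixity` would be rerun on a schedule. Any entry under `changed` or `missing` signals that a copy should be restored from another location.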
What can you do to ensure your data is usable over time?
- Make sure data collection, analysis, and variables are well-documented
- Use an open file format that is unencrypted and uncompressed to ensure interoperability between software applications and sustainable access for the long term
- Consider using a format that supports embedded metadata
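As one possible illustration of the first two points, the Python sketch below writes tabular data to a plain UTF-8 CSV file and stores the variable descriptions in a separate JSON file. CSV cannot hold embedded metadata, so the documentation travels as a sidecar file here. The function name `export_with_dictionary`, the file names, and the sample variables are all hypothetical.

```python
import csv
import json


def export_with_dictionary(rows, variables, data_path, dictionary_path):
    """Write rows to a UTF-8 CSV file and variable descriptions to JSON.

    `variables` maps each column name to its documentation; its keys
    define the column order. A row containing an undocumented column
    raises ValueError, so every column stays documented.
    """
    with open(data_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(variables))
        writer.writeheader()
        writer.writerows(rows)
    with open(dictionary_path, "w", encoding="utf-8") as handle:
        json.dump(variables, handle, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical example data, for illustration only.
    variables = {
        "site_id": {"description": "Sampling site identifier", "unit": None},
        "temp_c": {"description": "Water temperature", "unit": "degrees Celsius"},
    }
    rows = [
        {"site_id": "A1", "temp_c": 12.5},
        {"site_id": "B2", "temp_c": 14.1},
    ]
    export_with_dictionary(rows, variables, "samples.csv", "samples_dictionary.json")
```

Keeping the data dictionary in an open, text-based format means it can be read without the software that produced the data.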
Preservation File Formats
Type of Content | Preferred File Formats |
---|---|
Audio | WAV, BWF (Broadcast WAVE), FLAC |
Databases | SQLite (.db, .db3, .sqlite, .sqlite3), TSV, CSV |
Documents and text | PDF/A, PDF, ODF (Open Document Format), DOCX (Microsoft Word), RTF, TXT |
Images | |
Maps, GIS, and Geospatial Data | SHP (Shapefile), GDB (ESRI Arc Geodatabase), GPKG (OGC GeoPackage), GeoTIFF, GML (Geography Markup Language) |
Software | Uncompiled source code and compiled runtime (executable) file, with any release notes, readme files, or other technical documentation |
Spreadsheets | TSV, CSV, ODS (Open Document Spreadsheet), XLSX (Microsoft Excel) |
Video | MKV (Matroska), MOV (QuickTime), MPEG-2, MPEG-4 (.mp4) |
3D Models and Design | DXF (AutoCAD Drawing Interchange File), X3D (Extensible 3D), 3MF (3D Manufacturing Format), OBJ (Wavefront), STL, PLY |
For additional information about preservation file formats, see: