What is Data Organization?
Data organization is the practice of structuring project directories so that files are easy to find, group logically, and sort chronologically, which in turn makes data analysis easier.
Data organization considers:
- All of the data and material connected to a project and the relationships between them
- How to consistently and meaningfully name project data, materials, folders, etc.
- What others need to know to understand how the project is organized (for example, by creating a README text file with guidelines)
File Management
Before generating project files, plan and implement a file and folder organizational structure to keep all the research information and materials associated with the project together.
Best practices for file naming:
- Come up with a naming convention and be consistent. Create a README text file with guidelines.
- Don't use special characters: ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ' " |
- Use capitals (example: MyFileName) or underscores (example: my_file_name) instead of spaces
- Make sure your file names reflect their content (think about what someone else would need to know in order to find and make sense of your files)
- Use date format YYYYMMDD or YYYY-MM-DD (ISO 8601)
- If the system does not version your files, include a version number. Use "v" followed by at least 2 digits for proper sorting (example: v01, v02... v10, v11)
- Keep file names under 32 characters
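The naming guidelines above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the name pattern (date, description, version), the function names, and the sample file are assumptions made for the example. The forbidden-character set is copied from the list above, with the space added.

```python
import re
from datetime import date

MAX_LENGTH = 32  # guideline above: keep file names under 32 characters
# Special characters from the list above, plus the space character.
FORBIDDEN = set("~!@#$%^&*()`;<>?,[]{}'\"| ")


def build_file_name(description: str, day: date, version: int, extension: str) -> str:
    """Assemble a name such as 20240315_soil_ph_v02.csv (hypothetical pattern)."""
    # Turn runs of whitespace into underscores, then drop forbidden characters.
    slug = re.sub(r"\s+", "_", description.strip().lower())
    slug = "".join(ch for ch in slug if ch not in FORBIDDEN)
    # YYYYMMDD date and a two-digit version so names sort correctly.
    return f"{day:%Y%m%d}_{slug}_v{version:02d}.{extension}"


def check_file_name(name: str) -> list[str]:
    """Return guideline violations; an empty list means the name passes."""
    problems = []
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        problems.append(f"contains spaces or special characters: {bad}")
    if len(name) >= MAX_LENGTH:
        problems.append(f"is {len(name)} characters long (keep it under {MAX_LENGTH})")
    return problems


name = build_file_name("Soil pH", date(2024, 3, 15), 2, "csv")
print(name)                                     # 20240315_soil_ph_v02.csv
print(check_file_name(name))                    # []
print(check_file_name("my data (final)!.csv"))  # one problem reported
```

A check like this can be run over a folder before sharing it, so naming drift is caught early.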
File organization - things to keep in mind:
- How would you and others expect the data to be organized? Will you be able to find things in 6 months? A year?
- Think about intuitive groupings (for example: data type, experiment number or project, date, animal/plant)
- Limit folder nesting to 3 or 4 levels
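As a rough illustration of the nesting guideline, the sketch below flags files that sit deeper than four folders below the project root. How levels are counted (folders above the file, not counting the project root) is an assumption of this example, and the paths are hypothetical.

```python
from pathlib import PurePosixPath

MAX_DEPTH = 4  # upper end of the "3 or 4 levels" guideline


def folder_depth(relative_path: str) -> int:
    """Count the folders above a file, given a path relative to the project root."""
    return len(PurePosixPath(relative_path).parent.parts)


def too_deep(paths: list[str], max_depth: int = MAX_DEPTH) -> list[str]:
    """Return the paths nested more deeply than max_depth folders."""
    return [p for p in paths if folder_depth(p) > max_depth]


# Hypothetical project files, written relative to the project root.
paths = [
    "Data/Raw/20240315_soil_ph_v01.csv",
    "Results/a/b/c/d/plot.png",
]
print(too_deep(paths))  # ['Results/a/b/c/d/plot.png']
```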
Directory Example: Organized by File Type
- Dataset.ProjectName
  - Code
    - Step.1
    - Step.2
  - Data
    - Processed
    - Raw
  - Results
    - Figure.1
    - Figure.2
    - Models
  - readme.txt
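A layout like the one above can be created up front with a short script. The following is a sketch only: the nesting in `LAYOUT` is how this example reads the directory listing, the `scaffold` function is a hypothetical helper, and the tree is built in a temporary directory so nothing real is touched.

```python
import tempfile
from pathlib import Path

# Folder tree from the example above; the nesting is this sketch's reading of it.
LAYOUT = {
    "Code": ["Step.1", "Step.2"],
    "Data": ["Processed", "Raw"],
    "Results": ["Figure.1", "Figure.2", "Models"],
}


def scaffold(root: Path) -> None:
    """Create the folders and an empty readme.txt under root."""
    root.mkdir(parents=True, exist_ok=True)
    for parent, children in LAYOUT.items():
        for child in children:
            (root / parent / child).mkdir(parents=True, exist_ok=True)
    (root / "readme.txt").touch()


# Build the tree in a throwaway directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "Dataset.ProjectName"
    scaffold(project)
    created = sorted(p.relative_to(project).as_posix() for p in project.rglob("*"))
    print("\n".join(created))
```

To use it for a real project, call `scaffold` with the actual project path instead of the temporary one.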
Directory Example: Organized by Analysis
- Dataset.ProjectName
  - Figure.1
    - Code
    - Data
    - Results
  - Figure.2
    - Code
    - Data
    - Results
  - Table.1
    - Code
    - Data
    - Results
  - readme.txt
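Because every analysis in this layout gets the same three subfolders, the structure can be generated from a list of analysis names. This is a hedged sketch: `scaffold_by_analysis` is a hypothetical helper, and the analysis names are taken from the example above.

```python
import tempfile
from pathlib import Path

ANALYSES = ["Figure.1", "Figure.2", "Table.1"]
SUBFOLDERS = ["Code", "Data", "Results"]


def scaffold_by_analysis(root: Path, analyses: list[str]) -> list[Path]:
    """Create Code, Data, and Results under each analysis folder, plus readme.txt."""
    root.mkdir(parents=True, exist_ok=True)
    folders = [root / analysis / sub for analysis in analyses for sub in SUBFOLDERS]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    (root / "readme.txt").touch()
    return folders


# Build the tree in a temporary directory so nothing real is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "Dataset.ProjectName"
    folders = scaffold_by_analysis(project, ANALYSES)
    names = [f.relative_to(project).as_posix() for f in folders]
    print(names[:3])  # ['Figure.1/Code', 'Figure.1/Data', 'Figure.1/Results']
```

Adding a new figure or table then only means appending its name to the list and rerunning the script; existing folders are left as they are.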
Still have questions or interested in personalized recommendations based on your project? Contact us or email us directly at rdm@tufts.edu.