Tufts Dataverse

Go to Tufts Dataverse

Tufts Dataverse is a place for Tufts researchers to share research data publicly. Tufts Dataverse is housed within Harvard Dataverse, an open, generalist repository maintained by Harvard University. It is free to use. Currently, Tufts Dataverse has a per-file limit of 2.5 GB and a total storage capacity of 1 TB. 
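
Before depositing, it can help to check your files against the per-file limit. The following is a minimal sketch, not an official tool: the function name `files_over_limit` is hypothetical, and it assumes "2.5 GB" means decimal gigabytes (10^9 bytes), which this page does not specify.

```python
from pathlib import Path

# Assumption: "GB" is read as decimal gigabytes (10**9 bytes).
# The page does not say which convention applies.
PER_FILE_LIMIT_BYTES = int(2.5 * 10**9)


def files_over_limit(folder, limit=PER_FILE_LIMIT_BYTES):
    """Return (path, size_in_bytes) for every file under `folder` larger than `limit`."""
    oversized = []
    # rglob("*") walks the folder and all of its subfolders.
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                oversized.append((path, size))
    return oversized
```

Calling `files_over_limit("my_dataset")` on a local folder (the folder name is a placeholder) returns an empty list when every file fits under the limit. The sketch does not check the 1 TB total, which applies to Tufts Dataverse as a whole rather than to a single deposit.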

Why use Tufts Dataverse?

Tufts Dataverse accepts research data from any discipline and from any Tufts researcher. Uploading a dataset allows you to:

  • Receive a persistent identifier (DOI) for your dataset
  • License your data with Creative Commons or custom licenses
  • View the number of downloads on your dataset
  • Add rich metadata to your dataset, allowing your dataset to be found via search engines like Google Dataset Search
  • Create a Collection for your research group to showcase your group's datasets in one location

What data can go in Tufts Dataverse?

Complete datasets

Data deposited in Dataverse should be publishable research data. This does not mean the dataset needs to be associated with an article or publication, only that it is in a "finished" state. For data that is still in progress or part of an ongoing collaborative project, consider using Open Science Framework.

Any discipline

Data of any discipline can be deposited in Tufts Dataverse. However, if there is a discipline-specific repository for your data type, we recommend depositing there. (See Share Data for more on discipline-specific repositories.)

No identifiable info

Personally identifiable information cannot be deposited in Tufts Dataverse, with very few exceptions. See Harvard Dataverse's General Terms of Use. If you have questions about whether your data is identifiable, please reach out to rdm@tufts.edu.

Create an Account

To create your account with Tufts Dataverse, go to Tufts Dataverse and log in with your Tufts credentials. 

If you are working with a Tufts research group but do not have a Tufts username, you can still create an account with a non-Tufts email, ORCID iD, GitHub, or Google account. However, your account will not be affiliated with Tufts University by default.

Request a Collection

Collections are groupings of datasets that can be created for research groups, labs, departments, or research projects. Collections have a unique URL (not a DOI) and can be customized with a logo. Search facets and additional metadata fields can also be customized for each Collection. 

See an example of a Collection. Collections can also be embedded into web pages, as seen here.

Reach out to rdm@tufts.edu to request a Collection. Researchers requesting a Collection must meet with Tufts librarians first and receive training on administering their Dataverse Collection.

Upload a Dataset

Data can be uploaded to the main Tufts Dataverse, or to a specific collection. You must be logged in to deposit data. 

To upload a dataset, navigate to the desired location (either the main page of Tufts Dataverse, or the page for your collection). Click "Add Data", then choose "New Dataset" from the drop-down menu.

After uploading, your dataset will remain unpublished until a Tufts Dataverse administrator has approved the submission. We approve datasets after a brief review for quality. We aim to contact users within 1-2 business days after submission, but publication may be delayed if changes are required. We recommend initiating a deposit at least 2 weeks before you need the data to be published. 

We also have a self-paced module that walks you through depositing data into Dataverse. Try the module, or contact us if you need assistance.

Add Metadata

Adding metadata helps others find your data more easily. The more metadata you provide, the more options there are for your data to be discovered. 

We require the following metadata:

Title
  • Format: Free text
  • Description: The title of the dataset.
  • Additional Guidance: If this is a replication dataset, we recommend: “Replication Data for:” + the title of the paper.

Author Name
  • Format: Last, First M.
  • Description: The person or organization that created the dataset.
  • Additional Guidance: Add additional authors with the (+ Add) button.

Author Affiliation
  • Format: Search & select from drop-down list
  • Description: The organization/entity affiliated with the author.
  • Additional Guidance: If logged in with your Tufts credentials, this will automatically be filled in as Tufts University. You can search the drop-down list for additional organizations, e.g., Tufts Medical Center.

Point of contact
  • Format: Last, First M.; include Affiliation
  • Description: The person that Dataverse users can contact with questions about the dataset.
  • Additional Guidance: If the contact person is also an author, make sure the affiliation is the same as listed under “Author Affiliation”.

Description
  • Format: Free text
  • Description: A summary describing the purpose, nature, and scope of the dataset.
  • Additional Guidance: This field should provide enough details so that other researchers can understand (1) what data is contained within; and (2) the methodology by which the data was collected/generated.

Subject
  • Format: Select from drop-down
  • Description: The area of study relevant to the dataset.
  • Additional Guidance: You may select multiple subject areas if your research is multi-disciplinary.

Keywords
  • Format: Free text (with an optional link to controlled vocabulary)
  • Description: Key term(s) that describe an important aspect of the dataset.
  • Additional Guidance: If there is a related publication, you should use the same keywords as the publication. You do not have to use a controlled vocabulary; consider using them if your related publication has keywords/subject terms from a controlled vocabulary. Examples include Library of Congress Subject Headings (LCSH) or Medical Subject Headings (MeSH). Not sure how to choose a term? Reach out to rdm@tufts.edu and we’ll help you identify the best keywords for your dataset.
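
If you draft your metadata outside Dataverse before depositing, a short script can flag required fields that are still empty. This is only a sketch under stated assumptions: the dictionary layout and the names `REQUIRED_FIELDS`, `missing_required_fields`, and `replication_title` are illustrative, not part of Dataverse, and the keys simply reuse the field labels listed above.

```python
# Field labels copied from the required metadata listed above.
# Assumption: the draft metadata is kept in a plain dictionary keyed by these
# labels; this is an illustration, not Dataverse's internal schema.
REQUIRED_FIELDS = (
    "Title",
    "Author Name",
    "Author Affiliation",
    "Point of contact",
    "Description",
    "Subject",
    "Keywords",
)


def missing_required_fields(record):
    """Return the required fields that are absent or blank in `record`."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None:
            missing.append(field)
        elif isinstance(value, str) and not value.strip():
            missing.append(field)
        elif isinstance(value, (list, tuple)) and len(value) == 0:
            missing.append(field)
    return missing


def replication_title(paper_title):
    """Build the recommended title for a replication dataset."""
    return f"Replication Data for: {paper_title.strip()}"
```

An empty result from `missing_required_fields` only means each field has some value. It does not check the quality of the Description or the choice of keywords, which are still reviewed by a Tufts Dataverse administrator.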

We recommend adding the following citation metadata:

Author Identifier
  • Format: XXXX-XXXX-XXXX-XXXX
  • Description: A unique digital identifier.
  • Additional Guidance: We recommend using ORCID iD. Enter only the number of your ORCID iD (do not include “https://orcid.org/” at the beginning).

Related Publication
  • Format: Free text
  • Description: The article or report that uses this dataset.
  • Additional Guidance: Under “Citation”, include a citation to the article in your discipline’s preferred citation style. Under “URL”, include its permanent identifier (e.g., a DOI) in URL form (beginning with https://). If there is no related publication, include a brief description of the methodology used to create the dataset in the Description field instead.

Funding Information *
  • Format: Search & select from drop-down list
  • Description: The agency which provided the dataset’s financial support.
  • Additional Guidance: If available, include the grant number under “Identifier”.

Production Location *
  • Format: Free text
  • Description: The country (and optionally, state and city) where the data was collected.
  • Additional Guidance: Use a more specific location if relevant to your research (e.g., “Somerville, MA, United States”). If there are multiple locations, list them separated by semicolons.

Date of Collection *
  • Format: YYYY-MM-DD
  • Description: The dates during which data was collected or generated (e.g., when samples were being analyzed, surveys were being answered, etc.).
  • Additional Guidance: If you know the year but not the dates, use YYYY-01-01.

Time Period *
  • Format: YYYY-MM-DD
  • Description: The time period that the data refer to.
  • Additional Guidance: Use when working with historical data. For example, if records from 1910 were digitized in 2020, the time period would be 1910. If you know the year but not the dates, use YYYY-01-01.

*These metadata fields must be added after the dataset is initially uploaded (but before publication) by selecting Edit Dataset.

For help with filling out additional metadata, contact rdm@tufts.edu.  
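
The format rules for Author Identifier, Related Publication URL, and the two date fields can be sketched in code. The helper names below (`bare_orcid`, `doi_as_url`, `collection_date`) are hypothetical and are not provided by Dataverse. The sketch assumes an ORCID iD is four hyphen-separated groups of four characters whose final character may be a digit or X, and it checks the shape only, not the iD's checksum.

```python
import re
from datetime import date

# Shape of an ORCID iD: XXXX-XXXX-XXXX-XXXX, where the last character may be "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def bare_orcid(value):
    """Strip an optional https://orcid.org/ prefix and check the iD's shape."""
    candidate = re.sub(r"^https?://orcid\.org/", "", value.strip(), flags=re.IGNORECASE)
    candidate = candidate.upper()
    if not ORCID_PATTERN.match(candidate):
        raise ValueError(f"not in XXXX-XXXX-XXXX-XXXX form: {value!r}")
    return candidate


def doi_as_url(doi):
    """Return a DOI in URL form, beginning with https://."""
    cleaned = re.sub(
        r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", "", doi.strip(), flags=re.IGNORECASE
    )
    if not cleaned.startswith("10."):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return f"https://doi.org/{cleaned}"


def collection_date(year, month=None, day=None):
    """Format a date as YYYY-MM-DD, using YYYY-01-01 when only the year is known."""
    if month is None and day is None:
        return date(year, 1, 1).isoformat()
    # A month without a day (or the reverse) is not handled and raises TypeError.
    return date(year, month, day).isoformat()
```

For the historical-records example above, `collection_date(1910)` yields the year-only form for the Time Period field. Both helpers that accept identifiers raise `ValueError` on input they do not recognize rather than guessing at a correction.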

Restrict Access

Access to individual files can be restricted. When the “Request Access” feature is enabled, users must submit a request to receive your data. We encourage you not to restrict data when possible. Restriction may be appropriate, for example, when sensitive information has been identified as acceptable for sharing in consultation with Tufts Dataverse administrators.

You can also place an embargo on individual files (for instance, if required by a journal). You must choose an embargo end date, on which the embargo ends and the data files become available. For an embargo to take effect, the dataset must be published first, which makes the dataset's metadata visible (but not the embargoed files themselves).

Policies & Review Process

Relevant Policies

When depositing in Tufts Dataverse, researchers must abide by the Harvard Dataverse General Terms of Use as well as all Tufts University policies, including the Information Stewardship Policy and the Policy on Rights and Responsibilities with Respect to Intellectual Property.

Personally-Identifiable Information (PII)

No personally identifiable information should be uploaded to Tufts Dataverse. See Harvard Dataverse’s Terms of Use for their definition of identifiable information. Schedule a consultation with us to determine whether your data should be uploaded.

If unpublished (draft) datasets are identified as containing personally identifiable information, the dataset authors will be contacted and asked to revise their dataset before publication.

If datasets that contain personally identifiable information are published, Tufts Dataverse administrators will deaccession such datasets. Responsible parties will be contacted by Tufts Dataverse administrators. Their Dataverse privileges may be minimized or revoked, and the Office of Information Security will be contacted for follow-up.

Review Process

Tufts Dataverse administrators review datasets submitted to the general Tufts Dataverse prior to publication, as well as datasets published within a Collection. Datasets will remain unpublished until a Tufts Dataverse administrator has approved the submission. We aim to contact users within 1-2 business days after submission, but publication may be delayed if changes are required. 

We recommend initiating a deposit at least 2 weeks before you need the data to be published.

The following criteria are used to approve dataset publication. Users may be asked to revise their datasets if they do not meet these criteria:

  • The dataset contains no personally identifiable information, unless an exception has been identified in consultation with Tufts Dataverse administrators.
  • The Description field provides enough details so that other researchers can understand (1) what data is contained within; and (2) the context in which the data was collected/generated.
  • If the dataset is related to a publication, the "Related Publication" field is filled. If the dataset is not related to a publication, a brief description of the methodology used to create the dataset is included (e.g., in the description or in a README file).
  • The dataset contains metadata sufficient for those looking to reuse the data. Preferably, the dataset includes a README file, codebook, or data dictionary that defines variables and/or describes relevant files.
  • Files are restricted as necessary (e.g., due to journal sharing requirements, data embargo, etc.).
  • The dataset author has defined a license/data use agreement that is appropriate for their needs.