Research Data Management @ Hirsh Health Sciences Library

Overview

Starting January 25, 2023, all NIH grant submissions for data-producing projects (e.g., most R grants) require the addition of a Data Management and Sharing Plan (DMS Plan). For help with your plan, you can reach out to your data management librarian; make sure to contact us at least a few weeks before your grant is due!

  • The first thing you will want to do before writing your DMS Plan is to check if the DMS Plan is part of peer review for your grant. If so, in what way(s)?
    • Even if it isn't peer reviewed, peer reviewers can still comment.
  • If your DMS Plan is NOT part of peer review, the worst-case scenario is that the NIH wants to fund you but requires you to update your DMS Plan first. If this happens, reach out to one of your data management librarians and we can help you!
  • The DMS Plan has no formatting guidelines (yet), but I WOULD recommend using the NIH's format page to make sure you include all of the information they need.
  • DO NOT include hyperlinks or URLs in your DMS Plan! Make sure to name databases and other websites accurately so they are easy to find.
  • Remember: This is not meant to be a static document. Things can change! You SHOULD be specific, but understand that the specifics can change over time.
  • DO plan to stick to what you outline in this document; failing to do so could affect future funding.
  • For MOST DMS Plans, there will be 6 sections. You may see 7 sections on some example plans; the seventh section is only required for specific grants (NIMH grants, for example), and almost all plans will end at Section 6.

Section 1: Data Type

  • In section 1A, use specific numbers for amount of data/number of samples/specimens, even if you're not sure how many specimens you will have yet. Aim for the correct order of magnitude - you can change the specifics later.
  • Include how much storage space your data will take up, usually in GB. Again, this can be approximate. If you're not sure, find some files of the same type and see how big they tend to be. Multiply the typical file size by the approximate number of samples to get the total amount.
  • Section 1B should be brief - just name each type of data and whether those data will be shared. If the data will be restricted in any way, briefly describe why. More details will go into sections 4 and 5.
  • Section 1C asks about metadata - this is data about your data, including details like dates and times, specimen details (species, cell or tissue type, proteins, etc.), and experimental conditions such as temperature. Some of these data may be stored in the data files, but some will not. How will you keep track of the information that is not stored in your data files? How will other members of your team know what to keep track of?
    • Often this is included in a README that can be shared with your team. (Contact us if you want help with this!)
  • Another type of metadata often used in scientific research is an explanation of tabular data. If you keep data in a spreadsheet, you may use shorthand or nicknames for column titles, and you may include numbers but not units. You can include a separate README file that explains the column titles and the units of any numbers in your spreadsheet.
  • Section 1C should also include information about protocols, which you should also share. What format will they be in?
  • The README, metadata, and protocol files mentioned in Section 1C should be shared in a format that is accessible to as many people as possible. The best way is to share these as txt or PDF/A files. It is tempting to write these up in Microsoft Word, but not everyone has access to the Microsoft Office suite! If you really want to write it up with formatting, try saving the files as .rtf (rich text format) instead.
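
The storage estimate described above (typical file size multiplied by the expected number of samples) can be sketched in a few lines of Python. This is only an illustration: the function names and the example numbers are hypothetical, and it assumes decimal gigabytes (1 GB = 1,000,000,000 bytes), while some tools report sizes in GiB instead.

```python
from pathlib import Path
from statistics import mean

# Assumption: decimal gigabytes. Some tools use GiB (2**30 bytes) instead.
BYTES_PER_GB = 1_000_000_000


def estimate_total_gb(example_sizes_bytes, n_samples):
    """Estimate total storage as mean example file size times sample count."""
    if not example_sizes_bytes:
        raise ValueError("need at least one example file size")
    return mean(example_sizes_bytes) * n_samples / BYTES_PER_GB


def sizes_of(paths):
    """Return the on-disk size in bytes of each existing example file."""
    return [Path(p).stat().st_size for p in paths]


# Hypothetical numbers: three example files of about 2, 3, and 4 GB,
# and roughly 200 planned samples.
example_sizes = [2_000_000_000, 3_000_000_000, 4_000_000_000]
print(f"Estimated total: {estimate_total_gb(example_sizes, 200):.0f} GB")
```

If you already have a few files of the same type, `sizes_of` can supply the sizes instead of typing them in by hand. The result only needs to land in the right order of magnitude.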

Section 2: Related Tools, Software, and/or Code

  • In this section, you will explain what file formats you will use to share your data and any software necessary to open them.
  • Best practice is to favor open file types (csv, txt, pdf) over proprietary (xlsx, doc) - here is a list of open file types, or you can ask us for help!
  • REMEMBER: Microsoft Office is proprietary! If you will be sharing Excel files, mention Excel here. Best practice is to save/share as both Excel and csv files.
  • Code: If you will be writing code, try to favor open coding languages like Python and R over proprietary languages like MATLAB whenever possible.
  • You should share code if possible - usually somewhere like GitHub. It's fine to mention your GitHub username, but do NOT link to your GitHub account.
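
If your tabular data already pass through a script, writing an open csv copy alongside the Excel file takes very little code. The sketch below uses only Python's standard `csv` module; the function name, column names, and values are hypothetical, and it assumes the table is already loaded as a list of dictionaries (reading an .xlsx file itself would need a third-party library, which is not shown here).

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical table: two samples with an ID and a temperature.
rows = [
    {"sample_id": "S001", "temp_c": 37.0},
    {"sample_id": "S002", "temp_c": 36.5},
]
csv_text = rows_to_csv(rows, ["sample_id", "temp_c"])
print(csv_text)
```

To save the result, write `csv_text` to a file opened with `newline=""` so the line endings are not altered.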

Section 3: Standards

  • Here is where you will list any standard expectations around data in your field. How are data usually collected in your field? For example, DNA/RNA sequencing data is usually shared in FASTQ format.
  • If you are unsure, try looking up your field at FAIRsharing.org.
  • If there are no consensus standards, how will you make sure that data collection is consistent in YOUR lab, even if many different people work on this project?
    • Access to a README or data dictionary, for example.
  • NIH likes researchers to use and contribute to Common Data Elements - try to mention those here if you will use them.
    • Common Data Elements are useful for collecting patient info, for example.
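
A data dictionary of the kind mentioned in Sections 1C and 3 can be as simple as a plain-text README listing each spreadsheet column, what it means, and its units. The sketch below shows one possible layout, not a required format; the function name, file name, and column entries are all made up for illustration.

```python
def build_data_dictionary(title, columns):
    """Return README text describing each spreadsheet column.

    columns is a list of (column_name, description, units) tuples;
    use None for units when a column has none.
    """
    lines = [title, "=" * len(title), ""]
    for name, description, units in columns:
        lines.append(f"{name}: {description} (units: {units or 'none'})")
    return "\n".join(lines) + "\n"


# Hypothetical shorthand column names and their explanations.
columns = [
    ("temp_c", "Incubation temperature", "degrees Celsius"),
    ("wt_mg", "Tissue sample weight", "mg"),
    ("spp", "Species of the specimen", None),
]
readme_text = build_data_dictionary("Data dictionary for samples.csv", columns)
print(readme_text)
```

Saving `readme_text` as a .txt file next to the data keeps it readable without any special software, and gives everyone in the lab the same reference for what to record.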

Section 4: Data Preservation, Access, and Associated Timelines

  • It is best to share your data in a subject-specific repository, if one exists. You can also go with a generalist repository like Tufts Dataverse.
  • Cloud storage (like Box) is NOT a repository. For something to be a repository, someone must be able to find your dataset without your help; i.e., it is browsable and indexed by Google.
  • Most repositories will mint a DOI or other persistent identifier (PID, Handle) for your data.
  • Data should be made available at time of publication (no later!) or by the end of the funding period for unpublished (usually negative) results.
  • How long data will remain available depends on the repository, but it is usually 5-10+ years.

Section 5: Access, Distribution, or Reuse Considerations

  • If you have data that will be restricted in any way, what LEGAL, ETHICAL, or TECHNICAL factors require that restriction?
    • If your argument does not fall under these categories, it will likely not be accepted.
    • LEGAL examples: HIPAA, FERPA, copyright
    • ETHICAL examples: (some) Deidentified patient data, locations of endangered species
    • TECHNICAL examples: Extremely large datasets (on the scale of dozens to hundreds of TB; in lieu of sharing the data, you could share how it was acquired, e.g. for text mining studies), difficult-to-digitize data (e.g., 3D teeth scans may be more useful than teeth x-rays, but the burden to acquire and share them is much higher)
  • For section 5B, data can be completely withheld (HIPAA-protected info, for example), completely free, or controlled (people can only access the data after certain criteria are met). Access to deidentified patient data may be controlled, for example. List any controlled data here, and how it will be controlled (for example, data will be shared on Tufts Dataverse but only available to individuals who reach out to a team to ask for access).
    • Prefer a team to just a PI, if possible, to prevent bottlenecks.
    • At a minimum, a librarian should also have access to your dataset so it is not orphaned if you leave Tufts.
  • Only fill out 5C if you have human research participants. How will you ensure their private information is protected? For example, Tufts Box is HIPAA compliant. How will data be deidentified?
  • If you are working with human subjects, information about informed consent can be very helpful in this section. Will the informed consent allow data sharing? If so, to what extent?

Section 6: Oversight of Data Management and Sharing

  • Oversight is ALMOST ALWAYS the PI's responsibility. Tufts does NOT routinely offer oversight of data management plans, unless you have explicitly been told otherwise. There are rare exceptions, but the vast majority of grants will have the PI as the responsible party.
  • You can also include something here about how the plan will be updated at least once a year or when circumstances merit an update.