Subtleties of research data publication

Until recently I thought about research data publication along a single continuum that I’d call Private/Mediated/Public.

  • Private research data is completely private. No third party even knows it exists.
  • Mediated research data is publicly advertised with a metadata record, but third parties need to apply or register for access. This is most appropriate for sensitive research data.
  • Public research data is fully public, openly downloadable and, hopefully, openly licensed. No application or registration required.

I realise now that there is a second, temporal dimension that needs to be added, based on where a project is in the research lifecycle. This second dimension adds complexity to mediated access as data needs to be shared in different ways at different stages in a project:

  • During peer review
  • During an embargo period
  • After completion of project

During peer review, some publishers (e.g. Nature) now require data to be made available to peer reviewers. Since we all love blind peer review, the access must be anonymous! This means we can’t ask peer reviewers to register. We also don’t necessarily want to release the data publicly. Perhaps a mechanism that produces a long, cryptic, hard-to-guess URL (à la Dropbox sharing links) would be a good solution for this.
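As a rough illustration of the hard-to-guess URL idea, here is a minimal Python sketch. It is not how Dropbox or any particular repository does it. The base URL, the function names and the in-memory dictionary standing in for a database are all hypothetical. The only real ingredient is `secrets.token_urlsafe` from the Python standard library, which produces a cryptographically random, URL-safe string.

```python
import secrets
from typing import Dict, Optional

# Hypothetical base address of the repository's anonymous review endpoint.
BASE_URL = "https://data.example.org/review"


def new_review_link(dataset_id: str, links: Dict[str, str]) -> str:
    """Create an unguessable link to a dataset without identifying the reviewer.

    `links` stands in for whatever store the repository would really use.
    """
    # 32 random bytes (256 bits), encoded as URL-safe text.
    token = secrets.token_urlsafe(32)
    links[token] = dataset_id
    return f"{BASE_URL}/{token}"


def resolve_review_link(token: str, links: Dict[str, str]) -> Optional[str]:
    """Return the dataset id behind a token, or None if the token is unknown."""
    return links.get(token)


if __name__ == "__main__":
    store: Dict[str, str] = {}
    url = new_review_link("dataset-123", store)
    print(url)
    print(resolve_review_link(url.rsplit("/", 1)[1], store))
```

Anyone holding the link can fetch the data without registering, so reviewers stay anonymous. The link is the only protection, though, so a real service would probably also want to expire or revoke it once review is over.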

Sometimes a significant dataset is collected and can support multiple research projects at multiple locations. There is a plan to openly share the data eventually, but we want to tightly control access during an embargo period. This means not only do we want to know exactly who is accessing the data, but we also need a licence in place so that everybody knows how the dataset may or may not be used. Ideally, this licence will have some sort of termination clause that recognises the data will eventually be made fully open.

Finally, we get back to what I originally considered the only kind of mediated access. After a project is completed, a dataset might be considered so sensitive that it can never be openly shared. We need to maintain the mediated access with a licence that outlines how the dataset may or may not be used.
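The three stages can be summarised as a small decision function. This is only a sketch of the scenarios as described in this post. The class, field and key names are hypothetical, and treating peer review as licence-free is my assumption rather than anything a publisher has stated.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Dict, Optional


class Stage(Enum):
    PEER_REVIEW = "peer review"
    EMBARGO = "embargo"
    COMPLETED = "completed"


@dataclass
class Dataset:
    stage: Stage
    sensitive: bool = False
    embargo_ends: Optional[date] = None  # None means no end date has been set


def access_requirements(ds: Dataset, today: date) -> Dict[str, bool]:
    """Return what the repository would ask of someone requesting the data."""
    if ds.stage is Stage.PEER_REVIEW:
        # Anonymous access via a secret link; no registration.
        return {"secret_link": True, "registration": False, "licence": False}
    under_embargo = ds.stage is Stage.EMBARGO and (
        ds.embargo_ends is None or today < ds.embargo_ends
    )
    if under_embargo or ds.sensitive:
        # Mediated access: we know who has the data and on what terms.
        return {"secret_link": False, "registration": True, "licence": True}
    # Fully public: no application or registration required.
    return {"secret_link": False, "registration": False, "licence": False}


if __name__ == "__main__":
    embargoed = Dataset(Stage.EMBARGO, embargo_ends=date(2030, 1, 1))
    print(access_requirements(embargoed, date(2029, 6, 1)))
```

Note that a sensitive dataset stays mediated even after its embargo date has passed, which matches the last scenario above.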

I suppose I could create a diagram, but I don’t think that way. Has someone else already created one?

What other kinds of data publication scenarios are there? I have specifically avoided talking about sharing data with close collaborators (researchers at different institutions who are working on the same project) because it’s not “publishing”, per se, even though it bears similarities to my scenario of sharing data during an embargo period.

2 thoughts on “Subtleties of research data publication”

  1. Hi Mathias, thanks for your thoughts. I need to make a report on this topic soon. I’m considering all kinds of ways to provide “access options”, and the implications for a data repository on a technical, legal and business-model level. (The latter is especially exciting when a data repository exists (read: is funded) by the grace of the “open” ideology.)

    1. If you base the repository around a researcher-centric workflow, rather than a librarian-centric workflow, the required access options increase quite dramatically. Or rather, they become much more complex. Researchers have different needs at different stages of their work, and those of us who claim to support researchers need to work with that.

      The good thing is that my institution is only at the beginning of the data journey and we have an opportunity to take this into account as we attempt to build a sophisticated data solution for our researchers.

