Does Supplementary Material Data Disappear?

Most research projects produce far more data than fits in a published paper. Journals refer to this extra material as Supplementary Information (SI) or Supplementary Material. It can include details of methods, results data, or audio and video files, for example.

When a paper is published, there usually isn’t space to include all of this information. To stop the reader from drowning in data, SI is given separately. SI is usually peer-reviewed along with the article and relates to its conclusions, so it is important that readers have access to it.

Normally, journals provide SI in the form of PDF files. While these may be skipped by some readers, they are vital to others. Researchers hoping to use the methods or data in their own studies need access to these files. Also, the SI is the key to the reproducibility of a study.

The Disappearing Data Problem

There are downsides to giving SI as a PDF linked from the article. Researchers fairly often find broken hyperlinks when they try to access the data. This seriously limits how useful the paper is to other researchers.

“I’ve had multiple instances where the supplementary material has gone missing,” said one researcher at the University of Pittsburgh, USA. Other scientists agree. “Many (not all) journals regard supplementary data as a pain in the neck,” said a researcher at the University of Cambridge, UK.

Broken links are not the only problem with these files. Some SI files are so large that it is difficult to find the information you need in them. Others are in a format that requires specific software.

If a researcher cannot access SI, they may not be able to use the data in their own work. The SI can even be vital for understanding the paper: some journals place all of the results in the SI, rather than in the main text.

Clearly, it is important that SI can be easily and reliably accessed. So, what is the solution?

Supplementary Material in Repositories

Researchers are now turning to alternative ways to store and access SI. Some choose to post SI on their own websites. While this is a good start, it leaves the task of keeping the SI available with the researcher. Also, some journals, including Nature, state that this alone is not enough. They require that SI also be posted on the journal website.

A good option is to post SI in a repository. Repositories are online archives for data. There are two types of repository. The first is the general type, which can hold scientific data from any subject. Examples are Dryad and Zenodo. Some general repositories, such as Zenodo, can also hold open access articles and even software.

The second type is subject-specific repositories. For example, there is GenBank for genetic sequences or OpenNeuro for brain imaging data. Authors can choose the repository that is best suited to their work.

Why Use Repositories?

Repositories have a number of benefits over providing SI as a PDF. First, repositories can offer storage and access for a huge amount of data in one place. This makes it easier for researchers to find and use SI. Second, repositories assign each deposit a digital object identifier (DOI), a permanent identifier that keeps working even if the files move. Researchers can then link to the DOI directly from their published article. This helps authors get credit for their work, because the data can be cited in its own right. It also makes it easier for readers to access the SI.
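For readers who handle DOIs in scripts, here is a minimal Python sketch of how a DOI becomes a link. It relies on the fact that any DOI can be resolved by adding it to https://doi.org/. The function name doi_to_url and the example identifier are made up for illustration, and the check is only a rough sanity test, not full DOI validation.

```python
from urllib.parse import quote

# Prefixes that people commonly paste in front of a DOI.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return an https://doi.org/ link for a DOI given in any common form."""
    cleaned = doi.strip()
    lowered = cleaned.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    # Rough sanity check: a DOI starts with "10." and has a "/" separating
    # the registrant prefix from the suffix.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are not safe in a URL path.
    return "https://doi.org/" + quote(cleaned, safe="/")


# Hypothetical identifier, used only for illustration.
print(doi_to_url("doi:10.5281/zenodo.1234567"))
# prints https://doi.org/10.5281/zenodo.1234567
```

The resulting link is what an author would place in the article. Because the link goes through the DOI resolver rather than to a fixed page on the repository's website, it does not break when the repository reorganises its site.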

Another advantage of repositories is that they allow most file formats. This means that researchers can submit their data in the format that they feel works best. Many journals are less flexible. A journal might, for example, say that all SI has to be submitted as a PDF, which is unsuitable for some types of data. “Speaking personally as an author, I much prefer [using repositories], because it lets me control how I manage my supplementary data files,” said one researcher.

In traditional journals, the copyright for published data is often kept by the journal. In contrast, most repositories allow free access to data. This is in line with the current move towards Open Access. Most data placed in repositories is published with a Creative Commons licence.

What Do Publishers Want?

Repositories might work better for authors and readers, but what about publishers?

In fact, some publishers are also turning to repositories. For example, the Microbiology Society, which publishes seven journals, recommends that authors submit their SI files to repositories. Other publishers go further. F1000 no longer accepts SI. Instead, authors must submit their data to an approved repository. This reduces the workload for the publisher, as well as making it easier for readers to find the information.

Some researchers feel that dealing with SI has never been a priority for publishers. This is why, for example, links to SI are often broken when a publisher changes its systems or updates its website. “Journals can change very quickly,” said one publishing director. If so, moving to repositories should not be a big hurdle for journals.

Repositories: A Win-Win for Researchers and Publishers

Repositories seem to offer a better option for both publishers and authors. So why isn’t everyone using them? It could simply be a matter of time. A 2017 survey found that publishing SI alongside the article was still slightly more common than using a repository. However, many researchers believe that once the benefits of repositories are better known, they will become the norm. This is “what we should be striving for,” said one researcher.

Repositories are not only useful for SI. Some researchers are also using pre-print repositories for papers that are not yet published. Others are using them for Open Access papers. With so many benefits, it might be time to say goodbye to supplementary material and hello to repositories.

Should SI be given with a paper or posted in a repository? What are your experiences of using repositories? Share your thoughts in the comments below.


