Finding and reusing datasets for research, learning and teaching
Subject Specific & Funder Open Access Data Repositories
There are numerous subject-specific repositories - too many to list individually.
If you need help identifying subject-specific repositories, have a look at the re3data.org Registry of Research Data Repositories, which lets you search or browse for repositories by subject.
- The UK Data Service is the UK’s largest collection of social, economic and population data resources. It also holds data from studies funded by the Economic and Social Research Council (ESRC) as well as other social science datasets. Recordings of all their webinar training sessions are available from the UK Data Service YouTube channel.
- Natural Environment Research Council (NERC) has five data centres; links to each are available from the NERC Environmental Data Service (EDS). You can also search for datasets using the NERC Data Catalogue Service.
- Biotechnology and Biological Sciences Research Council (BBSRC) has a number of resources that are available for the bioscience community to use. These include links to data sharing and data resources.
Some Useful Dataset Search Engines
No single search engine will find datasets in every repository. Each is limited to the repositories it indexes, and there will be some overlap between them. The search functionality is not usually as good as within a subject repository, but these tools can be useful if you do not have a specific subject data repository in mind.
- Mendeley Data search. Although Mendeley Data is itself a data repository, it also indexes many of the OA data repositories and is easy to search.
- Data Citation Index - part of Web of Science, one of the Library's subscribed databases; access is available from the Library Resources A-Z. A list of the repositories indexed and searchable is available. Web of Science has a useful guide on how to use the Data Citation Index, and there is also a short YouTube tutorial, Getting Started with the Data Citation Index.
- OpenAIRE - a European project supporting the Open Science movement. It includes datasets, and the home page offers a useful browse by United Nations Sustainable Development Goals.
- CESSDA Data Catalogue - datasets come from over 20 European countries. Good for finding European social science data (UK data also included).
- Google Dataset Search - Google and other general search engines will also find some datasets.
- EU Open Data Portal - search for public data published by the EU institutions, agencies and other bodies.
Interdisciplinary Open Access Data Repositories
- Zenodo - listed here as an interdisciplinary repository, but it also holds data from projects funded by the European Commission, including UK projects, in addition to other non-funded datasets
- Figshare
- Mendeley Data
- Open Science Framework
- Dryad repository - mainly holds datasets linked to publications. All are licensed CC0.
Pure acts as a catalogue for Abertay datasets and includes datasets held in Pure as well as datasets deposited in external data repositories. You can find these listed on Abertay's Pure Portal. Most universities have an open access repository, so if you are interested in a specific researcher or university, check their institutional OA repository for datasets.
When reusing data, you must always check the licence and reuse terms, and provide a citation for the derivative dataset. The licence associated with the original dataset or datasets will determine how the new dataset can be used and shared. If you have used more than one dataset and these have different licences, then usually the resultant dataset must be licensed under the most restrictive licence of the original datasets. More information on licence stacking is provided below.
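As a rough illustration of the "most restrictive licence" rule, the sketch below collects the reuse conditions that a combined dataset would need to honour. The condition table and the function name combined_conditions are assumptions made for this example only. It does not check whether licences are legally compatible with each other (for example, Share Alike combined with Non Commercial terms), so treat it as a thinking aid rather than licensing advice.

```python
# Illustrative mapping from licence name to the conditions it imposes.
# This table is an assumption for the sketch, not an authoritative source.
LICENCE_CONDITIONS = {
    "CC0": set(),
    "CC BY": {"attribution"},
    "CC BY-SA": {"attribution", "share-alike"},
    "CC BY-NC": {"attribution", "non-commercial"},
    "CC BY-ND": {"attribution", "no-derivatives"},
}


def combined_conditions(licences):
    """Return the sorted union of reuse conditions across the source licences."""
    conditions = set()
    for name in licences:
        if name not in LICENCE_CONDITIONS:
            raise ValueError(f"Unknown licence: {name}")
        conditions |= LICENCE_CONDITIONS[name]
    # A No Derivatives source cannot be merged into a new derived dataset.
    if "no-derivatives" in conditions:
        raise ValueError("A No Derivatives licence does not permit a derived dataset")
    return sorted(conditions)


print(combined_conditions(["CC0", "CC BY", "CC BY-NC"]))
# prints ['attribution', 'non-commercial']
```

Here the derived dataset inherits both the attribution and the non-commercial condition, so it would usually be shared under the CC BY-NC terms of the most restrictive source.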
Commonly Used Dataset Licences
1. Creative Commons Licences
- Attribution, CC BY
- Waiver of copyright, CC0
The two CC licences above are the most commonly used for datasets and place the fewest restrictions on reuse.
- No Derivatives, e.g. CC BY-ND
- Non Commercial, e.g. CC BY-NC
- Share Alike, e.g. CC BY-SA
These more restrictive licences limit how the data can be reused.
Additional information is available from the Creative Commons website.
2. Other Open Licences
- Open Government Licence
- Open Data Commons Attribution License (ODC-By)
- Open Data Commons Open Database License (ODbL)
- Public Domain Dedication and License (PDDL)
- Commonly used Software licences and summaries
The Digital Curation Centre has a How to License Research Data guide which you may find helpful. If you are reusing datasets with different licences, the OpenMinTeD Licence Compatibility Matrix will help you check whether the licences are compatible. The OpenMinTeD licence tool is still in beta, so use it as a guide, and if you need any advice about licences, email firstname.lastname@example.org
If you use other people's data to generate a new dataset, you must always acknowledge the original data source, just as you would cite any other primary source such as an article, book, image or website. Citing datasets is important because it:
- gives credit to the creators of the original datasets;
- promotes data reproducibility;
- allows funders to track the impact and reuse of data arising from their funded studies;
- encourages the citation of data as the norm rather than the exception;
- increases the discoverability of the data.
Usually the elements of a data citation will include:
- Creator(s);
- Title;
- Year of publication;
- Publisher (i.e. the data repository);
- Version or edition if applicable;
- Permanent identifier for the dataset, e.g. a DOI.
An example citation for a dataset deposited in the Dryad repository is:
Cuthill, Innes C. et al. (2017), Data from: Optimizing countershading camouflage, Dryad, Dataset, https://doi.org/10.5061/dryad.rd47f
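The way these elements fit together can be sketched in a few lines of Python. This is a minimal sketch that assumes the element order seen in the Dryad example above; the function name format_data_citation and its parameters are hypothetical, and individual repositories or referencing styles may order or punctuate the elements differently.

```python
def format_data_citation(creators, year, title, publisher, identifier,
                         version=None, resource_type="Dataset"):
    """Join the citation elements in the order used by the Dryad example."""
    parts = [f"{creators} ({year})", title, publisher]
    # Version or edition is only included when one is supplied.
    if version:
        parts.append(f"Version {version}")
    parts.append(resource_type)
    parts.append(identifier)
    return ", ".join(parts)


citation = format_data_citation(
    creators="Cuthill, Innes C. et al.",
    year=2017,
    title="Data from: Optimizing countershading camouflage",
    publisher="Dryad",
    identifier="https://doi.org/10.5061/dryad.rd47f",
)
print(citation)
```

Run as written, this reproduces the Dryad citation shown above.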
More information on citing data is available in How to Cite Datasets and Link to Publications, available from the Digital Curation Centre.