Finding and reusing datasets for research, learning and teaching

Subject Specific & Funder Open Access Data Repositories

There are numerous subject-specific repositories, too many to list individually.

If you need help identifying subject-specific repositories, have a look at re3data.org, the Registry of Research Data Repositories. It allows you to search or browse for subject-specific repositories.

  • The UK Data Service is the UK’s largest collection of social, economic and population data resources. It also holds data from studies funded by the Economic and Social Research Council (ESRC), as well as other social science datasets. Recordings of all their webinar training sessions are available on the UK Data Service YouTube channel.
  • The Natural Environment Research Council (NERC) has five data centres, and links to each are available from the NERC Environmental Data Service (EDS). You can also search for datasets using the NERC Data Catalogue Service.
  • Biotechnology and Biological Sciences Research Council (BBSRC) has a number of resources that are available for the bioscience community to use. These include links to data sharing and data resources.  

 Some Useful Dataset Search Engines

No single search engine will find datasets in every repository. Each is limited to the repositories it indexes, and there will be some overlap between them. Their search functionality is usually not as good as that of a subject repository, but these tools can be useful if there is no specific data repository for your subject.

  • Mendeley Data search. Although Mendeley Data is a data repository itself, it also indexes many of the OA data repositories and is useful and easy to search.
  • Data Citation Index: this is part of Web of Science, one of the Library's subscribed databases, and access is available from the Library Resources A-Z. A list of the repositories indexed and searchable is available. Web of Science has a useful guide on how to use the Data Citation Index, and there is also a short YouTube tutorial, Getting Started with the Data Citation Index.
  • OpenAIRE: a European project supporting the Open Science movement. It includes datasets, and its home page offers a useful option to browse by United Nations Sustainable Development Goal.
  • CESSDA Data Catalogue: datasets come from over 20 European countries. It is good for searching for and finding European social science data (UK data is also included).
  • Google Dataset Search. Google and other general search engines will also find some datasets.
  • EU Open Data Portal: search for public data published by the EU institutions, agencies and other bodies.

Interdisciplinary Open Access Data Repositories  

 Abertay Datasets

Pure acts as a catalogue for Abertay datasets and includes datasets deposited in Pure as well as datasets deposited in external data repositories. You can find these listed on Abertay's Pure Portal. Most universities have an open access (OA) repository, so if there is a specific researcher or university that you are interested in, check their institutional OA repository for datasets.

 Re-using Datasets

When reusing data, you must always check the licence and reuse terms, and cite the original data in your derivative dataset. The licence associated with the original dataset or datasets will determine how the new dataset can be used and shared. If you have used more than one dataset and these have different licences, the resulting dataset must usually be licensed under the most restrictive of the original datasets' licences. More information on licence stacking is provided below.

Commonly Used Dataset Licences

1. Creative Commons Licences

  • Attribution, CC BY
  • Waiver of copyright, CC0

These two CC licences are the ones most commonly used for datasets, and they place the fewest restrictions on reuse.

  • NoDerivatives licences, e.g. CC BY-ND
  • NonCommercial licences, e.g. CC BY-NC
  • ShareAlike licences, e.g. CC BY-SA

These more restrictive licences limit how the data can be reused.

 Additional information is available from the Creative Commons website.

2. Other Open Licences

The Digital Curation Centre has a How to License Research Data guide which you may find helpful. If you are reusing datasets with different licences, the OpenMinTeD Licence Compatibility Matrix will help you check whether the licences are compatible. The OpenMinTeD licence tool is still in beta, so use it as a guide, and if you need any advice on licences, email repository@abertay.ac.uk.

Citing Datasets

If you use other people's data to generate a new dataset, you must always acknowledge the original data source, just as you would cite any other primary source such as an article, book, image or website. Citing datasets is important because it:

  • gives credit to the creators of the original datasets;
  • promotes data reproducibility;
  • allows funders to track the impact and reuse of data arising from the studies they fund;
  • encourages the citation of data as the norm rather than the exception;
  • increases the discoverability of data.

Usually the elements of a data citation will include:

  • Author/Creators;
  • Year of publication;
  • Title;
  • Publisher (i.e. the data repository);
  • Version or edition if applicable;
  • Permanent identifier for the dataset, e.g. a DOI.

An example citation for a dataset deposited in the Dryad repository is:

 Cuthill, Innes C. et al. (2017), Data from: Optimizing countershading camouflage, Dryad, Dataset, https://doi.org/10.5061/dryad.rd47f
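Because a data citation is just the elements listed above joined in a fixed order, it can be assembled mechanically, for example when citing many datasets in a script or data paper. The Python sketch below is illustrative only: the `DatasetCitation` class and its field names are hypothetical and not part of any repository's tools, and the element order and punctuation simply mirror the Dryad example. Citation styles vary, so check the format recommended by the repository or publisher.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for the usual elements of a data citation."""
    creators: str                  # Author/Creators, e.g. "Cuthill, Innes C. et al."
    year: int                      # Year of publication
    title: str                     # Title of the dataset
    publisher: str                 # Publisher, i.e. the data repository
    identifier: str                # Permanent identifier, e.g. a DOI URL
    version: Optional[str] = None  # Version or edition, only if applicable
    resource_type: str = "Dataset"

    def format(self) -> str:
        """Join the elements in the order used by the Dryad example."""
        parts = [f"{self.creators} ({self.year})", self.title, self.publisher]
        if self.version:
            # Assumed placement: version goes after the publisher
            parts.append(f"Version {self.version}")
        parts.extend([self.resource_type, self.identifier])
        return ", ".join(parts)


citation = DatasetCitation(
    creators="Cuthill, Innes C. et al.",
    year=2017,
    title="Data from: Optimizing countershading camouflage",
    publisher="Dryad",
    identifier="https://doi.org/10.5061/dryad.rd47f",
)
print(citation.format())
```

Run as written, this prints the Dryad citation shown above; passing `version="2"` would add "Version 2" after the publisher.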

More information on citing data is available in How to Cite Datasets and Link to Publications, available from the Digital Curation Centre.
