Finding and reusing datasets for research, learning and teaching

No single search engine will find datasets in every repository. The main one is the Data Citation Index, but even if you use it you will also need to search other search engines or individual repositories.

Dataset Search Engines

Interdisciplinary Open Access Data Repositories 

Below is a list of the most commonly used multidisciplinary data repositories. 

Subject Specific Open Access Data Repositories

There are numerous subject-specific repositories, too many to list individually. Listed below are the two repositories specified by the UKRI funders ESRC and NERC, along with repositories holding datasets arising from research funded by the European Commission. There is also a link to a registry of subject-specific repositories.

  • The UK Data Service is the UK’s largest collection of social, economic and population data resources. It also holds data from studies funded by the Economic and Social Research Council (ESRC). A recorded webinar, Finding and Accessing data in the UK Data Archive, is available on YouTube.
  • The Natural Environment Research Council (NERC) has five data centres, searchable via the NERC Data Catalogue Service.
  • OpenAIRE - search for datasets arising from research funded by the European Commission.
  • EU Open Data Portal - search for public data published by the EU institutions, agencies and other bodies.
  • A list of subject-specific repositories is available from re3data.org.

 Abertay Datasets

A list of datasets deposited in external data repositories is available from Abertay's Pure Portal.

 Re-using Datasets

When re-using data, you must always check the licence and reuse terms, and provide a citation for the derivative dataset. The licence associated with the original dataset or datasets will determine how the new dataset can be used and shared. If you have used more than one dataset and these have different licences, then the resultant dataset must usually be licensed under the most restrictive licence of the original datasets. More information on licence stacking is provided below.

Commonly Used Dataset Licences

1. Creative Commons Licences

  • Attribution, e.g. CC BY
  • No Derivatives, e.g. CC BY-ND or CC BY-NC-ND
  • Non-Commercial, e.g. CC BY-NC, CC BY-NC-ND
  • Share Alike, e.g. CC BY-SA, CC BY-NC-SA
  • Waiver of copyright, CC0

 Additional information is available from the Creative Commons website.
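As a rough illustration of how the short licence codes above can be read, the sketch below splits a code into its elements (BY, NC, ND, SA) and reports what each implies. The names `LicenceTerms` and `parse_cc_licence` are hypothetical, chosen for this example only. The sketch handles only the codes listed above, ignores licence versions, and is not a substitute for reading the full licence text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenceTerms:
    """Plain-language reading of a Creative Commons short code (illustrative only)."""
    attribution_required: bool
    commercial_use_allowed: bool
    derivatives_allowed: bool
    share_alike_required: bool


def parse_cc_licence(code: str) -> LicenceTerms:
    """Interpret a short code such as 'CC BY-NC-SA' or 'CC0'.

    Hypothetical helper: it understands only the elements BY, NC, ND and SA.
    """
    # Tolerate stray spaces and mixed case, e.g. 'CC BY- NC' or 'ccby-sa'.
    normalised = code.upper().replace(" ", "")
    if normalised == "CC0":
        # CC0 waives copyright, so no conditions apply.
        return LicenceTerms(False, True, True, False)
    if not normalised.startswith("CC"):
        raise ValueError(f"Not a Creative Commons code: {code!r}")

    elements = set(normalised[2:].split("-"))
    unknown = elements - {"BY", "NC", "ND", "SA"}
    if unknown or "BY" not in elements:
        raise ValueError(f"Unrecognised licence code: {code!r}")
    if {"ND", "SA"} <= elements:
        # No listed licence combines No Derivatives with Share Alike.
        raise ValueError(f"ND and SA cannot be combined: {code!r}")

    return LicenceTerms(
        attribution_required=True,
        commercial_use_allowed="NC" not in elements,
        derivatives_allowed="ND" not in elements,
        share_alike_required="SA" in elements,
    )


if __name__ == "__main__":
    for licence in ["CC BY", "CC BY-NC-ND", "CC BY-SA", "CC0"]:
        print(licence, parse_cc_licence(licence))
```

A dataset whose code includes ND is the one to watch when building a derivative dataset, because that element does not permit sharing adapted versions.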

2. Other Open Licences

The Digital Curation Centre has a How to License Research Data guide which you may find helpful. If you are re-using datasets with different licences, the OpenMinTeD Licence Compatibility Matrix will help you check whether the licences are compatible.

If you have any questions regarding licences, please email repository@abertay.ac.uk 

Citing Datasets

You should always cite your re-use of datasets just as you would cite any other primary source such as an article, book, image or website. Citing datasets is important because it:

  • gives credit to the creators of the original datasets;
  • promotes data reproducibility;
  • allows funders to track the impact and reuse of data arising from their funded studies;
  • encourages the citation of data as the norm rather than the exception;
  • increases the discoverability of the data.

Usually the elements of a data citation will include:

  • Author/Creators;
  • Year of publication;
  • Title;
  • Publisher (i.e. the data repository);
  • Version or edition if applicable;
  • Permanent identifier for the dataset, e.g. a DOI.

 An example citation for a dataset deposited in the Dryad repository is:

 Cuthill, Innes C. et al. (2017), Data from: Optimizing countershading camouflage, Dryad, Dataset, https://doi.org/10.5061/dryad.rd47f
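If you keep dataset details in a script or spreadsheet, the citation elements listed above can be assembled automatically. The following is a minimal sketch, assuming the element order used in the Dryad example. The class name `DatasetCitation` and its field names are hypothetical, and your referencing style may require a different order or punctuation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Holds the usual elements of a data citation (illustrative field names)."""
    creators: str                    # Author/Creators
    year: int                        # Year of publication
    title: str                       # Title
    publisher: str                   # Publisher, i.e. the data repository
    identifier: str                  # Permanent identifier, e.g. a DOI link
    version: Optional[str] = None    # Version or edition, if applicable
    resource_type: str = "Dataset"   # Resource type, as shown in the Dryad example

    def format(self) -> str:
        """Join the elements in the order used by the Dryad example."""
        parts = [f"{self.creators} ({self.year})", self.title, self.publisher]
        if self.version:
            # Included only when the dataset has a version or edition.
            parts.append(f"Version {self.version}")
        parts.append(self.resource_type)
        parts.append(self.identifier)
        return ", ".join(parts)


if __name__ == "__main__":
    citation = DatasetCitation(
        creators="Cuthill, Innes C. et al.",
        year=2017,
        title="Data from: Optimizing countershading camouflage",
        publisher="Dryad",
        identifier="https://doi.org/10.5061/dryad.rd47f",
    )
    print(citation.format())
```

Run as a script, this prints the same citation as the Dryad example.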

More information on citing data is provided in How to Cite Datasets and Link to Publications, available from the Digital Curation Centre.
