Finding and reusing datasets for research, learning and teaching
There is no single search engine that will find datasets in every repository. The main search engine is the Data Citation Index, but even if you use it you will also need to search other search engines or individual repositories.
Dataset Search Engines
- Data Citation Index - this is part of Web of Science, one of the Library's subscribed databases, and access is available from the Library Resources A-Z. A list of the repositories indexed and searchable is available. Web of Science have a useful guide on how to use the Data Citation Index and there is also a short YouTube tutorial, Getting Started with the Data Citation Index.
- Google Dataset Search
Interdisciplinary Open Access Data Repositories
Below is a list of the most commonly used multidisciplinary data repositories.
Subject Specific Open Access Data Repositories
There are numerous subject specific repositories - too many to list individually. Listed below are the 2 repositories specified by UKRI funders ESRC and NERC, along with repositories holding datasets arising from research funded by the European Commission. There is also a link to a registry of subject specific repositories.
- The UK Data Service is the UK’s largest collection of social, economic and population data resources. It also holds data from studies funded by the Economic and Social Research Council (ESRC). A recorded webinar, Finding and Accessing data in the UK Data Archive, is available on YouTube.
- The Natural Environment Research Council (NERC) has 5 data centres, searchable via the NERC Data Catalogue Service.
- OpenAIRE - search for datasets arising from research funded by the European Commission.
- EU Open Data Portal - search for public data published by the EU institutions, agencies and other bodies.
- A list of subject specific repositories is available from re3data.org.
Abertay Datasets
A list of datasets deposited in external data repositories is available from Abertay's Pure Portal.
Re-using Datasets
When re-using data, you must always check the licence and reuse terms, and provide a citation for the derivative dataset. The licence associated with the original dataset or datasets will determine how the new dataset can be used and shared. If you have used more than one dataset and these have different licences, then usually the resultant dataset must be licensed under the most restrictive licence of the original datasets. More information on licence stacking is provided below.
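As a rough illustration of the "most restrictive licence" rule, the sketch below picks the licence for a combined dataset from a small, hypothetical ranking of three licences. The ranking, the function name and the decision to reject every other licence are assumptions made for this example; Share Alike and No Derivatives licences are deliberately left out because they need a manual compatibility check. It is not a substitute for reading the licence terms.

```python
# Hypothetical ranking for illustration only: a higher number means more restrictive.
# Share Alike (SA) and No Derivatives (ND) licences are intentionally excluded,
# because combining them needs a manual compatibility check.
RESTRICTIVENESS = {"CC0": 0, "CC BY": 1, "CC BY-NC": 2}


def most_restrictive(licences):
    """Return the most restrictive licence among the source datasets."""
    if not licences:
        raise ValueError("At least one source licence is required.")
    unknown = [name for name in licences if name not in RESTRICTIVENESS]
    if unknown:
        raise ValueError(f"Check compatibility manually for: {', '.join(unknown)}")
    return max(licences, key=lambda name: RESTRICTIVENESS[name])


# Three source datasets with different licences.
combined = most_restrictive(["CC0", "CC BY-NC", "CC BY"])
print(combined)  # CC BY-NC
```

Rejecting unranked licences, rather than guessing, keeps the sketch from suggesting a licence where the terms may not be compatible at all.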
Commonly Used Dataset Licences
1. Creative Commons Licences
- Attribution, CC BY
- No Derivatives, e.g. CC BY-ND or CC BY-NC-ND
- Non-Commercial, e.g. CC BY-NC, CC BY-NC-ND
- Share Alike, e.g. CC BY-SA, CC BY-NC-SA
- Waiver of copyright, CC0
Additional information is available from the Creative Commons website.
2. Other Open Licences
- Open Government Licence
- Open Data Commons Attribution License (ODC-By)
- Open Data Commons Open Database License (ODbL)
- Public Domain Dedication and License (PDDL)
- Commonly used Software licences and summaries
The Digital Curation Centre have a How to License Research Data guide which you may find helpful. If you are re-using datasets with different licences, the OpenMinTeD Licence Compatibility Matrix will help you check if the licences are compatible.
If you have any questions regarding licences, please email repository@abertay.ac.uk
Citing Datasets
You should always cite your re-use of datasets just as you would cite any other primary source such as an article, book, image or website. Citing datasets is important because it:
- gives credit to the creators of the original datasets;
- promotes data reproducibility;
- allows funders to track the impact and reuse of data arising from their funded studies;
- encourages the citation of data as the norm rather than the exception;
- increases the discoverability of datasets.
Usually the elements of a data citation will include:
- Author/Creators;
- Year of publication;
- Title;
- Publisher (i.e. the data repository);
- Version or edition if applicable;
- Permanent identifier for the dataset, e.g. a DOI.
An example of a citation for a dataset deposited in the Dryad repository is:
Cuthill, Innes C. et al. (2017), Data from: Optimizing countershading camouflage, Dryad, Dataset, https://doi.org/10.5061/dryad.rd47f
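The citation elements listed above can be assembled programmatically. The following is a minimal sketch that reproduces the Dryad example; the class name, field names and the position of the optional version are assumptions made for illustration, and the required layout will depend on the referencing style you use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for the usual elements of a data citation."""
    creators: str
    year: int
    title: str
    publisher: str          # i.e. the data repository
    identifier: str         # permanent identifier, e.g. a DOI
    version: Optional[str] = None

    def to_text(self) -> str:
        """Join the elements in the order used by the Dryad example."""
        parts = [f"{self.creators} ({self.year})", self.title, self.publisher]
        if self.version:
            # Placement of the version is an assumption; styles differ.
            parts.append(f"Version {self.version}")
        parts.append("Dataset")
        parts.append(self.identifier)
        return ", ".join(parts)


example = DatasetCitation(
    creators="Cuthill, Innes C. et al.",
    year=2017,
    title="Data from: Optimizing countershading camouflage",
    publisher="Dryad",
    identifier="https://doi.org/10.5061/dryad.rd47f",
)
citation_text = example.to_text()
print(citation_text)
```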
More information on citing data is available in How to Cite Datasets and Link to Publications, available from the Digital Curation Centre.