DTU Findit

How to use DTU Findit for text and datamining

DTU Findit offers datasets for text and data mining (TDM), including a limited set of open access files that can be used freely for this purpose.

Start by searching for your subject in DTU Findit, then limit your search to “Publications for Text and Datamining” under “Advanced”.

Then choose “Export metadata” in the Export function.

On the next page you will be asked to accept the Terms of use. After confirming, you can download the files.

We have included the FAQ here as well:

Frequently Asked Questions

What is the origin of the metadata?

The metadata stems from various commercial and non-commercial sources from which DTU Library has obtained a data feed. These are typically publishers (commercial), databases (commercial) and repositories (non-commercial).

What can I do with the exported metadata?

DTU Library has obtained certain usage rights for the metadata. Under these rights, DTU users (students and staff) can use the metadata for any analysis and mining activities related to their DTU work, studies and/or research.

Can I load the metadata into my own repository?

If a repository is a meaningful tool for your analysis or mining activities, and if access to this repository is limited to yourself (and other DTU staff/students working with you), then yes. Under no circumstances are you allowed to make these data available to an audience outside DTU, whether through a repository or by other means.

What are the Open Access TDM-prepared datafiles and what is their origin?

The DTU Library collection of OA TDM-prepared datafiles stems from sources that collect Open Access versions of scientific articles which anyone has the right to use in any way without asking for permission. The files are typically in PDF format. DTU Library prepares them for TDM by extracting the plain text from each PDF file into a text file. For each article fed into the collection, both the PDF and the text version are made available. Currently, the collection is based on a single source: PubMedCentral.

How do I download the datafiles from the collection of Open Access TDM-prepared datafiles?

In the metadata export files (CSV), if a publication has an associated datafile in the collection of Open Access TDM-prepared datafiles, a link (URL) to that file appears in a designated column on the line for that publication. Following the link downloads the text version of the publication. Adding the extension ".pdf" to the link and then following it downloads the PDF version. The link contains an access token that is generated when the metadata export file is downloaded. The token is based on various information, including information about you and the search you made. The token is time-limited and expires 24 hours after it was generated.
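
As a rough illustration of the steps above, the following Python sketch reads an export file, collects the datafile links, and derives the PDF links from them. Several details are assumptions rather than documented behaviour: the column name "tdm_link" is hypothetical (check the header row of your own export), the ".pdf" extension is simply appended to the end of the link as the answer above describes, and the time of export is something you would have to record yourself.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Hypothetical column name: look up the real one in the header row
# of your own metadata export file.
LINK_COLUMN = "tdm_link"

# Token lifetime as stated in the FAQ answer above.
TOKEN_LIFETIME = timedelta(hours=24)


def datafile_links(csv_file, link_column=LINK_COLUMN):
    """Yield (text_url, pdf_url) for each row that carries a datafile link."""
    for row in csv.DictReader(csv_file):
        text_url = (row.get(link_column) or "").strip()
        if text_url:
            # Assumption: ".pdf" is appended to the end of the link as-is.
            yield text_url, text_url + ".pdf"


def token_still_valid(exported_at, now=None):
    """Return True if fewer than 24 hours have passed since the export."""
    now = now or datetime.now(timezone.utc)
    return now - exported_at < TOKEN_LIFETIME


if __name__ == "__main__":
    # Made-up sample data standing in for a real export file.
    sample = io.StringIO(
        "title,tdm_link\n"
        "Paper with datafile,https://example.org/file/abc123\n"
        "Paper without datafile,\n"
    )
    for text_url, pdf_url in datafile_links(sample):
        print(text_url, pdf_url)
```

With a real export you would pass an opened file instead of the in-memory sample, for example open("export.csv", newline="", encoding="utf-8"), and download the links before the token expires. The file name and encoding here are likewise assumptions.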