
Data Sources



There are three ways of getting hold of data:

1) Finding data - this involves searching for data that has already been released.

2) Getting hold of more data - asking for ‘new’ data from official sources, e.g. through Freedom of Information requests. Some data is hidden away in websites that are hard to navigate - but don’t give up! This data can be liberated with what data wranglers call ‘scraping’. An excellent source for data collected this way is ScraperWiki.

3) Collecting data yourself - This means gathering data and entering it into a database or a spreadsheet - whether you work alone or collaboratively.


In this module we’ll focus on (1): finding data that has already been released. Several kinds of sources frequently release data for public use:

1) Government

In recent years governments have begun to release some of their data to the public. Many governments host dedicated open government data platforms for the data they create. For example, the UK government started data.gov.uk to release its datasets. Similar data portals exist in the US, Brazil and Kenya - and in many other countries! Does your country have an open data portal? (Datacatalogs.org is a good starting point.)

2) Organisations

Other sources of data are large organisations. The World Bank and the World Health Organization, for example, regularly release reports and data sets.

3) Science

Scientific projects and institutions release data to the scientific community and the general public. NASA, for example, produces open data, and many disciplines have their own data repositories, some of which are open. A growing number of initiatives try to provide access to data that has already been published (e.g. Dryad).

To help people find data, projects like the Open Access Directory’s data repository list and the Open Knowledge Foundation’s datahub.io have been started. They aim either to catalogue data sources or to bring together data sets from various sources.
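Some data portals can also be searched by a program rather than by hand. The sketch below assumes the portal runs the CKAN catalogue software and exposes its standard package_search action; not every portal does, so check the portal's own documentation first. The function names and the portal address in the usage note are hypothetical, chosen only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(portal_url, query, rows=5):
    """Build a keyword-search URL, assuming the portal exposes CKAN's package_search action."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def extract_titles(response_text):
    """Return (name, title) pairs from a CKAN package_search JSON response."""
    payload = json.loads(response_text)
    # CKAN wraps every answer in {"success": ..., "result": ...}
    if not payload.get("success"):
        return []
    return [(d.get("name"), d.get("title")) for d in payload["result"]["results"]]


def search_portal(portal_url, query, rows=5):
    """Send the search to the portal over the network and list matching datasets."""
    with urlopen(build_search_url(portal_url, query, rows), timeout=30) as resp:
        return extract_titles(resp.read().decode("utf-8"))
```

Calling search_portal("https://portal.example", "life expectancy") would then return the names and titles of matching datasets, with the placeholder address replaced by the address of a real CKAN-based portal.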

Task: Are there any data sources we missed?


Now that you have an overview of some of the key concepts related to data, it’s time to start hunting for your own! To begin with, let’s return to the question that we posed at the beginning of this module.

TASK: Find the datasets that would allow you to answer the question ‘How does healthcare spending influence life expectancy?’.

If you are not sure where to start - check out our next task.
