Clean the data

You have to convert the data into a format that OpenSpending accepts. Help from here:

Task Discussion

  • Ilpo   Oct. 9, 2012, 3:27 p.m.

    "4. No blank rows or cells: Data imported into OpenSpending should be fairly de-normalized: while there may be references to external code sheets/master data, each row should contain all the information required to construct the resulting item. Columns (particularly classifications) should have a value for each row; they will not automatically “fill down”."

    There are about 2,200 rows in this data, and at first glance I didn't see any empty rows.

    What would be the right (or the easiest) way to check whether there are empty rows?
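
    One possible way to check, sketched with Python's standard csv module. The sample data and the function name find_blank_cells are made up for illustration; they are not the real file. The sketch reports fully empty rows and also single empty cells, since the guideline above asks for both.

```python
import csv
import io


def find_blank_cells(csv_file):
    """Return (blank_rows, blank_cells) for an open CSV file.

    blank_rows: row numbers where every cell is empty.
    blank_cells: (row number, column name) pairs for single empty cells.
    """
    reader = csv.reader(csv_file)
    header = next(reader)  # the first row is assumed to be the header
    blank_rows, blank_cells = [], []
    # Numbering starts at 2 because row 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            blank_rows.append(row_number)
            continue
        # Pad short rows so missing trailing cells count as empty.
        cells += [""] * (len(header) - len(cells))
        for name, cell in zip(header, cells):
            if not cell:
                blank_cells.append((row_number, name))
    return blank_rows, blank_cells


# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    "year,category,amount\n"
    "2011,Education,1000\n"
    ",,\n"
    "2011,,250\n"
)
blank_rows, blank_cells = find_blank_cells(sample)
print(blank_rows)   # [3]
print(blank_cells)  # [(4, 'category')]
```

    For a real file, the same function could be called inside `with open("data.csv", newline="", encoding="utf-8") as f:`, where data.csv is a placeholder name and the encoding is an assumption.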

  • Ilpo   Oct. 9, 2012, 3:21 p.m.

    "3. Rows should contain only one type of information: i.e. one budget line (or “fact”). In many files, you will see that any individual row contains data for multiple years, as well things such as both budgeted and actual spend. One individual row contains a maximum of one time period, and contrasts have been created within columns. Note that formatting data for OpenSpending often means creating many more rows than were found in the original document."

    I guess this holds for my data. There is only data from the year 2011, and I guess it's money that was actually spent... I'm still not sure.
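
    For a file that does mix several years or budgeted and actual figures in one row, the reshaping described in point 3 could look roughly like the sketch below. The column names (budget_2010, actual_2011, and so on), the "<type>_<year>" naming pattern, and the function name wide_to_long are assumptions made for the example, not the layout of the real data.

```python
import csv
import io


def wide_to_long(csv_file, id_column="category"):
    """Turn one wide row into one row per (year, type) combination."""
    long_rows = []
    for row in csv.DictReader(csv_file):
        for column, value in row.items():
            if column == id_column:
                continue
            # Column names are assumed to look like "<type>_<year>".
            spend_type, year = column.rsplit("_", 1)
            long_rows.append({
                id_column: row[id_column],
                "year": year,
                "type": spend_type,
                "amount": value,
            })
    return long_rows


# Hypothetical wide-format input: one row holds two years and two types.
wide = io.StringIO(
    "category,budget_2010,actual_2010,budget_2011,actual_2011\n"
    "Education,900,950,1000,980\n"
)
long_rows = wide_to_long(wide)
for long_row in long_rows:
    print(long_row)
```

    The single input row becomes four output rows, which matches the note that formatting for OpenSpending often means creating many more rows than the original had.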

  • Ilpo   Oct. 9, 2012, 2:59 p.m.

    So, now I have the data, which can be checked out from here: github:

    I don't have much of an idea about this, but let's follow the steps here:

    "1. OpenSpending takes CSV files: OpenSpending is very flexible with regard to structure, for example, it does not specify the column order, however the content of the columns must be standardised."


    OK, I have the CSV file. First step accomplished! :)

    "2. OpenSpending requires there to be one header row in the file: This is what the software will look for to identify the names of your columns. All other rows are treated as data rows."

    I guess there is one header row in the CSV I linked earlier. I'm still not sure about it, because I don't understand much about the structure of CSV files. Can anyone help me with this?
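
    A CSV file is plain text: each line is a row, and commas separate the cells. The header row is simply the first line, holding column names instead of data. A small sketch for looking at it with Python's standard csv module follows; the sample data and the function name inspect_header are hypothetical, and the two checks (no empty names, no duplicates) are just sensible guesses at what could go wrong, not an official OpenSpending validation.

```python
import csv
import io
import itertools


def inspect_header(csv_file, preview_rows=2):
    """Return the first row, a few data rows, and obvious header problems."""
    reader = csv.reader(csv_file)
    header = next(reader)  # the first row is treated as the header
    preview = list(itertools.islice(reader, preview_rows))
    problems = []
    if any(not name.strip() for name in header):
        problems.append("empty column name")
    if len(set(header)) != len(header):
        problems.append("duplicate column name")
    return header, preview, problems


# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    "year,category,amount\n"
    "2011,Education,1000\n"
    "2011,Health,2500\n"
)
header, preview, problems = inspect_header(sample)
print("Header:", header)
print("First data rows:", preview)
print("Problems:", problems)
```

    If the printed header shows names such as year or amount and the rows after it show actual values, the file most likely has the single header row that OpenSpending expects.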