From Data to Information to Knowledge.
Data is not the same as information. In this task we look at how to turn one into the other.
Data, when collected and structured, suddenly becomes a lot more useful. The table below does this for a single item:
Color | White |
Category | Sport - Golf |
Condition | Used |
Diameter | 43mm |
Price (per ball) | $0.50 (AUD) |
But each of the data values is still rather meaningless by itself. To create information out of data, we need to interpret that data.
Let’s take the size: a diameter of 43mm doesn’t tell us much on its own. It only becomes meaningful when we compare it to other things. In sports there are often size regulations for equipment. The minimum size for a competition golf ball is 42.67mm. Our ball is 43mm, so as far as size goes we can use it in a competition. This is information. But it still is not knowledge. Knowledge is created when the information is learned, applied and understood.
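The comparison above can be sketched in a few lines of Python. This is only an illustration: the field names (diameter_mm, price_aud) and the helper is_competition_size are made up for this example, and the check covers only the diameter rule quoted in the text.

```python
# The data: raw values about one golf ball, mirroring the table above.
# The field names are hypothetical, chosen for this sketch.
ball = {
    "color": "white",
    "category": "Sport - Golf",
    "condition": "used",
    "diameter_mm": 43.0,
    "price_aud": 0.50,
}

# The reference value we compare against (minimum diameter from the text).
MIN_COMPETITION_DIAMETER_MM = 42.67


def is_competition_size(diameter_mm: float) -> bool:
    """Return True if the diameter meets the minimum size rule."""
    return diameter_mm >= MIN_COMPETITION_DIAMETER_MM


# The information: the result of interpreting the data against the rule.
size_ok = is_competition_size(ball["diameter_mm"])
margin_mm = round(ball["diameter_mm"] - MIN_COMPETITION_DIAMETER_MM, 2)

print("Meets the size rule:", size_ok)
print("Margin in mm:", margin_mm)
```

The number 43 is data. The statement "this ball meets the size rule" is information, because it comes from comparing that number to something else.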
Unstructured vs. Structured data
A plain sentence - “we have 5 white used golf balls with a diameter of 43mm at 50 cents each” - might be easy for a person to understand, but it is hard for a computer. The above sentence is what we call unstructured data. Unstructured data has no fixed underlying structure: the sentence could easily be reworded, and it is not clear which word refers to what exactly. A table such as the one above is more structured.
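To make the difference concrete, here is a small Python sketch. It assumes we want to find the quantity, and the field names in the structured record are invented for this example. Pulling the quantity out of the sentence only works if we guess its position, and that guess fails as soon as the wording changes.

```python
sentence = "we have 5 white used golf balls with a diameter of 43mm at 50 cents each"

# Unstructured: we have to guess that the quantity is the third word.
quantity_guess = int(sentence.split()[2])

# The same facts, reworded. The third word is no longer the quantity.
reworded = "5 used white golf balls, each 43mm wide, cost 50 cents apiece"
try:
    int(reworded.split()[2])
    guess_failed = False
except ValueError:
    # "white" cannot be read as a number, so the positional guess breaks.
    guess_failed = True

# Structured: every value sits under a fixed, named field
# (field names are hypothetical).
record = {
    "quantity": 5,
    "color": "white",
    "condition": "used",
    "item": "golf ball",
    "diameter_mm": 43,
    "price_aud": 0.5,
}
quantity = record["quantity"]

print(quantity_guess, guess_failed, quantity)
```

With the structured record there is nothing to guess: the quantity is always found under the same name, no matter how a person might describe the balls in words.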
Computers are inherently different from humans. It can be exceptionally hard to make computers extract information from certain sources, and some tasks that humans find easy are still difficult to automate. For example, interpreting text that is presented as an image is still a challenge for a computer. Have you ever signed up to a website and had to type some words that were shown to you as an image? Such a test (often called a CAPTCHA) works because reading those words is hard for computers but easy for you, so typing them correctly shows that you are not a machine. If you want your computer to process and analyse your data, it has to be able to read the data first. This means the data needs to be structured and in a machine-readable form.
One of the most commonly used formats for exchanging data is CSV. CSV stands for comma-separated values. The same golf ball data expressed as CSV can look something like:
"quantity","color","condition","item","category","diameter (mm)","price per unit (AUD)"
5,"white","used","ball","golf",43,0.5
This is much simpler for your computer to understand and can be read directly by spreadsheet software. Note that words have quotes around them: this distinguishes them as text (string values, in computer speak), whereas numbers do not have quotes. It is worth mentioning that there are many more formats out there that are structured and machine-readable.
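As a rough illustration of how a program reads such a file, the sketch below uses the csv module from Python's standard library. The CSV text is embedded as a string so the example is self-contained; in practice it would come from a file. The option csv.QUOTE_NONNUMERIC tells the reader to treat quoted fields as text and to convert unquoted fields to numbers, which matches the quoting convention described above.

```python
import csv
import io

# The CSV from the text, embedded here so the example runs on its own.
csv_text = (
    '"quantity","color","condition","item","category",'
    '"diameter (mm)","price per unit (AUD)"\n'
    '5,"white","used","ball","golf",43,0.5\n'
)

# QUOTE_NONNUMERIC: quoted fields stay strings, unquoted fields become floats.
reader = csv.reader(io.StringIO(csv_text), quoting=csv.QUOTE_NONNUMERIC)
header = next(reader)  # first line: the column names
row = next(reader)     # second line: the values

# Pair each column name with its value.
record = dict(zip(header, row))

# Because the numbers are real numbers, we can calculate with them.
total_price = record["quantity"] * record["price per unit (AUD)"]

print(record["color"], record["diameter (mm)"], total_price)
```

Once the values are read as numbers rather than text, the computer can do arithmetic with them, such as working out the price of all five balls.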
Task: Think of the last book you read. What data is connected to it and how would you make it structured data?