Preparing your data for publication [Feb. 1, 2013, 7:36 a.m.]
Once you've chosen your licenses, it's time to communicate them, along with information about your open collection, to the public. Communication really is the key to a successful open data policy!
For starters, you'll need to add a dedicated field or tag to your data that describes the rights status of your content, so that it is clear under which conditions you provide your data. For metadata, a general disclaimer on the download page could suffice. Just make sure users can easily understand what they can and cannot do with the data. When applying Creative Commons licenses, always make sure that the link to the official text is included in the statement (e.g. http://creativecommons.org/licenses/by-sa/3.0/).
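As a rough illustration of such a dedicated rights field, here is a minimal Python sketch. The record layout and the field names `rights` and `rights_url` are assumptions made for this example, not a prescribed standard; use whatever field names your own metadata schema defines.

```python
# Hypothetical record layout: the field names "rights" and "rights_url"
# are illustrative assumptions, not part of any standard.
CC_BY_SA_3 = "http://creativecommons.org/licenses/by-sa/3.0/"


def add_rights(record, license_url, rights_label):
    """Return a copy of the record with explicit rights fields added."""
    updated = dict(record)
    updated["rights"] = rights_label
    updated["rights_url"] = license_url
    return updated


def rights_statement(record):
    """Build a one-line statement that includes the link to the license text."""
    return "{title}: {rights} ({rights_url})".format(**record)


# Invented sample record, for demonstration only.
record = {"id": "obj-001", "title": "Harbour view"}
tagged = add_rights(record, CC_BY_SA_3, "CC BY-SA 3.0")
print(rights_statement(tagged))
```

Storing the license URL in its own field, rather than only in free text, should make it easier for re-users to filter your records by rights status automatically.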
A good way to communicate all the different aspects of your collection and the corresponding rights is a data blog. What this is and how you could make one is described in the next task. What else can you do to make life easier for re-users? Here are some tips we've put together:
- Always make your data (content and / or metadata) available on your own website. This way it’s clear that you are the original provider. Another advantage is that you will often have a better overview of the access to and re-use of your data than if you only provide access to it elsewhere.
- You can provide both the content itself (e.g. images, videos) and the information about this content (metadata). The metadata is almost always stored in a different place than the content. If you provide both, make sure that it’s clear where each can be found. Ideally, add a separate field in the metadata with a URL to the content, for example the URL where the images or videos can be found.
- If your organisation has an online shop where users can order content, then it is important for end users that you clearly mark your Open Cultural Dataset content as such: open. It is confusing for re-users to see a shopping cart next to a photo which you provide as open data elsewhere. If you don’t make re-use conditions explicit, this can eventually lead to less re-use.
- Make sure there’s an explanation or news section on your website about the sort of open cultural dataset(s) your institution provides. For this, you can use the text of your data blog.
- Preferences for data formats vary among developers and other re-users. Some are happy with a simple .csv or .txt dump of the metadata; others prefer access to a full live API, where they can choose to retrieve data in different formats (e.g. JSON, XML). Whatever your options or limitations are, make sure you at least clearly describe in your data blog what people can find in your metadata fields, and provide re-users with as many options as possible to approach, download and search through your data. If you have an API, describe which standard you’re using and where re-users can find more information about it.
- Describe clearly when the latest changes to your dataset were made. This can be done in your data blog or, even better, in your metadata. If changes occur regularly, provide incremental updates or consider offering multiple versions of your dataset.
- If you provide open content, it’s recommended to make it available in the highest resolution possible. This will stimulate re-use! Note that some developers also like to have the option to work with a lower resolution, because the files are less ‘heavy’. So ideally, you make content available in several resolutions.
- For re-use on Wikipedia, the following metadata fields are the most important: name of the creator, title, object type, description, creation date, measurements, current location, internal ID, license. Make sure that at least these fields are properly documented.
- If your content is labeled with a unique category on Wikimedia Commons (for example Category:Media_from_Open_Beelden), you can get statistics about re-use of your content on Wikipedia (some examples here). These categories are assigned by the Wikimedia community itself.
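Several of the tips above (a content URL in the metadata, a last-changed date, multiple download formats, and the fields needed for Wikipedia) can be combined in a small export script. The following is a minimal Python sketch under stated assumptions: the snake_case field names, including `content_url` and `last_modified`, are a made-up naming scheme, and the sample records are invented for demonstration.

```python
import csv
import io
import json

# Fields the tips list as most important for re-use on Wikipedia.
# The snake_case keys are an assumed naming scheme, not a standard.
WIKIPEDIA_FIELDS = [
    "creator", "title", "object_type", "description", "creation_date",
    "measurements", "current_location", "internal_id", "license",
]
# Extra fields suggested by the other tips (names are assumptions).
EXTRA_FIELDS = ["content_url", "last_modified"]


def missing_fields(record):
    """Return the Wikipedia-relevant fields that are absent or empty."""
    return [f for f in WIKIPEDIA_FIELDS if not str(record.get(f, "")).strip()]


def to_csv(records):
    """Serialise records as CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=WIKIPEDIA_FIELDS + EXTRA_FIELDS,
        extrasaction="ignore",   # skip keys that are not in the column list
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)    # missing keys are written as empty cells
    return buffer.getvalue()


def to_json(records):
    """Serialise the same records as JSON text."""
    return json.dumps(records, indent=2, ensure_ascii=False)


# Invented sample data, for demonstration only.
records = [
    {
        "creator": "Unknown photographer", "title": "Harbour view",
        "object_type": "photograph", "description": "View of a harbour",
        "creation_date": "1925", "measurements": "18 x 24 cm",
        "current_location": "Example Museum", "internal_id": "obj-001",
        "license": "http://creativecommons.org/licenses/by-sa/3.0/",
        "content_url": "http://example.org/images/obj-001.jpg",
        "last_modified": "2013-02-01",
    },
    {
        "creator": "Unknown photographer", "title": "Market square",
        "object_type": "photograph", "description": "Market day",
        "creation_date": "1930", "current_location": "Example Museum",
        "internal_id": "obj-002",
        "license": "http://creativecommons.org/licenses/by-sa/3.0/",
        "content_url": "http://example.org/images/obj-002.jpg",
        "last_modified": "2013-02-01",
    },
]

for rec in records:
    gaps = missing_fields(rec)
    if gaps:
        print(rec["internal_id"], "is missing:", ", ".join(gaps))

csv_dump = to_csv(records)
json_dump = to_json(records)
```

Running the check before every export is one way to spot incomplete records early; the CSV and JSON dumps can then be offered side by side on your download page, with the column meanings documented in your data blog.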