Open Data on the Web
The Dutch Open Culture Data network recently gave a presentation at the Open Data on the Web (ODW) event in London. The event was organised by the World Wide Web Consortium (W3C), the Open Data Institute and the Open Knowledge Foundation. It focused on how open data can serve as a key resource for increasing transparency and efficiency, and on its economic potential.
The event took place on 23-24 April as a continuation of the Using Open Data event held last year, which brought together many of the best and brightest in the open data field. This year was again a great success, with a venue bursting at the seams with people from all over the world and with different interests in open data. Below, we highlight some of the great presentations. You can find all the ODW papers and more information about the event here.
Sand or solid as a rock?
John Sheridan, Head of Legislation Services at the UK National Archives, kicked off the meeting with the important message that open data projects can only succeed if some key preconditions are in place. First of all, an organisation needs a long-term goal for open data, a sustainability plan, and the ambition to offer data to end users as well as possible: up to date, easily accessible and fully institutionally supported. Support from the open data community at large and legislative changes are also crucial. If these changes do not come about, all open data projects and applications will be built on sand rather than on rock, an important metaphor used by Sheridan that came back over the course of the two days.
One specific topic related to the sand metaphor was echoed a number of times by people from various realms of the open data world: the problem of data dumps provided for one-off hackathons, which are abandoned as soon as the hackathon is over and are never updated again by the suppliers. A great variety of experts highlighted this topic: a Linked Data & Open Data project manager (Hayo Schreijer), a cognitive scientist and (semantic) web researcher at the Rensselaer Polytechnic Institute (Alvaro Graves), a data integration company (Bart van Leeuwen of Netage) and also Open Culture Data itself.
The reason all of them gave for seeing one-off data dumps as a problem is that developers and companies need a real incentive to invest their valuable time and resources in building something with open data. A ‘sandy’ data dump is not enough, because in many cases it lacks the solid basis they need: a real commitment from the supplier to keep the data updated and to put openness at the core of the organisation.
Usability and discoverability
Opinions differed on the best approach to making open data available in a way that facilitates usability and discoverability. There were plenty of representatives with a passion for Linked Open Data (LOD), including Sir Tim Berners-Lee, no less! They agreed on the benefits LOD brings in connecting and disambiguating various data sets. Others, like Mark Birbeck from Sidewinder Labs, agreed that the Linked Open Data principle is great, but argued that starting out with consistent code and a simple way of providing data would be enough for some use cases.
This latter case was excellently presented by Rufus Pollock from the Open Knowledge Foundation. He talked about the ‘Frictionless data’ project, which aims to “make it radically easier to make data used and useful, [and] make it as simple as possible to get the data you want into the tool of your choice.” It works by having a data provider create a package in the so-called Simple Data Format (a publishing format adopted by the OKFn). Such a package consists of a JSON-based schema with information about the dataset, and the dataset itself (always a .csv file).
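To make the idea of "a JSON schema plus a .csv file" concrete, here is a minimal sketch in Python. It is not the official tooling of the Frictionless data project. The file names (datapackage.json, data.csv), the descriptor keys and the helper functions write_package and read_package are assumptions chosen for illustration, loosely modelled on the package layout described above, and the sample row is made up for the demo.

```python
import csv
import json
import tempfile
from pathlib import Path

# Assumed mapping from declared column types to Python casts (illustrative only).
CASTS = {"string": str, "integer": int, "number": float}


def write_package(directory, name, fields, rows):
    """Write a CSV file plus a JSON descriptor that describes its columns."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    with open(directory / "data.csv", "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([field["name"] for field in fields])  # header row
        writer.writerows(rows)
    # Hypothetical descriptor layout: one resource pointing at the CSV file.
    descriptor = {
        "name": name,
        "resources": [{"path": "data.csv", "schema": {"fields": fields}}],
    }
    (directory / "datapackage.json").write_text(
        json.dumps(descriptor, indent=2), encoding="utf-8"
    )


def read_package(directory):
    """Load the rows of the first resource, cast according to the schema."""
    directory = Path(directory)
    descriptor = json.loads(
        (directory / "datapackage.json").read_text(encoding="utf-8")
    )
    resource = descriptor["resources"][0]
    casts = {f["name"]: CASTS[f["type"]] for f in resource["schema"]["fields"]}
    with open(directory / resource["path"], newline="", encoding="utf-8") as handle:
        return [
            {column: casts[column](value) for column, value in row.items()}
            for row in csv.DictReader(handle)
        ]


if __name__ == "__main__":
    demo_fields = [
        {"name": "title", "type": "string"},
        {"name": "year", "type": "integer"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        write_package(tmp, "paintings", demo_fields, [["Night Watch", 1642]])
        print(read_package(tmp))
```

The point of the design is that a consumer only needs a CSV parser and a JSON parser: the descriptor says what each column means, so the data can be loaded into the tool of one's choice without guessing at types.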
Whichever way a data provider chooses to go, full LOD or Simple Data Format, some guidelines should at least be adhered to. One very pragmatic yet thorough checklist was presented by Pascal Romain and Elie Sloïm: the Open Quality Standards checklist with 72 good open data practices. This list is divided into topics, such as API, licenses, privacy and metadata, and is a very useful tool for gaining insight into what is needed to offer truly good usability and discoverability.
The business of open (culture) data
Lotte Belice Baltussen took part on behalf of Open Culture Data in a panel on the business of open data. She pointed out that ‘business’ has different meanings for different organisations. For developers, app competitions can be an opportunity to make a name for themselves, but since there are usually only a few prizes to be won, many who put in effort don’t get anything in return. And even if you do win a prize, this is often not followed up with a sustainable business plan, either by the organisation giving out the reward or by the developers themselves.
Thus, many apps with high potential created in app competitions don’t have a successful afterlife once the competition is over. This issue is being explored in the project Apps for Europe, in which ‘business lounges’ are set up that bring together open data startups with experts and investors in order to have business and sustainability at the core of the development of new apps based on open data.
In the case of heritage organisations, their business is first and foremost their public mission: providing access to their collections. This is also one reason for these organisations to open up data, since this facilitates the spreading and re-use of collections and increases the number of channels to end users (see also Baltussen et al., 2013). This, however, often conflicts with the fact that they are at the same time expected to monetize their digitised assets, an important point that also came up during the recent GLAMWiki conference.
Deutsche Nationalbibliothek: an open culture data case study
In another session, Lars Svensson of the Deutsche Nationalbibliothek (DNB) talked about how the library has been dedicating its metadata to the public domain under CC0 since 2010. This was a risky choice, seeing that they generated €750,000 selling this data in 2010 alone. The DNB adopted what Svensson called a ‘mixed business model’, in which they provide a simple set of title data and RDF data with almost complete information in the authority and bibliographic records under the CC0 dedication. However, they still charge for the complete bibliographic data in the MARC data format for recent records (after December 2011). This model will most likely persist until 2015, after which all DNB data will become available under CC0.
The reason for moving to CC0 and losing substantial revenue is that the DNB wanted to spread their data across the cultural heritage domain and make it more reusable and accessible. A key factor in this decision was the change of Europeana’s Data Exchange Agreement a few years ago, which requires partners to make their metadata available under CC0 in order to become part of Europeana. The DNB took part in Europeana workshops in which the (perceived) benefits and risks of open data were explored.
After these workshops and through talking with other Europeana data providers and legal experts, it became clear to them that the benefits of going with CC0 for their data outweigh the costs. The reason is that opening up once allows the DNB to easily make their data available on various platforms such as Europeana and Wikipedia, which creates new channels to end users and makes reusing and building upon the data much easier, both for the DNB itself and for others. As the Director General of the DNB, Dr. Elisabeth Niggemann, said when open licensing of data through Europeana was supported by European libraries in 2011:
“Providing data under an open licence is key to putting cultural institutions like our national libraries at the heart of innovations in digital applications. Only that way can society derive full social and economic benefit from the data that we’ve created to record Europe’s published output over the past 500 years.” (more information here)
All in all, this case study is a great example of the important role Europeana can and does play in stimulating cultural organisations to open up by demonstrating the benefits through well-informed discussions and workshops, and by being a central hub and powerful force for aggregating Europe’s cultural heritage.
Shaping the future open data agenda
Last year, the event focused on examples of how open (governmental) data is used. The W3C again organised a great and inspiring event this time round, in which the more general theme of realising the promise of open data was explored. The outcomes will help prioritise W3C’s agenda in the area of data on the web. Besides this, it brought together people from a great variety of disciplines who have one thing in common: finding the best ways to make data openly available, findable and usable. Hopefully, a third edition will be held in 2014 that is equally inspiring.
A special thanks to W3C’s Phil Archer for organising everything so fabulously, and to Google for hosting the event at the Google Campus.
This post previously appeared on the R&D blog of Beeld en Geluid.