
What are the guidelines to follow so that data can be previewed nicely in the CKAN Data Preview tool?

I am working on CKAN and have been uploading data or linking it to external websites. Some of it previews nicely and some does not. I have researched machine-readability online but could not find any resources specific to CKAN that state the correct way to structure data so that it previews nicely. I hope to gather your do's and don'ts so that they will be useful to CKAN publishers and developers in the future.

For example, data has to be in a tabular format with labelled rows and columns. Data has to be on the first tab of the spreadsheet, as the other tabs cannot be previewed. The spreadsheet cannot contain formulas or macros. Data has to be stored in the correct file format (refer to another topic of mine: Which file formats can be previewed on CKAN Data Preview tool?)
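As a rough illustration of the first and third rules, the following Python sketch checks a CSV export for a labelled header row and for cells that look like leftover formulas. The function name and the leading "=" heuristic are assumptions made for this example; this is not a CKAN tool.

```python
import csv
import io


def check_tabular_layout(text: str) -> list:
    """Return a list of problems found in CSV text (empty list means none found).

    Checks two of the rules above: a labelled header row, and no cells
    that look like spreadsheet formulas exported as text.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = [cell.strip() for cell in rows[0]]
    if any(not label for label in header):
        problems.append("header row has an empty column label")
    if len(set(header)) != len(header):
        problems.append("header row has duplicate column labels")

    for record_no, row in enumerate(rows[1:], start=2):
        for cell in row:
            # Heuristic (assumption): a leading "=" usually means a formula
            # was exported instead of its computed value.
            if cell.startswith("="):
                problems.append(f"row {record_no}: formula-like cell {cell!r}")
    return problems
```

For example, `check_tabular_layout("name,year\nBob,=SUM(A1:A2)\n")` reports the formula-like cell in row 2.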

Thanks!

kean23

3 Answers


CKAN is an open-source data management system and does not have specific guidelines on the machine-readability of data. Instead, you might want to look at the current standard for data openness and machine-readability here: http://5stardata.info

The UK's implementation of CKAN also includes a set of plugins that help rate the openness of data according to the 5-star open data scheme: https://github.com/ckan/ckanext-qa
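To make the scheme concrete, here is a hypothetical and deliberately simplified Python sketch that maps a resource's file format to the highest star level the format alone can support. It is not how ckanext-qa works internally; the format table is an assumption loosely based on the examples at 5stardata.info, and the fifth star (linking to other data) cannot be judged from the format at all.

```python
# Hypothetical mapping from file format to the star ceiling that format alone
# allows in the 5-star scheme. NOT the ckanext-qa implementation.
FORMAT_STARS = {
    "pdf": 1, "jpg": 1, "png": 1,              # on the web, but not structured data
    "xls": 2,                                  # structured, but proprietary
    "csv": 3, "json": 3, "xml": 3, "txt": 3,   # structured and non-proprietary
    "rdf": 4, "ttl": 4,                        # formats built around URIs
}


def max_stars_for_format(fmt: str, openly_licensed: bool) -> int:
    """Return the star ceiling for a resource, or 0 without an open licence."""
    if not openly_licensed:
        return 0
    # Unknown formats still earn 1 star if published under an open licence.
    return FORMAT_STARS.get(fmt.strip().lower().lstrip("."), 1)
```

For example, `max_stars_for_format("CSV", True)` gives 3, while an XLS file under the same licence gets 2.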

  1. Check DataPusher Logs - When you load files into the CKAN DataStore, the tool that loads the data (the DataPusher) provides logs, and these will reveal problems with the format of the data.
  2. Store Data Locally - Where possible, store the data locally, because data stored elsewhere has to go through the proxy (https://github.com/okfn/dataproxy), which is slower and of course depends on the external site staying available.
  3. Consider File Size and Connectivity - Keep the file small enough for your installation and connectivity that it doesn't time out when loading into the CKAN Data Explorer. If the file is hosted externally, is large, and access to it is slow (poor connectivity or too much load), you will get timeouts, since the proxy must read the entire file before it is presented for preview. Again, hosting data locally gives you better control over the load on compute resources and helps the Data Explorer work consistently.
  4. Use Open File Formats - If you are using CKAN to publish open data, the community generally holds that it is best to publish data in open formats (e.g. CSV, TXT) rather than proprietary ones (e.g. XLS). Beyond making the data accessible to all users and reducing the chance that it is not properly structured for preview, this has other advantages. For example, it is harder to accidentally publish information you didn't mean to, such as content left in hidden sheets or columns of a spreadsheet.
  5. Validate Your Data - Use tools like csvkit to check that your data is in good shape.
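For item 1, the DataPusher status can also be fetched through the CKAN action API. The sketch below assumes a CKAN 2.x site with the `datapusher` plugin enabled, which provides a `datapusher_status` action taking a `resource_id`. The helper names are hypothetical, and the location of the log entries inside the result (`task_info` then `logs`) is an assumption that may differ between versions.

```python
import json
import urllib.parse
import urllib.request


def datapusher_status_url(ckan_url: str, resource_id: str) -> str:
    """Build the action API URL for the datapusher_status action."""
    query = urllib.parse.urlencode({"resource_id": resource_id})
    return f"{ckan_url.rstrip('/')}/api/3/action/datapusher_status?{query}"


def fetch_datapusher_status(ckan_url: str, resource_id: str, api_key: str) -> dict:
    """Call datapusher_status and return the "result" part of the response.

    Assumes api_key belongs to a user allowed to see the resource. CKAN
    signals most errors with an HTTP error status, which urlopen raises.
    """
    request = urllib.request.Request(
        datapusher_status_url(ckan_url, resource_id),
        headers={"Authorization": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    if not body.get("success"):
        raise RuntimeError(f"CKAN returned an error: {body.get('error')}")
    return body["result"]


def print_datapusher_logs(result: dict) -> None:
    """Print the job status and any log messages found in the result."""
    print("status:", result.get("status"))
    # Assumption: log entries sit under task_info["logs"] as dicts with
    # timestamp/level/message keys; adjust for your CKAN/DataPusher version.
    for entry in (result.get("task_info") or {}).get("logs", []):
        print(entry.get("timestamp"), entry.get("level"), entry.get("message"))
```

A call might look like `print_datapusher_logs(fetch_datapusher_status("https://ckan.example.org", "<resource-id>", "<api-key>"))`, with your own site URL, resource id and API key substituted.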
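Items 2 and 3 can be audited per dataset. This hypothetical helper takes the `result` of a `package_show` call and flags resources that are linked rather than uploaded (CKAN marks uploaded files with `url_type` set to `"upload"`, while plain links leave it empty) and resources whose recorded `size` exceeds a limit you choose. The limit is an assumption: the right value depends on your installation and connectivity, and `size` is often missing for linked files.

```python
def audit_resources(package: dict, max_bytes: int) -> list:
    """Return warnings for resources likely to preview poorly.

    `package` is the "result" dict of a package_show call.
    `max_bytes` is a size limit you pick for your own installation.
    """
    warnings = []
    for res in package.get("resources", []):
        name = res.get("name") or res.get("id", "?")
        # Linked resources have an empty url_type, so previews of them
        # go through the proxy and depend on the external site.
        if not res.get("url_type"):
            warnings.append(f"{name}: hosted externally ({res.get('url')})")
        size = res.get("size")
        # size may be missing or None, especially for linked resources.
        if isinstance(size, int) and size > max_bytes:
            warnings.append(f"{name}: {size} bytes exceeds limit of {max_bytes}")
    return warnings
```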
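For item 5, csvkit's `csvclean` command is designed to catch common CSV syntax errors such as rows with the wrong number of columns. If you only need a quick check inside a Python script, this stdlib sketch does something similar. It is an approximation, not csvkit's implementation, and it assumes the first row is the header and the file is UTF-8 unless told otherwise.

```python
import csv
import io


def find_ragged_rows(data: bytes, encoding: str = "utf-8") -> list:
    """Return (record_number, expected_columns, actual_columns) for each bad row.

    The first record is taken as the header. Raises UnicodeDecodeError if
    the bytes are not valid in `encoding`, which is itself worth fixing
    before publishing.
    """
    text = data.decode(encoding)
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return []  # empty file: nothing to compare against
    problems = []
    for record_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            problems.append((record_no, len(header), len(row)))
    return problems
```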
user468648

The best way to get a good preview experience is to start using the DataStore. When viewing remote data, CKAN has to use the DataProxy to do its best to guess data types and convert the data into a form it can preview. If you put the data into the DataStore, that isn't necessary, as the data will already be in a good structure and its types will have been set (e.g. you'll know a column is a date rather than a number).
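As a sketch of loading data into the DataStore with explicit column types, the following builds a `datastore_create` request body and posts it through the action API. It assumes the resource already exists and the DataStore extension is enabled; the helper names, column names and values are hypothetical, while `"timestamp"`, `"numeric"` and `"text"` are standard DataStore field types.

```python
import json
import urllib.request


def build_datastore_create_payload(resource_id: str, columns: dict, records: list) -> dict:
    """Build a datastore_create request body with explicit column types.

    `columns` maps each column name to a DataStore type such as "text",
    "numeric" or "timestamp", so a date column is stored as a date
    instead of being guessed at preview time.
    """
    return {
        "resource_id": resource_id,
        "fields": [{"id": name, "type": col_type} for name, col_type in columns.items()],
        "records": records,
    }


def post_datastore_create(ckan_url: str, api_key: str, payload: dict) -> dict:
    """POST the payload to the datastore_create action and return its result."""
    request = urllib.request.Request(
        f"{ckan_url.rstrip('/')}/api/3/action/datastore_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["result"]


# Hypothetical example: a rainfall table with a date, a number and a label.
payload = build_datastore_create_payload(
    "<resource-id>",
    {"date": "timestamp", "rainfall_mm": "numeric", "station": "text"},
    [{"date": "2014-01-01", "rainfall_mm": 3.5, "station": "North"}],
)
```

Calling `post_datastore_create("https://ckan.example.org", "<api-key>", payload)` against your own site would then store the rows with those types, so the preview no longer has to guess them.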

Rufus Pollock