
When I try to upload a large CSV file to the CKAN DataStore, it fails with the following message:

Error: Resource too large to download: 5158278929 > max (10485760).

I changed the maximum size, in megabytes, of a resource upload to

ckan.max_resource_size = 5120

in

/etc/ckan/production.ini
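
For reference, that setting sits in the [app:main] section of the file, and the value is in megabytes (an excerpt of my config):

    [app:main]
    # maximum size in MB of files uploaded to CKAN
    ckan.max_resource_size = 5120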

What else do I need to change to upload a large CSV to CKAN?

Screenshot: Error: Resource too large to download: 5158278929 > max (10485760)

Stack User 5674
  • Can you be more specific about exactly what you did to try to upload the file to the datastore? For example, are you using the datapusher here? Or the datastorer? Or did you mean that you tried to upload it to the filestore? Also, what version of CKAN are you using? – Sean Hammond Apr 23 '14 at 09:40
  • We are trying to upload the CSV to the DataStore, using CKAN version 2.2. – Stack User 5674 Apr 23 '14 at 09:52
  • @SeanHammond please see the screenshot of the error in the updated question. Please help me find the cause of the error. – Stack User 5674 Apr 24 '14 at 06:25

1 Answer


That error message comes from the DataPusher, not from CKAN itself: https://github.com/ckan/datapusher/blob/master/datapusher/jobs.py#L250. Unfortunately it looks like the DataPusher's maximum file size is hard-coded to 10MB: https://github.com/ckan/datapusher/blob/master/datapusher/jobs.py#L28. Pushing larger files into the DataStore is not supported.
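
For reference, the size check that produces this error looks roughly like the following. This is a paraphrase of the linked jobs.py, not a verbatim copy: it compares the Content-Length header that CKAN reports for the resource against the hard-coded constant.

    MAX_CONTENT_LENGTH = 10485760  # 10 MB, hard-coded near the top of jobs.py

    # later, inside push_to_datastore(), after requesting the file from CKAN:
    cl = response.info().getheader('content-length')
    if cl and int(cl) > MAX_CONTENT_LENGTH:
        raise util.JobError(
            'Resource too large to download: {cl} > max ({max_cl}).'.format(
                cl=cl, max_cl=MAX_CONTENT_LENGTH))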

Two possible workarounds might be:

  1. Use the DataStore API to add the data yourself (see the sketch after this list).

  2. Change MAX_CONTENT_LENGTH, on the line in the DataPusher source code linked above, to something larger.
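
To make option 1 concrete, here is a minimal sketch (Python 2, to match CKAN 2.2) that creates the DataStore table with datastore_create and then inserts the CSV rows in batches with datastore_upsert. CKAN_URL, API_KEY, RESOURCE_ID, big.csv, and the batch size are placeholders, and typing every column as text is an assumption you would adjust:

    import csv
    import json
    import urllib2

    CKAN_URL = 'http://my-ckan-site'   # placeholder: your CKAN instance
    API_KEY = 'my-api-key'             # placeholder: a user with write access
    RESOURCE_ID = 'my-resource-id'     # placeholder: the resource to fill
    BATCH_SIZE = 10000                 # rows sent per datastore_upsert call

    def action(name, data):
        """POST to a CKAN action API endpoint and return the decoded response."""
        request = urllib2.Request('{0}/api/3/action/{1}'.format(CKAN_URL, name))
        request.add_header('Authorization', API_KEY)
        request.add_header('Content-Type', 'application/json')
        return json.loads(urllib2.urlopen(request, json.dumps(data)).read())

    with open('big.csv', 'rb') as f:
        reader = csv.reader(f)
        headers = next(reader)  # assumes the first row holds the column names

        # Create the DataStore table up front, defaulting every column to text.
        action('datastore_create', {
            'resource_id': RESOURCE_ID,
            'fields': [{'id': h, 'type': 'text'} for h in headers],
            'force': True,
        })

        # Insert rows a batch at a time so the whole file is never in memory.
        batch = []
        for row in reader:
            batch.append(dict(zip(headers, row)))
            if len(batch) >= BATCH_SIZE:
                action('datastore_upsert', {'resource_id': RESOURCE_ID,
                                            'records': batch,
                                            'method': 'insert',
                                            'force': True})
                batch = []
        if batch:  # flush the final, partial batch
            action('datastore_upsert', {'resource_id': RESOURCE_ID,
                                        'records': batch,
                                        'method': 'insert',
                                        'force': True})

Because only one batch of rows is ever held in memory, this sidesteps both the DataPusher's hard-coded limit and the kind of out-of-memory failure you would get handling a multi-gigabyte file in one go.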

Sean Hammond
  • Thanks Sean. I changed the MAX_CONTENT_LENGTH value to something large (5 GB), and then the following error occurred:

        File "/usr/lib/ckan/datapusher/lib/python2.7/site-packages/apscheduler/scheduler.py", line 512, in _run_job
          retval = job.func(*job.args, **job.kwargs)
        File "/usr/lib/ckan/datapusher/src/datapusher/datapusher/jobs.py", line 261, in push_to_datastore
          f = cStringIO.StringIO(response.read())
        File "/usr/lib/python2.7/socket.py", line 358, in read
          buf.write(data)
        MemoryError('out of memory',)

    – Stack User 5674 Apr 24 '14 at 09:30
  • Your computer (the one you're running the DataPusher on) ran out of memory while trying to upload the file. It looks like the DataPusher keeps the whole file in memory while downloading it from CKAN, and/or CKAN keeps the whole file in memory while delivering it for download. I guess "streaming" downloads, which would avoid this kind of error, are not supported. So if you want to push a 5 GB file into the DataStore using the DataPusher, you need more than 5 GB of memory. – Sean Hammond Apr 25 '14 at 09:02
  • You could still use the DataStore API that I linked to in my answer above to add your file to the DataStore bit by bit, instead of all at once. – Sean Hammond Apr 25 '14 at 09:02