0

I have large CSV files on S3. I want to import their data without downloading (copying) them to my tmp folder on Heroku, and I don't want to load a whole file into memory. Can you suggest how I can do that? Something like reading the data chunk by chunk would work.

Thanks in advance.

AMBasra
  • 1
    I think this would only be possible if Amazon had an API where you could ask for a specific set of lines from a file, which AFAIK doesn't exist. I think your best option is to copy the file onto your server and then read it in one line at a time (various CSV libraries let you do this). This way you can avoid having it all in memory. I don't think you can avoid having it in memory AND avoid saving the file locally. – Max Williams Dec 09 '15 at 13:26
  • 1
    Both `curl` and `wget` can write the stream to `STDOUT`, which you could presumably read directly from a pipe. Have you tried that approach? – Michael - sqlbot Dec 09 '15 at 13:59
  • 1
    @Michael-sqlbot aha, great idea. This post has some methods to do this. http://stackoverflow.com/questions/1342583/manipulate-a-string-that-is-30-million-characters-long/1342760#1342760 – Max Williams Dec 09 '15 at 14:24

1 Answer

1

OK, I thought of a way to do this, which is insanely inefficient and pretty stupid. But if you are determined not to save the file on your server, here's a way.

Add code to your app which accepts some data (e.g. a line, or many lines' worth) in some form and then creates the records accordingly. Deploy this.

Then, on your LOCAL machine, save the file. Write a script which reads the file in (again, a line at a time is best to avoid memory issues), converts each chunk into the format expected by the action you wrote, and sends it as a GET or POST request to your production site. It will need to make lots of requests as it churns through the file.
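To make the local script a bit more concrete, here is a minimal sketch in Python (the question doesn't say which language is in use, so this is an assumption). It reads CSV lines lazily and groups them into JSON payloads of a fixed size. The names `batched_rows` and `build_payloads`, the `{"rows": [...]}` payload shape, and the batch size are all hypothetical and would need to match whatever import action you actually deploy.

```python
import csv
import json
from itertools import islice
from typing import Iterable, Iterator, List


def batched_rows(lines: Iterable[str], batch_size: int = 500) -> Iterator[List[dict]]:
    """Yield lists of at most `batch_size` row dicts, reading `lines` lazily."""
    reader = csv.DictReader(lines)  # the first line is treated as the header
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


def build_payloads(lines: Iterable[str], batch_size: int = 500) -> Iterator[bytes]:
    """Encode each batch as a JSON request body (hypothetical payload shape)."""
    for batch in batched_rows(lines, batch_size):
        yield json.dumps({"rows": batch}).encode("utf-8")
```

Passing an open file object as `lines` (opened with `newline=""`) keeps only one batch in memory at a time. Each payload could then be sent with, for example, `urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"}, method="POST")`, where `url` stands for your own import endpoint.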

Like I say, this is really stupid and a little insane: you really should just save the file on your server.

Max Williams
  • Yes, after reading your comments I am saving the file on my system. I was just curious whether I could read the file directly. Thank you for all the help. – AMBasra Dec 11 '15 at 03:24
  • You can, I think, using @Michael - sqlbot's streaming curl approach, but it's really not worth the effort compared to just saving the file. – Max Williams Dec 14 '15 at 10:31
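
For anyone who does want to try the streaming approach mentioned in the comments, a rough sketch in Python (an assumed language, since the thread names none) could look like the following. It wraps any binary stream, such as an HTTP response body or the stdout pipe of a `curl` process, and parses CSV rows as the bytes arrive, so neither the whole file nor a local copy is needed. The function name `iter_csv_rows` is made up for illustration, and UTF-8 encoding is an assumption about the file.

```python
import csv
import io
from typing import BinaryIO, Iterator, List


def iter_csv_rows(byte_stream: BinaryIO, encoding: str = "utf-8") -> Iterator[List[str]]:
    """Yield CSV rows one at a time from a binary stream.

    The stream is decoded incrementally in small buffered chunks,
    so memory use does not grow with the size of the file.
    """
    # newline="" lets the csv module handle quoted line breaks itself
    text = io.TextIOWrapper(byte_stream, encoding=encoding, newline="")
    yield from csv.reader(text)
```

With the standard library this could be used as `with urllib.request.urlopen(url) as resp: for row in iter_csv_rows(resp): ...`, where `url` is a placeholder for a URL your app is allowed to read (for example a pre-signed S3 URL). The same function should also work on the `stdout` of a `subprocess.Popen(["curl", "-s", url], stdout=subprocess.PIPE)` call.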