Save and parse CSV file from URL

Question

I am looking for an implementation that would allow me to download a CSV file from a browser (via a URL), to a point where I can open that file manually and view its contents in CSV form.

I have been doing some research and can see that I should use the IO, CSV or File classes.

I have a URL that looks something like:

"https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"

From what I have read I have:

require 'open-uri'
require 'openssl'

href = page.find('#csv-download > a')['href']  # the download link's URL (not used below)
csv_path = "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"
download = URI.open(csv_path, ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE)
IO.copy_stream(download, 'test.csv')

This actually outputs:

Does this tell me that I have successfully got the data?

When I download the file, its contents are just

#<StringIO:0x00000003e07d30>

Is there a reason for this?

I'm not sure where to go from here. Could anyone point me in the right direction, please?

IO.copy_stream returns the bytes written and it seems like you successfully saved the file. Both IO and File are valid methods. — Oliver Zeyen, Nov 17 '15 at 20:04
So I just need to read the file now with the CSV class methods available? — Richlewis, Nov 17 '15 at 20:13
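Once test.csv is on disk, the CSV class can read it directly. A minimal sketch; so that it runs without the real report URL, it first writes a small hypothetical sample to a temporary test.csv:

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical sample standing in for the downloaded report
sample = "name,score\nalice,10\nbob,7\n"
path = File.join(Dir.mktmpdir, 'test.csv')  # temp dir, not cleaned up in this sketch
File.write(path, sample)

# Read the whole file at once; headers: true returns a CSV::Table of CSV::Rows
table = CSV.read(path, headers: true)
table.each do |row|
  puts "#{row['name']}: #{row['score']}"
end

# Or stream it row by row without holding the whole file in memory
names = []
CSV.foreach(path, headers: true) { |row| names << row['name'] }
```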

Oliver Zeyen · Accepted Answer · 2015-11-17T20:34:25.147

2

This should read from remote, write and then parse the file:

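require 'open-uri'
require 'csv'

url = "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"

download = URI.open(url)               # plain open(url) only works on Ruby < 3.0
IO.copy_stream(download, 'test.csv')   # write the CSV to disk

download.rewind                        # copy_stream left the stream at its end
CSV.new(download).each do |row|
  p row                                # each row is an Array of fields
end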
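IO.copy_stream reads its source to the end, so parsing the same object afterwards yields no rows unless it is rewound first. A small demonstration, using a StringIO as a stand-in for the object open-uri returns (the CSV content and the temporary output path are hypothetical):

```ruby
require 'csv'
require 'stringio'
require 'tmpdir'

# Stand-in for URI.open(url), which returns a StringIO or a Tempfile
download = StringIO.new("id,total\n1,42\n2,17\n")
path = File.join(Dir.mktmpdir, 'test.csv')

bytes = IO.copy_stream(download, path)  # save to disk; returns the byte count

rows_before = CSV.new(download).read    # stream is at EOF, so no rows

download.rewind                         # move back to the start of the stream
rows_after = CSV.new(download).read     # now every line is parsed
```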

edited Nov 17 '15 at 20:34

answered Nov 17 '15 at 19:33


thanks for your help so far, I have updated my question and was wondering if you could take a look please – Richlewis Nov 18 '15 at 08:15
`open(url)` returns a `StringIO` object (or a `Tempfile` for larger responses). To see its content you have to use `download.read`. I suggest reading the `IO` documentation – Oliver Zeyen Nov 18 '15 at 09:45
This is way too cumbersome. – the Tin Man Nov 18 '15 at 17:05
But it covers the csv part of the question which your example is lacking. Still I like your examples. – Oliver Zeyen Nov 18 '15 at 20:25
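One plausible way to end up with #<StringIO:0x...> as the file's contents is writing the stream object itself instead of its contents: File.write converts its argument with to_s, and StringIO does not override to_s. A sketch of the difference, with a StringIO standing in for the downloaded response (hypothetical data, temporary paths):

```ruby
require 'stringio'
require 'tmpdir'

download = StringIO.new("name,score\nalice,10\n")  # stand-in for URI.open(url)
dir = Dir.mktmpdir

# Writing the object: File.write calls to_s, producing "#<StringIO:0x...>"
File.write(File.join(dir, 'wrong.csv'), download)
wrong_text = File.read(File.join(dir, 'wrong.csv'))

# Writing the contents: read returns the CSV text itself
File.write(File.join(dir, 'right.csv'), download.read)
right_text = File.read(File.join(dir, 'right.csv'))
```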

the Tin Man · Answer 2 · 2015-11-18T19:55:29.440

If all you want to do is read a file and save it, it's simple. This untested code should be all that's required:

require 'open-uri'
require 'openssl'

CSV_PATH = "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"

IO.copy_stream(
  URI.open(
    CSV_PATH,
    # VERIFY_NONE disables TLS certificate verification (kept from the question)
    ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE
  ),
  'test.csv'
)

OpenURI's open (URI.open in current Ruby) returns an IO-like object, a StringIO for small responses and a Tempfile for larger ones, which is all you need to make copy_stream happy.
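Both kinds of object open-uri can hand back work as a copy_stream source. A sketch that feeds it a StringIO and a Tempfile holding the same hypothetical CSV text, copying each to a temporary file:

```ruby
require 'stringio'
require 'tempfile'
require 'tmpdir'

dir = Dir.mktmpdir
csv_text = "a,b\n1,2\n"

# Source 1: a StringIO, like open-uri's result for a small response
from_stringio = IO.copy_stream(StringIO.new(csv_text), File.join(dir, 'one.csv'))

# Source 2: a Tempfile, like open-uri's result for a larger response
tmp = Tempfile.new('report')
tmp.write(csv_text)
tmp.flush
tmp.rewind                      # copy_stream reads from the current position
from_tempfile = IO.copy_stream(tmp, File.join(dir, 'two.csv'))
tmp.close!                      # close and delete the temp file
```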

More typically you'll see the open, read, write pattern: open creates the IO stream for the remote document, read retrieves the document's contents, and File.write writes them to a text file on your local disk. See their documentation for more information.

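require 'open-uri'
require 'openssl'

CSV_PATH = "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"

# open the remote document, read it into a string, and write it to disk
File.write(
  'test.csv',
  URI.open(
    CSV_PATH,
    ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE # disables TLS certificate checks
  ).read
)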

There might be a scalability advantage to using copy_stream for huge files that wouldn't fit into memory, since read loads the whole document into a single string. That would be something for the user to test.
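To make "streaming" concrete: the idea is to move the data in fixed-size chunks so only one chunk sits in a Ruby string at a time, which is what IO.copy_stream does internally. The chunked_copy helper below is a hypothetical hand-rolled equivalent for illustration, and the 16 KB chunk size is an arbitrary choice. Note that open-uri itself downloads the full body into its StringIO or Tempfile before returning, so the saving applies only to the copy step.

```ruby
require 'stringio'
require 'tmpdir'

# Hypothetical helper: copy src to dest_path in fixed-size chunks.
def chunked_copy(src, dest_path, chunk_size = 16 * 1024)
  written = 0
  buffer = String.new
  File.open(dest_path, 'wb') do |out|
    # read(len, buffer) refills buffer in place and returns nil at end of stream
    while src.read(chunk_size, buffer)
      written += out.write(buffer)
    end
  end
  written
end

# Stand-in for a large download: 50,000 bytes of CSV-like text
big = StringIO.new("x,y\n" * 12_500)
path = File.join(Dir.mktmpdir, 'big.csv')
copied = chunked_copy(big, path)
```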

score 0 · Answer 3 · answered Jan 19 '23 at 00:18


Here is a one-liner I use. Of course, if the file is huge I might want to stream it or download it first, but this works just fine in 99% of cases.

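require 'open-uri'
require 'csv'

# download_url holds the CSV's URL; URI.open is required on Ruby 3.0+,
# where Kernel#open no longer fetches URLs
csv_data = CSV.readlines(URI.open(download_url), headers: true)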
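With headers: true the result is a CSV::Table, so rows and columns can be addressed by header name. A sketch that parses a hypothetical string instead of a live download so it runs offline; for a real URL the string could come from URI.open(download_url).read:

```ruby
require 'csv'

# Hypothetical report contents standing in for the downloaded body
csv_text = "name,score\nalice,10\nbob,7\n"

table = CSV.parse(csv_text, headers: true)  # a CSV::Table

first_name = table.first['name']  # a row's field, looked up by header
scores = table['score']           # a whole column, looked up by header
total = scores.sum(&:to_i)        # fields are strings, so convert before summing
```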


konung
