
I am looking for an implementation that would allow me to download a CSV file from a browser (via a URL), to a point where I can open that file manually and view its contents in CSV form.

I have been doing some research and can see that I should use the IO, CSV or File classes.

I have a URL that looks something like:

"https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"

From what I have read I have:

href = page.find('#csv-download > a')['href']
csv_path =  "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"
require 'open-uri'
download = open(csv_path, ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE)
IO.copy_stream(download, 'test.csv')

This actually outputs:

 2684

Does that mean I have successfully retrieved the data?

When I open the downloaded file, its contents are just:

#<StringIO:0x00000003e07d30>

Would there be any reason for this?

I'm not sure where to go from here. Could anyone point me in the right direction, please?

the Tin Man
Richlewis

3 Answers


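This should read the remote file, write it to disk, and then parse it:

require 'open-uri'
require 'csv'

url = "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"

# URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs.
download = URI.open(url)
IO.copy_stream(download, 'test.csv')

# copy_stream leaves the stream at EOF, so rewind before parsing it.
download.rewind
CSV.new(download).each do |row|
  p row # each row is an Array of field strings
end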
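
One caveat with a copy-then-parse sequence like this: IO.copy_stream reads the stream to its end, so parsing the same object afterwards yields no rows unless it is rewound first. A minimal local sketch of that behaviour, using a StringIO with made-up sample data as a stand-in for the object open-uri returns (the file name is also just an assumption for illustration):

```ruby
require 'csv'
require 'stringio'
require 'tmpdir'

# Stand-in for the downloaded stream; the sample data is made up.
download = StringIO.new("name,total\nalice,3\nbob,5\n")
path = File.join(Dir.tmpdir, 'rewind_demo.csv')

# Copying consumes the stream and leaves it positioned at EOF.
IO.copy_stream(download, path)
rows_without_rewind = CSV.new(download).to_a

# Rewinding moves back to the start, so the parser sees the data again.
download.rewind
rows_after_rewind = CSV.new(download).to_a
```

Here rows_without_rewind ends up empty, while rows_after_rewind holds the header row and both data rows.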
Oliver Zeyen

If all you want to do is read a file and save it, it's simple. This untested code should be all that's required:

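require 'open-uri'
require 'openssl'

CSV_PATH = "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"

# URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs.
# VERIFY_NONE disables certificate checks; use it only for hosts you trust.
IO.copy_stream(
  URI.open(
    CSV_PATH,
    ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE
  ),
  'test.csv'
)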

OpenURI's open returns an IO-like object (a StringIO for small responses, a Tempfile for larger ones), which is all you need to make copy_stream happy.

More typically you'll see the open, read, write pattern: open creates the IO stream for the remote document, read retrieves its contents as a string, and write outputs that string to a file on your local disk. See their documentation for more information.

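require 'open-uri'
require 'openssl'

CSV_PATH = "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"

# read returns the whole response body as a String, which File.write saves.
# URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs.
File.write(
  'test.csv',
  URI.open(
    CSV_PATH,
    ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE
  ).read
)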
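
The read call matters here. One way to end up with a file containing only something like #<StringIO:0x...>, as described in the question, is to write the stream object itself rather than its contents, because File.write converts a non-string argument with to_s. Whether that is what happened in the question is a guess; the sketch below only illustrates the difference, using a local StringIO with made-up data and hypothetical file names:

```ruby
require 'stringio'
require 'tmpdir'

# Stand-in for what open-uri returns for a small response; sample data is made up.
download = StringIO.new("user,total\nalice,3\n")

wrong_path = File.join(Dir.tmpdir, 'wrong.csv')
right_path = File.join(Dir.tmpdir, 'right.csv')

# Passing the object writes download.to_s, i.e. its default "#<StringIO:0x...>" form.
File.write(wrong_path, download)

# Passing the result of read writes the actual CSV text.
File.write(right_path, download.read)

wrong_contents = File.read(wrong_path)
right_contents = File.read(right_path)
```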

copy_stream may scale better for huge files that wouldn't fit into memory, since it copies in chunks instead of building one large string. That's worth testing against your own data.
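
If the file really is too large to buffer, note that open-uri fetches the whole response (into memory or a temp file) before it returns. One alternative is to stream the body to disk with Net::HTTP as it arrives. This is a rough sketch rather than a drop-in replacement: stream_download is a hypothetical helper name, it does not follow redirects, and it leaves certificate verification at the library default.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: writes the response body to `path` chunk by chunk,
# so the full document is never held in memory at once.
def stream_download(url, path)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(Net::HTTP::Get.new(uri)) do |response|
      response.value # raises unless the status is 2xx
      File.open(path, 'wb') do |file|
        response.read_body { |chunk| file.write(chunk) }
      end
    end
  end
  path
end
```

It would be called as stream_download(CSV_PATH, 'test.csv').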

the Tin Man

Here is a one-liner I use. If the file is huge, I might want to stream it or download it first, but this works just fine in 99% of cases.

require 'open-uri'
require 'csv'

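# CSV.readlines expects a file path, so parse the response body instead.
# URI.open is required on Ruby 3.0+; download_url is the report URL.
csv_data = CSV.parse(URI.open(download_url).read, headers: true)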
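
With headers: true the result is a CSV::Table, so columns can be addressed by header name. A small local sketch of what that looks like, using a made-up CSV string in place of the downloaded report (the column names are assumptions, not the real report's headers):

```ruby
require 'csv'

# Made-up sample standing in for the downloaded report body.
body = "user,total\nalice,3\nbob,5\n"
table = CSV.parse(body, headers: true)

# Column access by header name returns that column's values.
users = table['user']

# Iterating yields CSV::Row objects, which are indexed by header.
totals = table.map { |row| row['total'].to_i }
```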
konung