
In Rails, how can I make an HTTP request to a page, like "http://google.com", and store the response in a variable?

Basically I'm trying to get the contents of a CSV file off of Amazon S3: https://s3.amazonaws.com/datasets.graf.ly/24.csv

My Rails server needs to return that content as a response to an AJAX request.

  1. Get S3 bucket
  2. Access the file and read it
  3. Render its contents (so the ajax request receives it)

A few questions have suggested screen scraping, but this sounds like overkill (and probably slow) for simply taking a response and pretty much just passing it along.
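The first step, fetching a URL into a variable, does not need screen scraping. A plain HTTP GET with Ruby's standard Net::HTTP is enough. Here is a minimal sketch, assuming the object is publicly readable (as the URL in the question suggests). `fetch_body` is a hypothetical helper name:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: GET a URL and return the response body as a String.
# Raises instead of silently returning an error page when the status is not 2xx.
def fetch_body(url)
  uri = URI(url)
  # get_response enables TLS automatically when the URI scheme is https
  response = Net::HTTP.get_response(uri)
  unless response.is_a?(Net::HTTPSuccess)
    raise "GET #{url} failed with #{response.code} #{response.message}"
  end
  response.body
end

# Usage (performs a real network request):
#   csv_text = fetch_body('https://s3.amazonaws.com/datasets.graf.ly/24.csv')
```

The `Net::HTTPSuccess` check matters because S3 answers a missing or private object with an XML error body. Without the check, that body would be passed along as if it were the CSV.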

Don P

2 Answers


API

First, you need to decide how you're going to access the data.

The problems you've cited only apply if you access someone's site through plain HTTP (with something like cURL). As you instinctively suspect, that is highly inefficient and, if you access the site continuously, will likely get your IP blocked.

A far better way to access data (from any reputable service) is to use its API. This is as true of S3 as it is of Twitter, Facebook, Dropbox, etc.:


AWS-SDK

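# Gemfile
gem "aws-sdk-s3", "~> 1"

# config/initializers/s3.rb
# Read the keys from the environment instead of committing them to source control
Aws.config.update(
  access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
  region: 'us-west-2' # set this to the region your bucket lives in
)

# One shared client (defining the S3 constant twice only triggers a warning)
S3 = Aws::S3::Client.new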

Then you'll be able to use the client to list and retrieve objects:

# e.g. in a controller action
# list_objects returns a pageable response: #each yields once per page,
# and also works when the whole result fits in a single page
S3.list_objects(bucket: 'datasets.graf.ly').each do |page|
  puts page.contents.map(&:key)
end

CORS

If you plan to make XHR (Ajax) requests from the browser directly to another server, you need to make sure you have the correct CORS permissions to do so.

Since you want to use S3, I would look at the S3 CORS documentation to make sure you set the permissions correctly. CORS only applies to Ajax requests made by the browser, not to server-side API calls or plain HTTP requests.
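For the case where the browser talks to S3 directly, the S3 console accepts a bucket's CORS configuration as a JSON array of rules. Below is a sketch that builds a read-only rule in Ruby. It assumes `https://your-app.example.com` (hypothetical) is the origin your page is served from:

```ruby
require 'json'

# Hypothetical origin of the page that makes the Ajax call; replace with your own.
APP_ORIGIN = 'https://your-app.example.com'

# One read-only rule: pages served from APP_ORIGIN may GET or HEAD objects in the bucket.
cors_rules = [
  {
    'AllowedHeaders' => ['*'],
    'AllowedMethods' => %w[GET HEAD],
    'AllowedOrigins' => [APP_ORIGIN],
    'ExposeHeaders'  => [],
    'MaxAgeSeconds'  => 3000 # how long the browser may cache the preflight response
  }
]

# Print the JSON to paste into the bucket's CORS configuration
puts JSON.pretty_generate(cors_rules)
```

Keeping `AllowedMethods` to reads means the rule only permits what the Ajax call needs. A server-side fetch, as in the question's three steps, needs no CORS rule at all.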

Richard Peck
  • Thanks Rich! Great explanation. The AWS-SDK is definitely what I'll be using. I've used CORS for other Ajax requests to S3 and that works great. – Don P Mar 05 '14 at 10:01
  • No problem - if you need any more help please let me know! – Richard Peck Mar 05 '14 at 10:03
  • This is the better choice even if it's not quite what was asked. – iheggie Mar 05 '14 at 10:21
  • Which is the better choice @iheggie? – Don P Mar 05 '14 at 10:22
  • Using the API is always the better choice, although my answer does not provide details on how to solve your question specifically – Richard Peck Mar 05 '14 at 10:25
  • There is probably no one "best choice" - each choice has advantages/disadvantages: – iheggie Mar 05 '14 at 12:03
  • I would use the API if you are also manipulating the data (put all the AWS manipulation in a separate service class), and open-uri if this is the only place you access it. The strong coupling of using the API has the advantage that someone else has (hopefully) tested it and thought of edge cases, and the disadvantage that you have introduced another dependency to track. Using an HTTP GET is simpler. I am unsure whether the rate limits are the same for the API and a plain HTTP GET, or even whether the API uses an HTTP GET internally to fetch the object. – iheggie Mar 05 '14 at 12:27
  • Whichever you pick, I suggest adding a test that the external service returns a result in the expected format (e.g. that there is data, that it can be parsed as CSV, that it has the expected number of rows and columns, and the actual data values if the specific files won't change). Then mock out the external service for all other tests, so you don't hit it too often and, for tests that use its data, it is clear whether a problem is at your end or theirs. – iheggie Mar 05 '14 at 12:36
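The testing approach in the last comment could be sketched like this. It validates the format of whatever the service returns, and it injects the fetcher so that other tests can pass a stub instead of hitting S3. `DatasetSource`, `rows`, and `expected_columns` are hypothetical names. The default fetcher assumes open-uri and a publicly readable object:

```ruby
require 'csv'
require 'open-uri'

# Hypothetical wrapper around the external CSV source.
# The fetcher is injectable so tests can replace the real HTTP call with a stub.
class DatasetSource
  DEFAULT_FETCHER = ->(url) { URI.open(url, &:read) }

  def initialize(fetcher: DEFAULT_FETCHER)
    @fetcher = fetcher
  end

  # Fetches the CSV at url and returns it as an array of rows (arrays of strings).
  # Raises if nothing came back or if any row lacks exactly expected_columns fields.
  def rows(url, expected_columns:)
    text = @fetcher.call(url)
    raise ArgumentError, "no data returned from #{url}" if text.nil? || text.strip.empty?

    table = CSV.parse(text) # raises CSV::MalformedCSVError on unparsable input
    bad = table.index { |row| row.size != expected_columns }
    if bad
      raise ArgumentError, "row #{bad} has #{table[bad].size} columns, expected #{expected_columns}"
    end

    table
  end
end
```

In ordinary unit tests you would construct it with a stub, for example `DatasetSource.new(fetcher: ->(_url) { "name,value\nfoo,1\n" })`. Only the single contract test would use the default fetcher against the real URL.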

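To do as you asked, off the top of my head it should be something like:

require 'open-uri'

def get_csv
  # to_i keeps arbitrary user input out of the URL
  url = 'https://s3.amazonaws.com/datasets.graf.ly/%d.csv' % params[:id].to_i
  # URI.open (from open-uri) fetches the URL; plain Kernel#open no longer does in Ruby 3
  data = URI.open(url, &:read)
  # send the raw CSV back with a CSV content type
  render plain: data, content_type: 'text/csv'
end

Alternatively, decode the CSV file in Rails and pass a JSON array of arrays back.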
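The alternative mentioned in this answer, decoding the CSV on the server and returning a JSON array of arrays, needs only Ruby's CSV and JSON libraries. Here is a sketch. `csv_to_json_rows` is a hypothetical helper name:

```ruby
require 'csv'
require 'json'

# Hypothetical helper: turn CSV text into a JSON array of arrays.
# Cells stay Strings (empty unquoted fields become null); no type conversion is done.
def csv_to_json_rows(csv_text)
  CSV.parse(csv_text).to_json
end

# In a Rails controller the same idea is (with data fetched as in get_csv):
#   render json: CSV.parse(data)
```

Parsing on the server means the Ajax caller receives ready-to-use arrays. The trade-off is that every request pays the parsing cost, whereas passing the raw text through does not.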
iheggie