API
Firstly, you need to know how you're accessing the data.
The problems you've cited are only valid if you just access someone's site through HTTP (with something like cURL). As you instinctively know, this is highly inefficient and will likely get your IP blocked for continuous access.
A far better way to access data (from any reputable service) is to use their API. This is as true of S3 as it is of Twitter, Facebook, Dropbox, etc.:
AWS-SDK
# Gemfile
gem "aws-sdk-core", "~> 2.0.0.rc2"
# config/application.rb
Aws.config = {
  access_key_id: '...',
  secret_access_key: '...',
  region: 'us-west-2'
}
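# config/initializers/s3.rb
# Assign the constant once; assigning it twice triggers an "already initialized constant" warning.
S3 = Aws::S3.new
# alternative form: S3 = Aws.s3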
Then you'll be able to use the API resources to help retrieve objects:
# controller
# yields once per response, even works with non-paged requests
# (uses the S3 constant defined in the initializer)
S3.list_objects(bucket: 'aws-sdk').each do |resp|
  puts resp.contents.map(&:key)
end
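If you want all keys in a single array rather than printed page by page, the same loop can accumulate them. The following is only a sketch of that pattern: FakeObject and FakePage are hypothetical stand-ins for the SDK's response objects (so it runs without the gem or credentials), and all_keys is a helper name made up for illustration. It assumes nothing beyond what the loop above already relies on: something that yields pages via each, where each page responds to contents and each entry responds to key.

```ruby
# Hypothetical stand-ins for the SDK's response objects, so this sketch
# runs without AWS credentials or the gem installed.
FakeObject = Struct.new(:key)
FakePage   = Struct.new(:contents)

# Collect every key from anything that yields pages through #each.
# Each page must respond to #contents, and each entry to #key.
def all_keys(pages)
  keys = []
  pages.each { |page| keys.concat(page.contents.map(&:key)) }
  keys
end

# Two simulated pages, standing in for a paged listing.
pages = [
  FakePage.new([FakeObject.new("a.txt"), FakeObject.new("b.txt")]),
  FakePage.new([FakeObject.new("c.txt")])
]

puts all_keys(pages)
```

With the real client you would pass the result of the list_objects call in place of the simulated pages array.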
CORS
If you were thinking of XHR'ing into a server, you need to ensure you have the correct CORS permissions to do so.
Since you want to use S3, I would look at this documentation to ensure you set the permissions correctly. CORS only matters for Ajax requests made from the browser; it does not apply to the API or to a plain server-side HTTP request.
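To give a rough idea of what a bucket CORS configuration contains, the sketch below builds one rule as a Ruby hash and prints it as JSON, which is the format the S3 console accepts for CORS settings. The origin is a placeholder and the rule (read-only GET access from a single site) is just an assumption about what you might need, so adjust it to your own case.

```ruby
require "json"

# Hypothetical origin: replace with the site that serves your JavaScript.
ALLOWED_ORIGIN = "https://www.example.com"

# One CORS rule allowing browser GET requests from that origin only.
cors_rules = [
  {
    "AllowedOrigins" => [ALLOWED_ORIGIN],
    "AllowedMethods" => ["GET"],
    "AllowedHeaders" => ["*"],
    "MaxAgeSeconds"  => 3000 # how long the browser may cache the preflight result
  }
]

# Print the rules as JSON, ready to paste into the bucket's CORS settings.
puts JSON.pretty_generate(cors_rules)
```

Keeping AllowedOrigins and AllowedMethods as narrow as possible is generally the safer default; widen them only when a request actually needs it.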