When I put https://s3.amazonaws.com/nyc-tlc/trip+data/fhv_tripdata_2015-01.csv into a browser, the file downloads without a problem. But when I run

wget.download('https://s3.amazonaws.com/nyc-tlc/trip+data/fhv_tripdata_2015-01.csv', out='data/')

I get a 404 error. Is there something wrong with the format of that URL?

This is not a duplicate of HTTP Error 404: Not Found when using wget to download a link. wget works fine with other files. This appears to be something specific to S3 which is explained below.

Bob Wakefield

1 Answer

The root cause is a bug in S3, as described here: https://stackoverflow.com/a/38285197/4323

One workaround is to use the requests library instead:

import requests

r = requests.get('https://s3.amazonaws.com/nyc-tlc/trip+data/fhv_tripdata_2015-01.csv')

This works fine. You can inspect r.text or write r.content to a file. For the most efficient way to save a large download, see https://stackoverflow.com/a/39217788/4323
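For a file of this size it may be better not to hold the whole response in memory. Here is a minimal sketch of one way to stream the download to disk with requests. The helper name download_to_file and its chunk_size and timeout parameters are my own assumptions for illustration, not part of requests or of the linked answer, which may use a different technique.

```python
import requests


def download_to_file(url, dest_path, chunk_size=1024 * 1024, timeout=30):
    """Stream the body at `url` into `dest_path` and return the bytes written.

    Hypothetical helper: the name and default values are illustrative only.
    """
    bytes_written = 0
    # stream=True defers fetching the body so it can be read in chunks.
    with requests.get(url, stream=True, timeout=timeout) as response:
        # Raise requests.HTTPError on 4xx/5xx instead of saving an error page.
        response.raise_for_status()
        with open(dest_path, "wb") as out_file:
            for chunk in response.iter_content(chunk_size=chunk_size):
                out_file.write(chunk)
                bytes_written += len(chunk)
    return bytes_written
```

With the URL from the question, the call would look like download_to_file('https://s3.amazonaws.com/nyc-tlc/trip+data/fhv_tripdata_2015-01.csv', 'data/fhv_tripdata_2015-01.csv'), assuming the data/ directory already exists.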

John Zwinck