I have a CSV file in AWS S3 that I'm trying to open in a local temp file. This is the code:
s3 = Aws::S3::Resource.new
bucket = s3.bucket({bucket name})
obj = bucket.object({object key})
temp = Tempfile.new('temp.csv')
obj.get(response_target: temp)
It pulls the file from AWS and loads it into a new temp file called 'temp.csv'. For some files, the obj.get(..) line throws the following error:
WARN: Encoding::UndefinedConversionError: "\xEF" from ASCII-8BIT to UTF-8
WARN: /Users/.rbenv/versions/2.5.0/lib/ruby/2.5.0/delegate.rb:349:in `write'
/Users/.rbenv/versions/2.5.0/lib/ruby/2.5.0/delegate.rb:349:in `block in delegating_block'
/Users/.rbenv/versions/2.5.0/lib/ruby/gems/2.5.0/gems/aws-sdk-core-3.21.2/lib/seahorse/client/http/response.rb:62:in `signal_data'
/Users/.rbenv/versions/2.5.0/lib/ruby/gems/2.5.0/gems/aws-sdk-core-3.21.2/lib/seahorse/client/net_http/handler.rb:83:in `block (3 levels) in transmit'
...
/Users/.rbenv/versions/2.5.0/lib/ruby/gems/2.5.0/gems/aws-sdk-s3-1.13.0/lib/aws-sdk-s3/client.rb:2666:in `get_object'
/Users/.rbenv/versions/2.5.0/lib/ruby/gems/2.5.0/gems/aws-sdk-s3-1.13.0/lib/aws-sdk-s3/object.rb:657:in `get'
The stack trace shows that the error is initially raised by the .get call in the AWS SDK for Ruby.
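The same exception class can be reproduced without S3 at all, which suggests the failure happens when the downloaded bytes are written to the tempfile rather than during the download itself. The following is a minimal sketch, assuming the SDK passes the body to write as a binary (ASCII-8BIT) string and that the tempfile expects UTF-8 (for example because Encoding.default_internal is set to UTF-8). The chunk contents are made up for illustration.

```ruby
# Hypothetical chunk: bytes as they might arrive from the network, tagged as binary.
chunk = "\xEF\xBB\xBFid,name\n".b

begin
  # Converting binary data to UTF-8 fails on the first byte >= 0x80,
  # because ASCII-8BIT carries no information about what that byte means.
  chunk.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  error = e
end

puts error.class
```

If that assumption holds, the write in delegate.rb at the top of the stack trace is attempting the equivalent conversion.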
Things I've tried:
When uploading the file (object) to AWS S3, you can specify content_encoding, so I tried setting that to UTF-8:
obj.upload_file({file path}, content_encoding: 'utf-8')
Also, when you call .get you can set response_content_encoding:
obj.get(response_target: temp, response_content_encoding: 'utf-8')
Neither of those works; both result in the same error as above. I really expected one of them to do the trick. In the AWS S3 dashboard I can see that the content encoding is indeed set correctly by the code, but it doesn't appear to make a difference.
It does work when I change the Tempfile line in the first code snippet above to the following:
temp = Tempfile.new('temp.csv', encoding: 'ascii-8bit')
But I'd prefer to upload and/or download the file from AWS S3 with the proper encoding. Can someone explain why specifying the encoding on the tempfile works? Or how to make it work through the AWS S3 upload/download?
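For reference, the working variant can be checked in isolation. This is a sketch only, with a made-up payload standing in for the S3 response body; it assumes the body arrives as a binary string, as in the error message above.

```ruby
require 'tempfile'

# Hypothetical payload standing in for the S3 response body (binary-tagged bytes).
payload = "\xEF\xBB\xBFid,name\n1,Ada\n".b

# With a binary tempfile, write performs no transcoding and stores the bytes as-is.
temp = Tempfile.new('temp.csv', encoding: 'ascii-8bit')
temp.write(payload)
temp.flush

# Read the raw bytes back to confirm nothing was altered.
bytes_on_disk = File.binread(temp.path)
temp.close!

puts bytes_on_disk == payload
```

The bytes on disk match the payload exactly, so the encoding can still be decided later, when the file is read back.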
Important to note: The problematic character in the error message appears to just be a random symbol added at the beginning of this auto-generated file I'm working with. I'm not worried about reading the character correctly; it gets ignored when I parse the file anyway.
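A leading "\xEF" is often the first byte of a UTF-8 byte order mark (EF BB BF). That is an assumption here, since only the first byte shows up in the error message. If it is a BOM, a sketch like the following would skip it when reading the downloaded file back as text. The file contents are made up for illustration.

```ruby
require 'tempfile'

# Hypothetical file contents: a UTF-8 BOM (EF BB BF) followed by CSV data.
raw = "\xEF\xBB\xBFid,name\n1,Ada\n".b

# Store the bytes untouched, as in the binary tempfile variant.
temp = Tempfile.new('temp.csv', encoding: 'ascii-8bit')
temp.write(raw)
temp.flush

# 'BOM|UTF-8' asks Ruby to skip a leading BOM, if present, and tag the rest as UTF-8.
text = File.read(temp.path, mode: 'r:BOM|UTF-8')
temp.close!

puts text
```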