
I am trying to upload a file to GCS, using the filename as the object name.

For example, if the filename is "test", the object is "test".

However, some filenames contain non-ASCII characters, and for those I get an exception like this:

Traceback (most recent call last):
  File "test_oauth.py", line 303, in <module>
    file_upload(sys.argv[2], sys.argv[3])
  File "test_oauth.py", line 188, in file_upload
    bucket=bucket_name, name=object_name, media_body=media)
  File "./package/src/shared/python-lib/apiclient/discovery.py", line 640, in method
  File "./package/src/shared/python-lib/apiclient/model.py", line 137, in request
  File "./package/src/shared/python-lib/apiclient/model.py", line 171, in _build_query
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 3: ordinal not in range(128)
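
For reference, the same message can be reproduced without the library. This is only a minimal sketch, assuming the name reaches the library as UTF-8 bytes and is decoded with the ASCII codec somewhere along the way; the traceback does not show the library source, so that is an assumption. The helper name `ascii_decode_message` and the sample bytes are hypothetical.

```python
def ascii_decode_message(raw_bytes):
    """Return the error text from decoding raw_bytes as ASCII, or None on success."""
    try:
        raw_bytes.decode('ascii')
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Hypothetical filename: "abc" followed by one UTF-8 encoded CJK character.
sample = b'abc\xe5\xa5\xbd.txt'

# The bytes are valid UTF-8, so decoding with the right codec succeeds.
sample.decode('utf-8')

print(ascii_decode_message(sample))
# 'ascii' codec can't decode byte 0xe5 in position 3: ordinal not in range(128)
```

The point of the sketch is that the error says nothing about the bytes being invalid UTF-8; it only says something tried to read them as ASCII.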

According to https://cloud.google.com/storage/docs/bucketnaming, I believe an object name should be accepted as long as it is a UTF-8 string.

I am sure the filename is UTF-8 encoded.

How should I deal with this situation?

I don't want to convert the filename into another encoding such as base64.

Here is the code snippet:

def file_upload(file_name=None, bucket_name=get_bucket_name()):
    assert file_name and bucket_name

    # The object name is derived from the local filename.
    object_name = get_object_name(file_name)

    print '%s: upload file: %s to bucket: %s to object: %s' % (
        sys._getframe().f_code.co_name, file_name, bucket_name, object_name)

    media = MediaFileUpload(file_name, chunksize=CHUNKSIZE, resumable=True)

    service = get_authenticated_service()

    # Fall back to a default MIME type when none can be guessed.
    print "mimetype: %s" % media.mimetype()
    if not media.mimetype():
        media = MediaFileUpload(FILE_UPLOAD,
                                mimetype=DEFAULT_MIMETYPE,
                                resumable=True)

    print 'object_name: %s' % object_name
    # The traceback above points at this call.
    request = service.objects().insert(
        bucket=bucket_name, name=object_name, media_body=media)

object_name is equal to the filename, and I'm sure the system encoding is UTF-8.

Thanks!

Mike Chiu
  • Does `name=object_name.encode('utf8')` work? – jterrace Nov 26 '14 at 17:49
  • No, I tried encoding it but still get the same exception. Even if I decode it first to turn it back into a unicode object, I still get the same error. – Mike Chiu Nov 27 '14 at 08:50
  • For now I rename the file if it contains any multi-byte characters, and that works fine. I just wonder why a UTF-8 encoded filename fails. – Mike Chiu Dec 22 '14 at 03:05

1 Answer


As per this post, you need to pass the name through urllib.quote() first, and then it should work.
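
A minimal sketch of what quoting does to such a name, not the library's own handling. The helper name `quote_object_name` and the sample bytes are hypothetical. `urllib.quote` is the Python 2 spelling used in the question's environment; the import fallback covers Python 3, where the same function lives in `urllib.parse`.

```python
try:
    from urllib import quote          # Python 2, as in the question
except ImportError:
    from urllib.parse import quote    # Python 3


def quote_object_name(name_utf8):
    """Percent-encode UTF-8 bytes so the result is pure ASCII."""
    # safe='' also encodes '/', which quote() leaves alone by default.
    return quote(name_utf8, safe='')


# Hypothetical filename: two UTF-8 encoded CJK characters plus ".txt".
raw_name = b'\xe6\xb5\x8b\xe8\xaf\x95.txt'

print(quote_object_name(raw_name))
# %E6%B5%8B%E8%AF%95.txt
```

The quoted string contains only ASCII characters, so it could then be passed as `name=` in place of the raw filename. Drop `safe=''` if slashes in the name should stay as they are.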

Ryan
  • It works when uploading a multi-byte filename quoted in the form %SDF..., but after uploading, the file shows the quoted name too. I don't upload any metadata; does that matter for this problem? – Mike Chiu Jan 13 '15 at 02:24
  • It shouldn't. I tested on my end and the only issue I saw was that Windows replaced the quote with a dash when I downloaded the file. – Ryan Jan 13 '15 at 15:59
  • OK, I'll keep testing. I think it's because the lib I use doesn't support it. I should trace through the apiclient lib in detail. Thanks for your answer! – Mike Chiu Jan 14 '15 at 01:56
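
If the quoted string ends up stored as the object name as-is, the readable name can be recovered on the client side when listing or downloading. A minimal sketch, assuming that is what happened; the helper name `unquote_object_name` and the sample value are hypothetical.

```python
try:
    from urllib import unquote          # Python 2
except ImportError:
    from urllib.parse import unquote    # Python 3


def unquote_object_name(quoted):
    """Turn a percent-encoded object name back into readable text."""
    raw = unquote(quoted)
    # Python 2 returns a byte string here; Python 3 returns text that has
    # already been decoded as UTF-8.
    if isinstance(raw, bytes):
        return raw.decode('utf-8')
    return raw


# Hypothetical stored object name.
stored_name = '%E6%B5%8B%E8%AF%95.txt'
restored = unquote_object_name(stored_name)

# Two CJK characters followed by ".txt".
assert restored == u'\u6d4b\u8bd5.txt'
```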