I'm using Apache Libcloud to upload files to a Google Cloud Storage bucket together with object metadata.

In the process, the keys in my metadata dict are being lowercased. I'm not sure whether this is due to Cloud Storage or whether this happens in Libcloud.

The issue can be reproduced following the example from the Libcloud docs:

from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver

cls = get_driver(Provider.GOOGLE_STORAGE)
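driver = cls('SA-EMAIL', './SA.json')  # provide service account credentials here

# 'BUCKET-NAME' is a placeholder for the name of an existing bucket
container = driver.get_container(container_name='BUCKET-NAME')

FILE_PATH = '/home/user/file'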

extra = {'meta_data': {'camelCase': 'foo'}}

# Upload with metadata
with open(FILE_PATH, 'rb') as iterator:
    obj = driver.upload_object_via_stream(iterator=iterator,
                                          container=container,
                                          object_name='file',
                                          extra=extra)

The file uploads successfully, but the key in the resulting object metadata is lowercased: camelCase has been turned into camelcase.

I don't think GCS disallows camelCase keys in object metadata, since it's possible to manually edit the metadata to use them.

I went through Libcloud's source code, but I don't see any explicit lowercasing going on. Any pointers on how to upload camelcased metadata with libcloud are most welcome.

sdcbr

1 Answer


I also checked the library and wasn't able to see anything obvious. Opening a new issue there would be a good start.

As far as the Google Cloud Storage side is concerned, and as you verified yourself, it does accept camelCase keys. I was able to edit the metadata of a file using the code from their public docs (though I couldn't work out a way to do it with Libcloud itself):

from google.cloud import storage


def set_blob_metadata(bucket_name, blob_name):
    """Set a blob's metadata."""
    # bucket_name = 'your-bucket-name'
    # blob_name = 'your-object-name'

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.get_blob(blob_name)
    metadata = {'camelCase': 'foo', 'NaMe': 'TeSt'}
    blob.metadata = metadata
    blob.patch()

    print("The metadata for the blob {} is {}".format(blob.name, blob.metadata))

So I believe this could be a good workaround in your case if you are not able to work it out with Libcloud. Note that the Cloud Storage client libraries base their authentication on environment variables, so their authentication docs should be followed.

Addition by question author: As hinted at in the comments, metadata can be added to a blob before uploading a file as follows:

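from google.cloud import storage

gcs = storage.Client()
bucket = gcs.get_bucket('my-bucket')
blob = bucket.blob('document')
blob.metadata = {'camelCase': 'foobar'}  # set before upload so it is sent with the object
# The context manager closes the file once the upload finishes
with open('/path/to/document', 'rb') as f:
    blob.upload_from_file(f)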

This allows setting metadata without having to patch an existing blob, and provides an effective workaround for the issue with Libcloud.
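If this pattern is needed in several places, it could be wrapped in a small helper. The following is only a sketch: the function name upload_with_metadata and its parameters are hypothetical and not part of the client library, and bucket is assumed to be a google.cloud.storage Bucket obtained as in the snippet above.

```python
def upload_with_metadata(bucket, object_name, file_path, metadata):
    """Create a blob, attach metadata, then upload the file in one request.

    Hypothetical helper: `bucket` is assumed to be a
    google.cloud.storage Bucket (anything exposing `.blob(name)`).
    """
    blob = bucket.blob(object_name)
    # Copy the dict so later changes by the caller don't affect the blob.
    blob.metadata = dict(metadata)
    # The context manager closes the file even if the upload raises.
    with open(file_path, 'rb') as file_obj:
        blob.upload_from_file(file_obj)
    return blob


# Usage (requires google-cloud-storage and configured credentials):
# from google.cloud import storage
# bucket = storage.Client().get_bucket('my-bucket')
# upload_with_metadata(bucket, 'document', '/path/to/document',
#                      {'camelCase': 'foobar'})
```

Because the metadata is attached before the upload call, it is part of the object from the moment it is created, rather than being patched in afterwards.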

sdcbr
Daniel Ocando
  • Thanks! I will raise an issue on Libcloud's GitHub. The reason I'm using Libcloud is that I'm also relying on Pub/Sub notifications for Cloud Storage, and I need the metadata to be included in those notifications. If I understand it correctly, the GCS client library only allows patching metadata, i.e., on an existing blob, in which case the Pub/Sub notification won't include the metadata. – sdcbr Dec 17 '20 at 19:23
  • This was a [known issue back in the day](https://stackoverflow.com/questions/33145047/set-metadata-in-google-cloud-storage-using-gcloud-python) and the patch() method needed to be called. Nonetheless, as per [this comment](https://github.com/googleapis/google-cloud-python/issues/1185#issuecomment-551203536), since 2019 all properties assigned to the blob before upload (including metadata) are passed through. I haven't tested this though. Note that other libraries such as [NodeJS](https://cloud.google.com/storage/docs/uploading-objects#node.js) do include a metadata field before upload. – Daniel Ocando Dec 17 '20 at 20:06
  • Thanks! I can confirm that it works with the GCS Python client library by creating a blob, setting the metadata, and only then uploading the file. I filed an issue with Libcloud, but I will just move back to using the GCS client library. – sdcbr Dec 18 '20 at 09:21
  • Awesome! I'm very glad that the workaround suggested worked! – Daniel Ocando Dec 18 '20 at 09:43
  • I got a fast response from one of Libcloud's authors: https://github.com/apache/libcloud/issues/1530 Apparently this is a limitation in the S3 API. – sdcbr Dec 20 '20 at 07:22
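
A possible reading of that last comment, offered as an assumption rather than a summary of the linked issue: with an S3-style XML API, custom metadata travels as HTTP headers (GCS documents the x-goog-meta- prefix for this), and HTTP header names are case-insensitive, so the original casing of a key is not guaranteed to survive. The toy simulation below is not Libcloud code; the function names are made up for illustration and the lowercasing step merely stands in for whatever normalizes the header names along the way.

```python
# Documented GCS XML API prefix for custom metadata headers.
META_PREFIX = 'x-goog-meta-'


def metadata_to_headers(meta_data):
    """Encode each metadata key as a prefixed HTTP header name."""
    return {META_PREFIX + key: value for key, value in meta_data.items()}


def headers_to_metadata(headers):
    """Decode metadata from headers whose names were lowercased in transit."""
    result = {}
    for name, value in headers.items():
        # Header names are case-insensitive; simulate normalization.
        name = name.lower()
        if name.startswith(META_PREFIX):
            result[name[len(META_PREFIX):]] = value
    return result


sent = metadata_to_headers({'camelCase': 'foo'})
received = headers_to_metadata(sent)
print(sent)      # {'x-goog-meta-camelCase': 'foo'}
print(received)  # {'camelcase': 'foo'}
```

The value is untouched and only the key loses its casing, which matches the behaviour described in the question.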