
Is there an API function that allows us to move files in Google Cloud Storage from one bucket to another?

The scenario is that we want to use Python to move files that have already been read from bucket A to bucket B. I know that gsutil can do this, but I'm not sure whether the Python client supports it.

Thanks.

user3769827

5 Answers


Here's a function I use when moving blobs between directories within the same bucket or to a different bucket.

from google.cloud import storage
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path_to_your_creds.json"

def mv_blob(bucket_name, blob_name, new_bucket_name, new_blob_name):
    """
    Function for moving files between directories or buckets. It will use GCP's copy
    function, then delete the blob from the old location.

    inputs
    -----
    bucket_name: name of bucket
    blob_name: str, name of file
        ex. 'data/some_location/file_name'
    new_bucket_name: name of bucket (can be same as original if we're just moving around directories)
    new_blob_name: str, name of file in new directory in target bucket
        ex. 'data/destination/file_name'
    """
    storage_client = storage.Client()
    source_bucket = storage_client.get_bucket(bucket_name)
    source_blob = source_bucket.blob(blob_name)
    destination_bucket = storage_client.get_bucket(new_bucket_name)

    # copy to new destination
    new_blob = source_bucket.copy_blob(
        source_blob, destination_bucket, new_blob_name)
    # delete in old destination
    source_blob.delete()

    print(f'File moved from {blob_name} to {new_blob_name}')
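For example, to move a single object with this function (the bucket and object names here are hypothetical):

mv_blob('bucket-a', 'data/some_location/file.csv',
        'bucket-b', 'data/destination/file.csv')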
dmlee8
  • For folks who want to move ALL files within a given GCS bucket, here is a FOR loop to do just that! `blobs = storage_client.list_blobs(bucket_name) for blob in blobs: mv_blob(bucket_name, blob.name, new_bucket_name, blob.name)` – Bluebird Jul 28 '22 at 18:56
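Expanded into runnable form, the loop from the comment above might look like this (a sketch assuming the mv_blob function defined in this answer):

storage_client = storage.Client()
blobs = storage_client.list_blobs(bucket_name)
for blob in blobs:
    # keep each object's name unchanged in the destination bucket
    mv_blob(bucket_name, blob.name, new_bucket_name, blob.name)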

Using the google-api-python-client, you can follow the example on the storage.objects.copy page. After you copy, you can delete the source with storage.objects.delete.

destination_object_resource = {}
req = client.objects().copy(
        sourceBucket=bucket1,
        sourceObject=old_object,
        destinationBucket=bucket2,
        destinationObject=new_object,
        body=destination_object_resource)
resp = req.execute()
print(json.dumps(resp, indent=2))

client.objects().delete(
        bucket=bucket1,
        object=old_object).execute()
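For reference, the `client` object above can be built with the library's discovery helper. A minimal sketch, assuming google-api-python-client is installed and application-default credentials are configured (json is imported for the json.dumps call above):

import json

from googleapiclient import discovery

# build a client for the Cloud Storage JSON API, v1
client = discovery.build('storage', 'v1')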
jterrace
    can you tell which package you are importing for above code? – Mahdi Jan 12 '19 at 13:50
  • @Mahdi: You import `from google.cloud import storage`. Documentation for the Python Cloud Client for GCP storage can be found here: https://googleapis.dev/python/storage/latest/index.html – Olsgaard Jul 27 '20 at 07:39

You can use the GCS Client Library functions documented at [1] to read from one bucket, write to the other, and then delete the source file.

Alternatively, you can use the GCS REST API documented at [2].

Links:
[1] - https://developers.google.com/appengine/docs/python/googlecloudstorageclient/functions
[2] - https://developers.google.com/storage/docs/concepts-techniques#overview
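For the REST route, here is a minimal sketch of a copy-then-delete move against the JSON API's objects.copy and objects.delete endpoints (the requests library, the helper name, and the pre-obtained OAuth2 access token are assumptions, not part of the original answer):

import urllib.parse

import requests

def rest_move(token, src_bucket, src_name, dst_bucket, dst_name):
    # object names must be URL-encoded when embedded in the path
    src = urllib.parse.quote(src_name, safe='')
    dst = urllib.parse.quote(dst_name, safe='')
    base = 'https://storage.googleapis.com/storage/v1/b'
    headers = {'Authorization': f'Bearer {token}'}

    # objects.copy: POST .../b/<src_bucket>/o/<src>/copyTo/b/<dst_bucket>/o/<dst>
    resp = requests.post(
        f'{base}/{src_bucket}/o/{src}/copyTo/b/{dst_bucket}/o/{dst}',
        headers=headers)
    resp.raise_for_status()

    # objects.delete removes the source, completing the move
    requests.delete(f'{base}/{src_bucket}/o/{src}',
                    headers=headers).raise_for_status()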

Paolo P.
    Note that the library documented for #1 is designed for use with App Engine. If you're not using App Engine, link #2 is your best bet. – Brandon Yarbrough Sep 22 '14 at 17:01
This copies every object in bucket A to bucket B under the same name (delete each source blob afterwards if you want a true move):

from google.cloud import storage

def GCP_BUCKET_A_TO_B():
    storage_client = storage.Client()
    source_bucket = storage_client.get_bucket("Bucket_A_Name")
    destination_bucket = storage_client.get_bucket("Bucket_B_Name")
    for blob in source_bucket.list_blobs(prefix=""):
        # copy each blob to the destination bucket under the same name
        source_bucket.copy_blob(blob, destination_bucket, blob.name)
        # uncomment to delete the source and turn the copy into a move:
        # blob.delete()

I just wanted to point out that there is another possible approach: using gsutil through the subprocess module.

The advantages of using gsutil like that:

  • You don't have to deal with individual blobs
  • gsutil's implementation of `mv`, and especially `rsync`, will probably be much better and more resilient than what we would write ourselves.

The disadvantages:

  • You can't deal with individual blobs easily
  • It's hacky, and a library is generally preferable to executing shell commands

Example:

import subprocess

def move(source_uri: str,
         destination_uri: str) -> None:
    """
    Move file from source_uri to destination_uri.

    :param source_uri: gs:// - like uri of the source file/directory
    :param destination_uri: gs:// - like uri of the destination file/directory
    :return: None
    """
    # pass the command as a list so no shell is needed; raise on failure
    subprocess.run(["gsutil", "-m", "mv", source_uri, destination_uri],
                   check=True)
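For example, with hypothetical URIs:

move('gs://bucket-a/data/', 'gs://bucket-b/data/')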
dom