
My question is related to an earlier one, "copy files from one AWS/S3 bucket to another bucket on databricks". I created a new thread because this question is different from that one.

The post "AWS S3 copy files and folders between two buckets" does not help me.

I need to copy some files from one AWS S3 bucket/folder to another AWS S3 bucket/folder using Python on Databricks.

My source S3 bucket/folder looks like this:

   source_s3_bucket
      folder_name1
        folder_name2
           folder_name3
             folder_name4
                 deepest_folder_name
                      file1
                      file2
                       ....
                      file11500

The destination s3 bucket/folder:

   destination_s3_bucket
      dest_folder_name1
        dest_folder_name2
           dest_folder_name3
             deepest_folder_name (this folder name must be exactly same as the source one "deepest_folder_name")
                      file1
                      file2
                       ....
                      file11500

Also, the "dest_folder_nameX" names all differ from the source ones, and the source and destination folder depths differ as well. However, the deepest folder name in the source bucket must be kept in the destination bucket.

All files must be copied exactly and keep the same names.

I have tried the following Python 3 code:

import boto3
s3 = boto3.client('s3')
s3_resource = boto3.resource('s3')
for key in s3.list_objects(Bucket=source_bucket, Prefix=source_prefix)['Contents']:
    files = key['Key']
    copy_source = {'Bucket': source_bucket,'Key': files}
    s3_resource.meta.client.copy(CopySource=copy_source, Bucket=dest_bucket, Key=dest_prefix)

But no files are copied to the destination folder. Also, how can I keep the "deepest_folder_name"?

UPDATE: "The deepest folder" means that I have to keep that layer's sub-folder names and copy them, along with the files located in them, to the destination.

For example, in the source bucket:

  folder_name_abc
     folder_name_dfr
        folder_name_typ # this folder name must be kept
            file1
            file2

  In the destination bucket:
       folder_name_typ # this folder name must be exactly the same as in the source
           file1
           file2

thanks

user3448011
  • The Amazon S3 copy command will only copy one object. When specifying a destination, provide the full Key for the destination object (not just a directory). Also, you seem to have mixed-up resource and client in `s3_resource.meta.client.copy` — the `copy()` method works on a resource, but you are providing parameters that look like they come from `copy_object()`, which is a client call. I don't understand what you are requiring with the "deepest folder" thing. – John Rotenstein Feb 20 '20 at 03:49
  • "The deepest folder" means that I have to keep the sub-folders' names and copy them and the files located in them to the destination. Please see the UPDATE in OP. thanks – user3448011 Feb 20 '20 at 04:23

1 Answer

The tricky part is manipulating the 'path' portion of the object Keys.

You could use something like this:

import boto3

s3_client = boto3.client('s3')

SOURCE_BUCKET = 'bucket1'
SOURCE_PREFIX = 'folder_name_abc/folder_name_dfr/' # Where is the folder located? (Leave blank if root level, include a trailing slash if a Prefix is specified)
FOLDER_TO_COPY = 'folder_name_typ'

DESTINATION_BUCKET = 'bucket2'
DESTINATION_PREFIX = '' # (Leave blank if root level, include a trailing slash if a Prefix is specified)

# List objects in the source folder. A single list_objects_v2 call returns at most
# 1,000 keys, so a paginator is used to cover all 11,500 files.
paginator = s3_client.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=f'{SOURCE_PREFIX}{FOLDER_TO_COPY}/')

for page in pages:
    for obj in page.get('Contents', []):  # 'Contents' is absent when nothing matches the Prefix
        source_key = obj['Key']
        # Remove source prefix, add destination prefix
        destination_key = DESTINATION_PREFIX + source_key[len(SOURCE_PREFIX):]
        print(f'Copying from {source_key} to {destination_key}')
        s3_client.copy_object(
            CopySource={'Bucket': SOURCE_BUCKET, 'Key': source_key},
            Bucket=DESTINATION_BUCKET,
            Key=destination_key,
        )
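
The Key expression above is plain string slicing, so it can be checked without touching S3. Below is a minimal sketch that pulls it into a helper and runs it on the folder names from the question. The function name `destination_key` and the prefix check are my own additions for illustration, not part of boto3, and the second example assumes the source and destination prefixes shown in the question's first layout.

```python
def destination_key(source_key: str, source_prefix: str, destination_prefix: str) -> str:
    """Map a source object Key to its destination Key.

    Hypothetical helper: strips source_prefix from the front of the Key and
    prepends destination_prefix, so everything from the deepest folder name
    onward is preserved.
    """
    if not source_key.startswith(source_prefix):
        raise ValueError(f'{source_key!r} does not start with {source_prefix!r}')
    return destination_prefix + source_key[len(source_prefix):]


# Example from the UPDATE: copy folder_name_typ to the root of the destination bucket.
print(destination_key(
    'folder_name_abc/folder_name_dfr/folder_name_typ/file1',
    'folder_name_abc/folder_name_dfr/',
    '',
))

# Example from the first layout: source and destination depths differ,
# but deepest_folder_name and the file name are kept.
print(destination_key(
    'folder_name1/folder_name2/folder_name3/folder_name4/deepest_folder_name/file1',
    'folder_name1/folder_name2/folder_name3/folder_name4/',
    'dest_folder_name1/dest_folder_name2/dest_folder_name3/',
))
```

Both prefixes need the trailing slash. Without it, the destination Key would start with a stray "/" or the folder names would be joined together.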
John Rotenstein