
I have an Excel sheet of metadata with 3 fields (path, folder_structure, filename):

path: the path of the source file in the S3 source bucket
folder_structure: the new folder structure that needs to be created in the target bucket
filename: the filename the file needs to be renamed to after copying to the target bucket

I have the code below working against a Windows source folder: it creates the target folder structure and copies the data into it. I need to modify it to read from an S3 bucket and load into another S3 bucket.

code:

import pandas as pd
import os, shutil

data = pd.read_excel(r'c:\data\sample_requirement.xlsx', engine='openpyxl')

root_dir = 'source'

for rec in range(len(data)):

    # Replace the '|' symbol with a backslash
    dire = data['folder_structure'][rec].replace('|', '\\')

    # Append the folder structure to the root directory
    directory = root_dir + '\\' + dire

    # Check if the path exists; if it exists -> skip, else -> create it
    if not os.path.exists(directory):
        os.makedirs(directory)

    # Source path from the Excel sheet
    path = data['path'][rec]

    # Filename to rename to
    filename = data['filename'][rec]

    if not os.path.isfile(directory + '\\' + filename):

        # Copy the file to the created path
        shutil.copy(path, directory)

        # Rename the file
        try:
            os.rename(directory + '\\' + os.path.basename(path),
                      directory + '\\' + filename)
        except FileExistsError:
            print('File name already exists')
roy
2 Answers


How about this: just add the following to your code, replacing your source and destination buckets:

import boto3
s3 = boto3.resource('s3')
copy_source = {
    'Bucket': 'yoursourcebucket',
    'Key': 'yourkey'
}
s3.meta.client.copy(copy_source, 'nameofdestinationbucket', 'destinationkey')

It's good practice to follow the docs to understand the details of the code. Also note there may be other ways to perform the same operation, for example using the AWS CLI: https://stackoverflow.com/a/32526487/13126651
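
For instance, here is a minimal sketch of the question's loop adapted to S3. The bucket names and the mapping of the '|'-separated folder_structure to '/'-separated keys are assumptions:

import boto3
import pandas as pd

s3 = boto3.resource('s3')

# Hypothetical bucket names; replace with your own
SOURCE_BUCKET = 'yoursourcebucket'
TARGET_BUCKET = 'yourtargetbucket'

data = pd.read_excel(r'c:\data\sample_requirement.xlsx', engine='openpyxl')

for rec in range(len(data)):
    # S3 has no real directories, so there is nothing to create up front:
    # the '/'-separated key prefix acts as the "folder structure"
    prefix = data['folder_structure'][rec].replace('|', '/')

    # Target key = new folder structure + new filename
    target_key = prefix + '/' + data['filename'][rec]

    copy_source = {
        'Bucket': SOURCE_BUCKET,
        'Key': data['path'][rec],  # assumes 'path' holds the source object key
    }
    s3.meta.client.copy(copy_source, TARGET_BUCKET, target_key)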

Jatin Mehrotra

Copy one file from one bucket to another:

import boto3

s3 = boto3.client("s3")
s3.copy({"Bucket": SOURCE_BUCKET, "Key": SOURCE_KEY}, DESTINATION_BUCKET, DESTINATION_KEY)

Copy up to 1000 files from one bucket to another:

import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(
    Bucket=SOURCE_BUCKET,
    Prefix=SOURCE_PREFIX,
)  # Warning: not handling pagination -> will truncate after 1000 keys
for obj in response['Contents']:
    s3.copy(
        {"Bucket": SOURCE_BUCKET, "Key": obj['Key']},
        DESTINATION_BUCKET,
        "/".join([DESTINATION_PREFIX, obj['Key']]),
    )

Copy more than 1000 files:

- properly handle pagination when calling list_objects_v2
- loop as long as response['IsTruncated'] is true
- pass response['NextContinuationToken'] as the ContinuationToken argument of the next call, as sketched below
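
A minimal sketch of that loop, using the same placeholder names as above:

import boto3

s3 = boto3.client("s3")

kwargs = {"Bucket": SOURCE_BUCKET, "Prefix": SOURCE_PREFIX}
while True:
    response = s3.list_objects_v2(**kwargs)
    for obj in response.get("Contents", []):
        s3.copy(
            {"Bucket": SOURCE_BUCKET, "Key": obj["Key"]},
            DESTINATION_BUCKET,
            "/".join([DESTINATION_PREFIX, obj["Key"]]),
        )
    # Stop once the listing is no longer truncated
    if not response.get("IsTruncated"):
        break
    # Feed the continuation token into the next list call
    kwargs["ContinuationToken"] = response["NextContinuationToken"]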
Hugo