
I need to read a CSV file from S3 (using boto) in order to create a pandas DataFrame. The problem is that the file name is only partially known to me. I can already read a file whose name is partially known from my local system using glob and pd.read_csv.

How can this be done using boto?

The file name is 'CELLBH_testing_phase1_automated_1234xvy345.csv', and I only know the keyword CELLBH; the rest of the string keeps changing.

Code to read a file using boto where I know the exact file name:

import boto
import boto.s3.connection
from boto.s3.key import Key
import pandas as pd
import StringIO

access_key = "xxxxxxxxxx"
secret_key = "xxxxxxxxxx"

conn = boto.connect_s3(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    host='xxxxxxxxx',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
bucket = conn.get_bucket('npousecase', validate=False)

# Read one object with a fully known key into a DataFrame
Test_File = 'CELLBH.csv'
k = Key(bucket, Test_File)
content = k.get_contents_as_string()
Test = pd.read_csv(StringIO.StringIO(content), sep=";", header=0)

Code to read the file 'CELLBH_testing_phase1_automated_1234xvy345.csv' if it is on my local system:

import os
import glob
import pandas as pd

data_dir = "C:\\users\\adbharga\\Desktop\\Input"
os.chdir(data_dir)

## Reading files from Input Directory
for f in glob.glob('CELLBH*.csv'):
    Test = pd.read_csv(f, sep=";", header=0)

How can I do the above using boto? Hope the question is clear. Thanks.

1 Answer


Check this answer: How to read a csv file from an s3 bucket using Pandas in Python. It seems you can put a loop around that answer's code to get what you want.

Like:

for bucket_name in glob.glob('CELLBH*.csv'):
    object_key = 'my_file.csv'
    csv_obj = client.get_object(Bucket=bucket_name, Key=object_key)
    body = csv_obj['Body']
    csv_string = body.read().decode('utf-8')
    df = pd.read_csv(StringIO(csv_string))
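
One thing to keep in mind is that glob.glob only matches files on the local disk, so for S3 the loop would probably have to run over the object keys themselves rather than over glob results. A rough sketch of that idea with boto3 (the library used in the linked answer), assuming the bucket name 'npousecase' from the question, that only the key prefix CELLBH is known, and that credentials are already configured for boto3; list_objects_v2 filters keys server-side by prefix:

import boto3
import pandas as pd
from io import StringIO  # Python 3

client = boto3.client('s3')  # assumes credentials/endpoint are configured elsewhere

# List every object whose key starts with the known prefix
response = client.list_objects_v2(Bucket='npousecase', Prefix='CELLBH')
for obj in response.get('Contents', []):
    object_key = obj['Key']
    csv_obj = client.get_object(Bucket='npousecase', Key=object_key)
    csv_string = csv_obj['Body'].read().decode('utf-8')
    df = pd.read_csv(StringIO(csv_string), sep=';', header=0)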
Jorge
In the question above, the bucket name is fixed and known, while for the object key (i.e. the file name) only a substring is known rather than the whole name. – Aditya Bhargava Apr 28 '19 at 06:02
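
Since the bucket name is fixed and only part of the key is known, a minimal sketch with the legacy boto library from the question could use bucket.list(prefix=...), which yields only the keys starting with the given string. The credentials and host are the placeholders from the question, and Python 2's StringIO is kept to match the original code:

import boto
import boto.s3.connection
import pandas as pd
import StringIO

conn = boto.connect_s3(
    aws_access_key_id="xxxxxxxxxx",
    aws_secret_access_key="xxxxxxxxxx",
    host='xxxxxxxxx',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
bucket = conn.get_bucket('npousecase', validate=False)

# bucket.list(prefix=...) returns only keys whose names start with 'CELLBH'
for key in bucket.list(prefix='CELLBH'):
    if key.name.endswith('.csv'):
        content = key.get_contents_as_string()
        Test = pd.read_csv(StringIO.StringIO(content), sep=";", header=0)

Filtering by prefix happens on the S3 side, so only the matching keys are listed and downloaded.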