The code below reads a CSV file from AWS S3 using PyCharm on my local machine.
# Read a single CSV from S3
import sys
import boto3
import pandas as pd

if sys.version_info[0] < 3:
    from StringIO import StringIO  # Python 2.x
else:
    from io import StringIO  # Python 3.x

aws_id = 'XXXXXXXXXXXXXXX'
aws_secret = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
client = boto3.client('s3', aws_access_key_id=aws_id, aws_secret_access_key=aws_secret)

bucket_name = 'bucket-name'
object_key = 'folder-name/test.csv'

# Fetch the object, decode its body, and parse it into a DataFrame
csv_obj = client.get_object(Bucket=bucket_name, Key=object_key)
body = csv_obj['Body']
csv_string = body.read().decode('utf-8')
df = pd.read_csv(StringIO(csv_string))
print(df.head())
I would like to read multiple CSV files in the same way: essentially every file that sits under the folder prefix.
My files are stored under the following keys:
bucket-name/folder-name/year=2018/month=01/file_032342.csv
bucket-name/folder-name/year=2018/month=02/file_434423.csv
bucket-name/folder-name/year=2018/month=03/file_343254.csv
bucket-name/folder-name/year=2018/month=04/file_544353.csv