
I am reading a file from s3 in pandas.

import pandas as pd

aws_credentials = {
                    "key": "xxxx",
                    "secret": "xxxx"
                  }

# Read data from S3
df_aln = pd.read_csv("s3://dir/ABC/fname_0521.csv", storage_options=aws_credentials, encoding='latin-1')

However, I have several files with the same shape and a similar naming convention, fname_mmyy. How do I read all the files that match the naming pattern and combine them into one pandas DataFrame?

I'd prefer to not write pd.read_csv to read each file separately.

kgh
  • Not sure this works with S3, but see https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe – jarmod Nov 15 '22 at 23:12
  • You have to iterate over all files and read them. – Marcin Nov 15 '22 at 23:17

1 Answer


According to this answer: https://stackoverflow.com/a/69568591/687896 , you can use glob on S3. Your pattern would be something like fname_*.csv:

# Get the list of matching CSV files (from the cited answer):
import pandas as pd
import s3fs

s3 = s3fs.S3FileSystem(anon=False, key=aws_credentials["key"],
                       secret=aws_credentials["secret"])
csvs = s3.glob('your/s3/path/to/fname_*.csv')

# Read each file into pandas and concatenate the DataFrames.
# Note: s3.glob returns paths without the "s3://" scheme, so add it back.
dfs = []
for csv in csvs:
    df = pd.read_csv(f"s3://{csv}", storage_options=aws_credentials,
                     encoding='latin-1')
    dfs.append(df)

df = pd.concat(dfs, ignore_index=True)

That (or something along those lines) should work.
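If you want to verify the glob-plus-concat pattern without touching S3, the same approach works against local files with the standard library's glob module. A minimal sketch (the filenames and sample data here are hypothetical, just to mimic the fname_mmyy convention):

```python
import glob
import os
import tempfile

import pandas as pd

# Create a few sample CSVs following the fname_mmyy convention
# (hypothetical data, purely to demonstrate the pattern).
tmpdir = tempfile.mkdtemp()
for mmyy in ("0521", "0621", "0721"):
    pd.DataFrame({"id": [1, 2], "month": [mmyy, mmyy]}).to_csv(
        os.path.join(tmpdir, f"fname_{mmyy}.csv"), index=False
    )

# Glob the matching files and concatenate them into one DataFrame.
paths = sorted(glob.glob(os.path.join(tmpdir, "fname_*.csv")))
df = pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
print(len(df))  # 3 files x 2 rows = 6 rows
```

Using `ignore_index=True` gives the combined frame a fresh 0..N-1 index instead of repeating each file's own row numbers.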

Brandt