I am trying to use awswrangler.s3.merge_datasets() with a glob pattern as the source path, but it isn't working for me.

https://aws-sdk-pandas.readthedocs.io/en/stable/stubs/awswrangler.s3.merge_datasets.html

import glob
import awswrangler as wr
wr.s3.merge_datasets(
    source_path=glob.escape(f"s3://my-bucket/data/*/individual_file.parquet"),
    target_path="s3://my-bucket/data/aggregated_files.parquet",
    mode="append",
    use_threads=True,
)

An empty list is returned.
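
One thing that can be checked locally, without touching S3, is what glob.escape actually returns for this path. The sketch below uses only the Python standard library. The object key under data/a/ is a made-up example, and fnmatchcase is only a stand-in for wildcard matching, not necessarily how awswrangler resolves paths.

```python
import fnmatch
import glob

# The same pattern as in the call above (no f-string needed, nothing is interpolated).
pattern = "s3://my-bucket/data/*/individual_file.parquet"

# glob.escape wraps the special characters *, ? and [ in brackets so that
# they are matched literally instead of acting as wildcards.
escaped = glob.escape(pattern)
print(escaped)  # s3://my-bucket/data/[*]/individual_file.parquet

# Hypothetical object key, used only to illustrate the matching behaviour.
key = "s3://my-bucket/data/a/individual_file.parquet"

matches_raw = fnmatch.fnmatchcase(key, pattern)      # wildcard still active
matches_escaped = fnmatch.fnmatchcase(key, escaped)  # '*' now literal
print(matches_raw, matches_escaped)  # True False
```

So the escaped string treats * as a literal character rather than a wildcard, which may be part of why nothing is matched, although it would not explain the single-file case.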

Why doesn't this work? What am I doing wrong? Is there another way?

Thanks!

PS: In fact, this doesn't work even for a single file, globbing aside!

PPS: This answer https://stackoverflow.com/a/65816617/1021819 doesn't work for me.

PPPS: This might be the problem: https://stackoverflow.com/a/64261481/1021819 - but what then is the solution?

jtlz2