I am trying to use awswrangler.s3.merge_datasets() with a glob pattern as the source path, but it isn't working for me.
Documentation: https://aws-sdk-pandas.readthedocs.io/en/stable/stubs/awswrangler.s3.merge_datasets.html
import glob
import awswrangler as wr

wr.s3.merge_datasets(
    source_path=glob.escape(f"s3://my-bucket/data/*/individual_file.parquet"),
    target_path="s3://my-bucket/data/aggregated_files.parquet",
    mode="append",
    use_threads=True,
)
An empty list is returned.
Why doesn't this work? What am I doing wrong? Is there another way?
Thanks!
PS: In fact, this doesn't work even for a single file, globbing aside!
PPS: This answer https://stackoverflow.com/a/65816617/1021819 doesn't work for me.
PPPS: This might be the problem: https://stackoverflow.com/a/64261481/1021819. If so, what is the solution?
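
One thing that may be worth checking in the snippet above is the glob.escape() call. In the standard library, glob.escape() exists to neutralize wildcard characters, so the * in the source path is turned into a literal [*] before awswrangler ever sees it. The sketch below uses only the standard library; the sample key is hypothetical, and fnmatch is used here only to illustrate shell-style matching, not as a statement of how awswrangler matches paths. Whether merge_datasets() accepts wildcards in source_path at all is a separate question that this does not settle; the linked documentation describes source_path as the source directory.

```python
import fnmatch
import glob

pattern = "s3://my-bucket/data/*/individual_file.parquet"

# glob.escape() wraps each wildcard character in brackets so that it
# matches only itself, which disables the wildcard.
escaped = glob.escape(pattern)
print(escaped)

# Hypothetical object key, made up for illustration only.
sample_key = "s3://my-bucket/data/a/individual_file.parquet"

# The raw pattern matches the sample key under shell-style rules;
# the escaped pattern would match only a directory literally named "*".
raw_matches = fnmatch.fnmatchcase(sample_key, pattern)
escaped_matches = fnmatch.fnmatchcase(sample_key, escaped)
print(raw_matches, escaped_matches)
```

If the intent is for the wildcard to be expanded, passing the pattern without glob.escape() would at least keep the * intact.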