
Here is my code so far:

from airflow.providers.amazon.aws.operators.s3 import S3ListOperator

# List keys under the templated per-day prefix
t1 = S3ListOperator(
    task_id='list_s3_files',
    bucket='mybucket',
    prefix='v01/{{ds}}/',
    delimiter='/'
)

I will then copy the latest file across using S3CopyObjectOperator.
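
For reference, a minimal S3CopyObjectOperator invocation with a recent Amazon provider might look like the sketch below (the destination bucket and both object keys are placeholders, not from the original post):

from airflow.providers.amazon.aws.operators.s3 import S3CopyObjectOperator

# Copy one object between buckets; all names here are hypothetical
t2 = S3CopyObjectOperator(
    task_id='copy_latest_file',
    source_bucket_name='mybucket',
    source_bucket_key='v01/{{ds}}/latest.csv',  # hypothetical key
    dest_bucket_name='my-dest-bucket',          # hypothetical bucket
    dest_bucket_key='archive/{{ds}}/latest.csv'
)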


1 Answer


There isn't a particular "Airflow way" to do this, but you could do it with a PythonOperator:

import boto3

# Bucket() is capitalized, and .objects.all() iterates over object summaries
all_objects = boto3.resource('s3').Bucket(your_bucket_name).objects.all()
sorted_objs = sorted(all_objects, key=lambda o: o.last_modified)
latest_file = sorted_objs[-1]

It's not an "industrial" solution, though, as it requires listing every object in the bucket just to sort them; S3 doesn't support server-side querying like that.
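
To wire that snippet into a DAG, a minimal sketch could look like this (the task id, bucket name, and reliance on the return value being pushed to XCom are my assumptions, not part of the original answer):

import boto3
from airflow.operators.python import PythonOperator

def find_latest_file():
    # List object summaries (metadata only) and keep the newest key
    bucket = boto3.resource('s3').Bucket('mybucket')  # placeholder bucket
    latest = max(bucket.objects.all(), key=lambda o: o.last_modified)
    return latest.key  # the returned key is pushed to XCom by default

t_latest = PythonOperator(
    task_id='find_latest_file',
    python_callable=find_latest_file
)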

If you have a predictable way to segment the files (e.g. per-day or per-hour), it wouldn't be that bad, though.
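
For example, if files land under a per-day prefix, you can filter the listing before sorting so only one day's objects are scanned (the bucket name and prefix layout below are hypothetical):

import boto3

# Only list objects under a single day's prefix instead of the whole bucket
day_objects = boto3.resource('s3').Bucket('mybucket').objects.filter(
    Prefix='v01/2022-01-27/'  # hypothetical per-day layout
)
latest_file = max(day_objects, key=lambda o: o.last_modified)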
