
I'm new to Azure Data Factory and am trying to move files that are dropped into my S3 bucket daily to Azure Blob Storage. I have already created the datasets (source and sink) and the linked services in Data Factory.

Because my S3 bucket receives a new file every day, I'm wondering how to copy only the latest file dropped in S3 (say, at 5 am EST) each day. I have looked through most of the answers online, like this, this, this and this, but none of them explains how to determine which file in S3 is the latest (perhaps by last modified date/time, or by matching a file name pattern such as 'my_report_YYYYMMDD.csv.gz') and copy only that file to the destination blob.

Thank you in advance for your help/answer!

user1330974

2 Answers


My idea is as follows:

1. First, configure the pipeline to run on a schedule trigger. Refer to this link.

2. Use the Get Metadata activity, which supports the Amazon S3 connector, to list the files in your S3 dataset.

Retrieve metadata such as the last modified time and the file name.

3. Pass this metadata array, which contains the last modified time and file name of each file, to a Web activity or an Azure Function activity. In that REST API or function, sort the entries to find the most recently modified file.

4. Take the file name returned by the Web activity or Azure Function activity, then copy that file to Azure Blob Storage.
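The sorting logic in step 3 could look something like the Python sketch below. It is only an illustration: the input shape (a list of objects with `name` and `lastModified` keys, the latter an ISO 8601 string) and the function names are assumptions, not the exact payload that Data Factory sends, so the parsing would need adjusting to whatever your pipeline actually passes to the function.

```python
from datetime import datetime


def parse_iso_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def pick_latest_file(items: list) -> str:
    """Return the name of the most recently modified file.

    `items` is a hypothetical list of dicts such as
    {"name": "...", "lastModified": "2019-01-03T10:00:00Z"}.
    """
    if not items:
        raise ValueError("no files were returned by the metadata step")
    latest = max(items, key=lambda item: parse_iso_utc(item["lastModified"]))
    return latest["name"]


if __name__ == "__main__":
    files = [
        {"name": "my_report_20190101.csv.gz", "lastModified": "2019-01-01T10:00:00Z"},
        {"name": "my_report_20190103.csv.gz", "lastModified": "2019-01-03T10:00:00Z"},
        {"name": "my_report_20190102.csv.gz", "lastModified": "2019-01-02T10:00:00Z"},
    ]
    print(pick_latest_file(files))
```

The function returns only the file name, which is what step 4 needs in order to parameterize the copy.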

Another option is a Custom activity, where you could implement this requirement in .NET code.

Jay Gong

(Side note: thanks to Jay Gong above for suggesting a solution)

I found the answer, and it's simpler than I expected. Dynamic content (an expression) can be added to the 'Filter by last modified' field of the S3 dataset. The screenshot below shows how I picked files that are no more than 5 hours old by using a dynamic expression. More about these expressions can be read here.

[Screenshot: dynamic expression in the 'Filter by last modified' field of the S3 dataset]
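Since the exact expression is only visible in the screenshot, here is a rough Python sketch of the logic such a filter applies, plus the file name pattern from the question. The helper names are made up for illustration, and the Data Factory expressions mentioned in the comments are my assumption of what an equivalent would look like, not a copy of what the screenshot contains.

```python
from datetime import datetime, timedelta, timezone


def modified_within(last_modified: datetime, now: datetime, hours: int = 5) -> bool:
    """True if the file was modified in the last `hours` hours.

    Assumed Data Factory equivalent for the start of the filter window:
        @addhours(utcnow(), -5)
    """
    cutoff = now - timedelta(hours=hours)
    return cutoff <= last_modified <= now


def expected_report_name(day: datetime) -> str:
    """Build the daily file name 'my_report_YYYYMMDD.csv.gz'.

    Assumed Data Factory equivalent:
        @concat('my_report_', formatDateTime(utcnow(), 'yyyyMMdd'), '.csv.gz')
    """
    return f"my_report_{day:%Y%m%d}.csv.gz"


if __name__ == "__main__":
    now = datetime(2019, 1, 3, 10, 0, tzinfo=timezone.utc)
    print(modified_within(datetime(2019, 1, 3, 6, 0, tzinfo=timezone.utc), now))
    print(expected_report_name(now))
```

Note that both sketches work in UTC, so a 5 am EST drop time has to be converted when choosing the trigger time and the window size.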

Hope this is helpful.

user1330974