
I am trying to implement the following flow in an Azure Data Factory pipeline:

  1. Copy files from an SFTP to a local folder.
  2. Create a comma-separated file in the local folder with the list of files and their sizes.

The first step was easy enough, using a 'Copy Data' step with 'SFTP' as source and 'File System' as sink.

The files are being copied, but in the output of this step, I don't see any file information.

I also don't see an option to create a file using data from a previous step.

Maybe I'm using the wrong technology? One of the reasons I'm using Azure Data Factory is the integration runtime, which gives us a single fixed IP to connect to the external SFTP server (easier firewall configuration).

Is there a way to implement step 2?

Thanks for any insight!


1 Answer


There is no built-in feature to achieve this.

You need to use ADF together with another service. I suggest first using an Azure Function to check the files and then doing the copy.

The structure should be like this:

(Pipeline diagram: an Azure Function activity followed by the Copy Data activity.)

In the Azure Function you can get the size of the files and save the list to a CSV file:

Get the size of the files (Python):

How to fetch sizes of all SFTP files in a directory through Paramiko
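A minimal sketch of that approach with Paramiko's `listdir_attr`; the host, credentials, and remote directory below are placeholders you'd replace with your own values:

```python
import paramiko

# Placeholders -- substitute your own connection details
host = "sftp.example.com"
username = "user"
password = "password"
remote_dir = "/remote/dir"

transport = paramiko.Transport((host, 22))
transport.connect(username=username, password=password)
sftp = paramiko.SFTPClient.from_transport(transport)

# listdir_attr returns SFTPAttributes objects; st_size is the size in bytes
files = [(attr.filename, attr.st_size) for attr in sftp.listdir_attr(remote_dir)]

sftp.close()
transport.close()
```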

Then use pandas to save the results as a CSV (Python):

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html

Writing a pandas DataFrame to CSV file
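For example, a short sketch that writes the `(filename, size)` list from the previous step to a CSV; the output path is a placeholder for your local sink folder:

```python
import pandas as pd

# "files" is the list of (filename, size) tuples built with Paramiko above
df = pd.DataFrame(files, columns=["filename", "size_bytes"])
df.to_csv("file_list.csv", index=False)  # placeholder path
```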

Simple HTTP-triggered Azure Function (Python):

https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook-trigger?tabs=python

(Put the processing logic in the body of the Azure Function. You can do almost anything you want there, apart from showing a graphical interface and a few other unsupported things, and you can choose whichever language you are familiar with. In short, there is no built-in ADF feature that satisfies your idea.)
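A bare skeleton of where that logic would sit in an HTTP-triggered function (the response text is only illustrative):

```python
import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    # Put the Paramiko + pandas logic from above here:
    # list the SFTP files, build the DataFrame, write the CSV.
    return func.HttpResponse("file list written", status_code=200)
```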
