
I am trying to set up an ETL pipeline where:

  1. The source is a SQL Server table column containing binary stream data
  2. The destination (sink) is an S3 bucket

My requirements are:

  1. Read the binary stream column from the SQL Server table
  2. Process the binary stream data row by row
  3. Upload a file to the S3 bucket for each row's binary stream data (a rough sketch of this logic is shown after this list)
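
For clarity, this is the row-by-row logic I am after, written as a rough Python sketch outside of ADF (the table, column, and bucket names below are only placeholders):

```python
# Rough sketch of the intended logic; table, column, and bucket names are placeholders.
import pyodbc
import boto3

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.example.com;DATABASE=mydb;UID=myuser;PWD=mypassword"
)
s3 = boto3.client("s3")

cursor = conn.cursor()
cursor.execute("SELECT FileName, FileData FROM dbo.Documents")  # FileData is varbinary(max)

for file_name, file_data in cursor:
    # One S3 object per row, with the row's binary stream as the object body
    s3.put_object(Bucket="my-destination-bucket", Key=file_name, Body=bytes(file_data))
```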

I have tried Data Flow, Copy activity, and the AWS connectors in Azure Data Factory, but there is no option to set an S3 bucket as the destination (sink).

Is there any other approach available in Azure Data Factory to meet these requirements?


1 Answer


One workaround is to use SFTP as the sink in a Copy data activity. Below is something I have tried.

Here is the pipeline, where I have used a Lookup activity to get the SQL table rows.

In my ForEach loop, I have set the file name and the file data into two variables.

Take a sample CSV file (1 row, 1 column, which will be ignored anyway) and add an additional column whose dynamic content value is the processed binary stream data held in the file-data variable (keep only the required additional column and delete the rest in the mapping).

Create a sink dataset pointing to the destination blob container, where each row's data will be stored as a separate blob.

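If you want to verify this intermediate result, a short sketch like the one below lists the staged blobs (the connection string and container name are placeholders); you should see one blob per source row:

```python
# Quick check that the staging Copy activity produced one blob per row.
# Connection string and container name are placeholders.
from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = blob_service.get_container_client("staging-container")

for blob in container.list_blobs():
    print(blob.name, blob.size)  # expect one entry per source row
```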

Now you can add another Copy activity, using the sink of the previous Copy activity as its source, and set its sink to SFTP, which lets you reach the Amazon S3 bucket (for example, through an SFTP endpoint such as AWS Transfer Family that writes into the bucket).

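This second Copy activity is essentially an SFTP upload. The sketch below shows the equivalent step in Python with paramiko, assuming an SFTP endpoint (for example, an AWS Transfer Family server) that stores uploads in the S3 bucket; the host name, credentials, and paths are placeholders:

```python
# Sketch of the SFTP hop performed by the second Copy activity.
# Host name, credentials, and paths are placeholders; the SFTP endpoint
# (e.g. AWS Transfer Family) is assumed to store uploads in the S3 bucket.
import paramiko

transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="sftp-user", password="sftp-password")
sftp = paramiko.SFTPClient.from_transport(transport)

# Upload one staged file; the remote path depends on how the endpoint maps to the bucket
sftp.put("staged/row1.bin", "upload/row1.bin")

sftp.close()
transport.close()
```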

For more information, you can refer to Copy and transform data in SFTP server using Azure Data Factory.
