
I have 20 Excel/PDF files located on different HTTPS servers. I need to validate these files and load them into Azure Storage using Data Factory, then apply some business logic to the data and load it into Azure SQL Database. Do I need to create a pipeline that stores this data in Azure Blob Storage first and then loads it into Azure SQL Database?

I have tried creating a Copy Data activity in Data Factory.

user3035090
  • You'll need to be a bit more specific, as the question currently is way too broad. What have you tried? Where did you fail? How did you try to fix it? – rickvdbosch Oct 14 '19 at 12:55
  • I was asking for a suggestion on how to approach this requirement. Should I load all these files into Azure Blob Storage, and then do some transformation and load into Azure SQL Database? – user3035090 Oct 14 '19 at 14:10
  • Yes, sorry for the late response. I have posted a reply to your comment. – user3035090 Oct 22 '19 at 04:13

1 Answer


My idea is as below:

No.1

Step 1: Use a Copy Activity to transfer data from the HTTP connector source into the Blob Storage connector sink.

Step 2: Meanwhile, configure a blob storage trigger to execute your logic code, so that the blob data is processed as soon as it lands in blob storage.

Step 3: Use a Copy Activity to transfer data from the Blob Storage connector source into the SQL Database connector sink.
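The Step 2 logic could live in, for example, a blob-triggered Azure Function. A minimal sketch of the kind of validation such a function might run on each landed file — the column names (`id`, `amount`) and the validation rule are illustrative assumptions, not from the question:

```python
import csv
import io

def validate_rows(csv_text: str) -> list[dict]:
    """Keep only rows with a non-empty 'id' and a numeric 'amount'.

    Hypothetical business logic: the real rules depend on the files
    being loaded. Invalid rows are silently dropped here; in practice
    you might route them to a quarantine container instead.
    """
    valid = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row.get("id"):
            continue  # reject rows missing the key column
        try:
            row["amount"] = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue  # reject rows with a non-numeric amount
        valid.append(row)
    return valid
```

A blob trigger would call this on the downloaded blob content and write the cleaned rows back to a "processed" container for Step 3 to pick up.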

No.2:

Step 1: Use a Copy Activity to transfer data from the HTTP connector source into the SQL Database connector sink.

Step 2: Meanwhile, you could configure a stored procedure to hold your logic steps. The logic runs on the data before it is inserted into the table.
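A minimal sketch of the sink side of that Copy Activity, using the stored-procedure properties of the Azure SQL sink (the procedure name, table type, and parameter name below are assumptions you would replace with your own):

```json
{
  "sink": {
    "type": "AzureSqlSink",
    "sqlWriterStoredProcedureName": "usp_UpsertFileData",
    "sqlWriterTableType": "FileDataType",
    "storedProcedureTableTypeParameterName": "FileData"
  }
}
```

The copied rows are passed to the stored procedure as a table-valued parameter, so the procedure can validate or transform them before the insert.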

I think both methods are feasible. With No.1, the business logic is freer and more flexible. With No.2, it is more convenient, but you are limited by the syntax of stored procedures. You could pick whichever solution you want.


Excel and PDF are not supported yet. Based on the link, only the formats below are supported by ADF directly:

[screenshot: list of file formats supported by ADF]

I tested with a CSV-format dataset and got the random characters below:

[screenshot: garbled output]

You could refer to this case to read Excel files in ADF: How to read files with .xlsx and .xls extension in Azure data factory?
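Since ADF cannot read .xlsx directly, one common workaround (the approach the asker later took) is a small Python conversion step that rewrites the workbook as CSV before the Copy Activity runs. A sketch, assuming `pandas` and `openpyxl` are installed and that the first worksheet is the one to load:

```python
import pandas as pd

def xlsx_to_csv(xlsx_path: str, csv_path: str, sheet_name=0) -> None:
    """Convert one worksheet of an .xlsx file to UTF-8 CSV.

    pandas uses the openpyxl engine for .xlsx files; index=False keeps
    the row index out of the output so ADF sees only the data columns.
    """
    df = pd.read_excel(xlsx_path, sheet_name=sheet_name)
    df.to_csv(csv_path, index=False, encoding="utf-8")
```

The resulting CSV can then be staged in blob storage and picked up by a delimited-text dataset in the pipeline.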

Jay Gong
  • Thanks for the input, Jay. Do we need to use any WebJobs in order to process these files? There is some data embedded inside the website; I believe I need to use some VB code or PowerShell script to bring this data into Azure Tables. Do the files save as-is in Azure Blob Storage? – user3035090 Oct 22 '19 at 04:12
  • @user3035090 Never mind the delay. Yes, of course you could use WebJobs or an Azure Function to process these files into blob storage or a SQL database. But the binary format differs between the two. In blob storage it is a direct reference URL, which you can access from your app or website. In SQL DB it is stored as binary metadata; if you want to display it, you have to write the metadata back out to a file so you can view it. In my opinion, blob storage may be the better solution. – Jay Gong Oct 22 '19 at 08:04
  • I had a question: are files of type Excel, PDF, etc. not supported by the Copy Data activity in Azure Data Factory? – user3035090 Oct 25 '19 at 05:06
  • Thanks, Jay Gong, for the input. I have created a Python script which processes the Excel file from a public site and converts it into CSV. I converted this Python script to an .exe using auto py and placed the .exe inside an Azure Blob Storage container. I created a Batch account and a linked service for it, and used the Custom activity with the Batch service to execute the .exe inside the pipeline. But the process is not running. Is there anything else to be taken care of as part of the process? – user3035090 Nov 12 '19 at 07:11