
I want to load data from on-premises (Data Lake) storage into Azure Data Lake Storage Gen2.

For this, I set up an on-premises Windows server, installed a self-hosted integration runtime on it, and connected to the on-premises data lake (Hive) from Azure Data Factory.

In Azure Data Factory I created a pipeline with a copy activity, set the source to my on-premises data lake (Hive), and gave it a SQL query to pull the data. Eventually I will need to add multiple copy activities, one per table.
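Since the same copy activity is repeated for each table, one option is to generate the activity definitions instead of building each by hand. The following is a minimal sketch, not a verified template: the dataset names (`OnPremHiveDataset`, `AdlsGen2SinkDataset`), the `fileName` dataset parameter, the table names, and the `parallelCopies` value are all hypothetical. The property names follow ADF's copy activity JSON (`HiveSource` with a `query`, a delimited-text sink writing to ADLS Gen2), but they should be checked against the Hive connector and copy activity documentation before use.

```python
import json
from typing import Dict, List

# Hypothetical names: replace with the real datasets and Hive tables.
SOURCE_DATASET = "OnPremHiveDataset"
SINK_DATASET = "AdlsGen2SinkDataset"  # assumed to expose a "fileName" parameter
TABLES = ["sales", "customers", "orders"]


def copy_activity(table: str, parallel_copies: int = 4) -> Dict:
    """Build one Copy activity that pulls a single Hive table with a SQL query."""
    return {
        "name": f"Copy_{table}",
        "type": "Copy",
        "inputs": [{"referenceName": SOURCE_DATASET, "type": "DatasetReference"}],
        "outputs": [{
            "referenceName": SINK_DATASET,
            "type": "DatasetReference",
            # One output file per table so the activities do not overwrite each other.
            "parameters": {"fileName": f"{table}.csv"},
        }],
        "typeProperties": {
            "source": {"type": "HiveSource", "query": f"SELECT * FROM {table}"},
            "sink": {
                "type": "DelimitedTextSink",
                "storeSettings": {"type": "AzureBlobFSWriteSettings"},
            },
            "parallelCopies": parallel_copies,  # example value, tune per workload
        },
    }


def pipeline(name: str, tables: List[str]) -> Dict:
    """Activities with no dependsOn entries are free to run concurrently."""
    return {
        "name": name,
        "properties": {"activities": [copy_activity(t) for t in tables]},
    }


print(json.dumps(pipeline("CopyHiveTables", TABLES), indent=2))
```

Leaving the activities without dependencies lets the tables load side by side rather than one after another; an alternative with the same effect is a single parameterized copy activity inside a ForEach loop.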

So far I have only tried a single copy activity in my pipeline.

Here is my problem: the pipeline takes a very long time to load the data into the Data Lake.

The Windows server hosting my integration runtime has 10 Gbps of bandwidth, but the load is still very slow.

I tried pulling just 20,000 records, and it took around 20 minutes to load them. The throughput I was getting was around 15 kbps, which is very low.
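A quick calculation with the reported figures shows how little of the link is actually in use. This sketch reads "15 kbps" as 15 kilobits per second; if the figure was really 15 KB/s (kilobytes), every number grows by a factor of 8, which does not change the conclusion.

```python
# Back-of-the-envelope check of the figures reported in the question.
# Assumption: "15 kbps" means 15 kilobits per second.

RECORDS = 20_000
DURATION_S = 20 * 60   # about 20 minutes
OBSERVED_BPS = 15e3    # observed throughput, bits per second
LINK_BPS = 10e9        # 10 Gbps link on the integration runtime host

rows_per_second = RECORDS / DURATION_S
link_utilisation = OBSERVED_BPS / LINK_BPS
total_mb = (OBSERVED_BPS / 8) * DURATION_S / 1e6  # bytes moved, in megabytes

print(f"{rows_per_second:.1f} rows/s")
print(f"{link_utilisation:.6%} of the link")
print(f"{total_mb:.2f} MB transferred in total")
```

At roughly 17 rows per second, a tiny fraction of a percent of the link, and only a few megabytes in total, the 10 Gbps bandwidth is clearly not the limiting factor. That points towards looking at where the time is spent, for example how long the Hive query itself takes to run, rather than at the network card.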

How can I improve the performance of my copy activity so that it runs faster?

venkat

1 Answer


Can you check the configuration of the integration runtime? How much RAM and how many nodes have you configured?

Also, are you using ExpressRoute or a site-to-site VPN? ExpressRoute is the faster option.

The recommended minimum configuration for the self-hosted integration runtime machine is a 2-GHz processor with 4 cores, 8 GB of RAM, and 80 GB of available hard drive space.
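To compare a machine against that minimum, a small check like the one below can help. It is only a sketch: the `MachineSpec` class and `shortfalls` function are hypothetical helpers, and the sample values are the ones venkat gives in the comment below (2.79 GHz, 32 cores, 16 GB RAM; free disk space was not mentioned).

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MachineSpec:
    cpu_ghz: float
    cores: int
    ram_gb: float
    free_disk_gb: Optional[float]  # None when the value is not known


# Recommended minimum for a self-hosted integration runtime machine, as quoted above.
MINIMUM = MachineSpec(cpu_ghz=2.0, cores=4, ram_gb=8, free_disk_gb=80)


def shortfalls(spec: MachineSpec, minimum: MachineSpec = MINIMUM) -> List[str]:
    """List every resource below the minimum; unknown values are flagged too."""
    issues = []
    for f in fields(MachineSpec):
        have = getattr(spec, f.name)
        need = getattr(minimum, f.name)
        if have is None:
            issues.append(f"{f.name}: not reported (need >= {need})")
        elif have < need:
            issues.append(f"{f.name}: {have} < {need}")
    return issues


asker = MachineSpec(cpu_ghz=2.79, cores=32, ram_gb=16, free_disk_gb=None)
print(shortfalls(asker))
```

With those figures, only the disk space is left unconfirmed; CPU, cores, and RAM are all above the minimum, so the machine itself is unlikely to explain the slow copy.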

Pratik Somaiya
  • We need an integration runtime to connect to on-prem, right? So I procured a Windows server near my data centre, created the integration runtime on it, and connected to the on-prem server through ADF. My Windows server has 10 Gbps of bandwidth, 32 cores, 16 GB RAM, and a 2.79 GHz CPU. – venkat Feb 22 '22 at 12:08
  • OK, please try this article for troubleshooting: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance-troubleshooting – Pratik Somaiya Feb 23 '22 at 04:37