I am pretty new to Azure SQL Database. I have been given a task to push a 100-million-record text file to an Azure SQL database, and I'm looking for suggestions on how to do it efficiently.
- Define efficient… – Peter Bons Dec 03 '17 at 20:29
- I am worried about the time it takes and about performing lookup queries after loading the data into a table. – Midhun Mathew Sunny Dec 03 '17 at 20:31
- Is it a recurrent task? What about Azure Data Factory? – Peter Bons Dec 03 '17 at 20:32
- Yep, it will be a recurrent task. I haven't checked Azure Data Factory. I am looking for some sort of database like SQL Server where I can push this data and then query it from there. – Midhun Mathew Sunny Dec 03 '17 at 20:34
- You should, it is designed for tasks like this. [Supported file formats](https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-supported-file-and-compression-formats#compression-support). Where is the file located, on-premises or in the cloud? – Peter Bons Dec 03 '17 at 20:39
- It's a column-delimited text file located on-prem. – Midhun Mathew Sunny Dec 03 '17 at 20:42
- Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/160384/discussion-between-midhun-mathew-sunny-and-peter-bons). – Midhun Mathew Sunny Dec 03 '17 at 21:01
2 Answers
You have several options to upload on-premises data to your Azure SQL database:
- SSIS - As Randy mentioned, you can create an SSIS package (using SSMS) and schedule a SQL Agent job to run the package periodically.
- Azure Data Factory - You can define an ADF pipeline that periodically uploads data from your on-premises file to your Azure SQL database. Depending on your requirements, you might need just the initial 'connect and collect' part of the pipeline, or you might want to add further processing to it.
- bcp - The bulk copy program utility can be used to copy data between SQL Server and a data file. As with the SSIS package, you can use a SQL Agent job to schedule periodic uploads with bcp (see the sketch after this list).
- SqlBulkCopy - I doubt you would need this, but if you need to integrate the load into your application programmatically, this class helps you achieve the same as the bcp utility (bcp is faster) via .NET code.
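To make the bcp option concrete, here is a minimal sketch of a load run from the Windows command line. Every name in it is a placeholder (server, database, staging table, credentials, file path), and the pipe delimiter is an assumption, since the question only says the file is column delimited:

```
REM Minimal bcp sketch; all names and the "|" delimiter are placeholders.
REM -c = character (text) mode
REM -t = field terminator used in the source file
REM -b = commit in batches of 100,000 rows to limit transaction log pressure
REM -e = write rows that fail conversion to an error file for inspection
bcp MyDatabase.dbo.StagingTable in C:\data\records.txt ^
    -S myserver.database.windows.net ^
    -U myuser -P mypassword ^
    -c -t "|" ^
    -b 100000 ^
    -e C:\data\bcp_errors.log
```

Loading into an empty staging table without nonclustered indexes and creating the indexes after the load is usually faster, and the batch size keeps each commit small so a failure partway through does not roll back the entire load.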

alwayslearning
I would do this via SSIS using SQL Server Management Studio (if it's a one-time operation). If you plan to do this repeatedly, you could schedule the SSIS job to execute on a schedule. SSIS will do bulk inserts using small batches, so you shouldn't have transaction log issues, and it should be efficient (because of the bulk inserting). Before you do this insert, though, you will probably want to consider your performance tier so you don't get heavily throttled by Azure and hit possible timeouts (a sketch for checking and scaling the tier follows below).
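To illustrate the performance-tier point, here is a hedged T-SQL sketch for checking the database's current service objective and scaling it up before a large load; `MyDatabase` and the `S3` target are illustrative placeholders, not values from this thread:

```sql
-- Check the current service objective (performance tier) of the Azure SQL database.
SELECT DATABASEPROPERTYEX('MyDatabase', 'ServiceObjective') AS CurrentServiceObjective;

-- Temporarily scale up before the bulk load (typically run from the master database);
-- 'S3' is only an example target, and you can scale back down afterwards to control cost.
ALTER DATABASE [MyDatabase] MODIFY (SERVICE_OBJECTIVE = 'S3');
```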

Randy Minder
- Thanks. Do you have an estimate of the time the transfer will take? – Midhun Mathew Sunny Dec 03 '17 at 21:15
- This totally depends on your performance tier in Azure, the width (number of bytes) of each row, and the number of indexes on the table you are importing into. The wider your row, the longer it will take to import each row; row width can have a drastic impact on import speed. – Randy Minder Dec 03 '17 at 21:20
- Cool. And one more thing, Randy: once the data is loaded into the table, do you have any suggestions to optimize the table for lookup queries (say, for example, if the values in a column are repeated many times)? – Midhun Mathew Sunny Dec 03 '17 at 21:23
- That is an entirely different topic and probably worthy of a completely different question. – Randy Minder Dec 03 '17 at 21:25
- Please answer this question, @Randy: https://stackoverflow.com/questions/47623756/creating-optimized-azure-sql-table-for-querying – Midhun Mathew Sunny Dec 03 '17 at 22:00