
I want to bulk insert .csv files no larger than 50 MB into an Azure SQL Database, ideally in no more than 10 minutes or so.

However, BULK INSERT is not available in the cloud. As pointed out in another thread, BCP is a viable solution, but it requires bulk inserting the data into a local database first. That is not an option in a PaaS infrastructure, where there is no control at the operating-system level and it is not possible to create and manage a local database.

At the moment, a 131 KB .csv takes more than 10 minutes to be uploaded row by row to a Standard S3 (100 DTU, 250 GB) Azure SQL Database. How can I speed up the upload by at least one order of magnitude?

import pandas as pd
import sqlalchemy

# Data to upload; the placeholders below stand for the actual values.
table = pd.DataFrame(<your_data_here>)

sql_driver = <driver>
sql_server = <server>
sql_user = <user>
sql_pass = <password>
sql_db = <database_name>

sql_conn = sqlalchemy.create_engine(
    'mssql+pyodbc://' + sql_user + ':' + sql_pass + '@' + sql_server + '/' + sql_db + '?driver=' + sql_driver)

# to_sql issues individual INSERTs by default, which is what makes the upload slow.
table.to_sql('table_name', con=sql_conn)
MLguy

1 Answer


You can bulk insert from an Azure Blob storage account using the syntax below:

BULK INSERT Test
FROM 'source/data/Test.csv'
WITH ( DATA_SOURCE = 'MyAzureBlobStorage',
       FORMAT = 'CSV', CODEPAGE = 65001,
       FIRSTROW = 2,
       TABLOCK);
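
For the DATA_SOURCE above to resolve, the database also needs an external data source pointing at the storage account, typically backed by a database scoped credential holding a SAS token. A minimal setup sketch, where the storage URL, the SAS token, and the names MyAzureBlobStorage / MyAzureBlobCredential are placeholders, not values from the question:

-- A database master key must already exist before a scoped credential can be created.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong_password>';

-- Credential holding a SAS token for the container (omit the leading '?' from the token).
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<sas_token>';

-- External data source that BULK INSERT references via DATA_SOURCE.
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE,
       LOCATION = 'https://<storage_account>.blob.core.windows.net/<container>',
       CREDENTIAL = MyAzureBlobCredential );

With that in place, the FROM path in the BULK INSERT statement is interpreted relative to the container named in LOCATION.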

For more information, please visit this documentation.

Alberto Morillo