85

I have to insert about 2 million rows from a text file.

While inserting, I also have to create some master tables.

What is the best and fastest way to insert such a large set of data into SQL Server?

Soner Gönül
Wadhawan Vishal

8 Answers

76
  1. I think it is better to read the data of the text file into a DataSet (a loading sketch follows the code below)

  2. Try out SqlBulkCopy - Bulk Insert into SQL from C# App

    // connect to SQL
    using (SqlConnection connection = new SqlConnection(connString))
    {
        connection.Open();

        // FireTriggers makes sure insert triggers on the destination table still run
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(
            connection,
            SqlBulkCopyOptions.TableLock |
            SqlBulkCopyOptions.FireTriggers |
            SqlBulkCopyOptions.UseInternalTransaction,
            null))
        {
            // set the destination table name
            bulkCopy.DestinationTableName = this.tableName;

            // write the data in the "dataTable"
            bulkCopy.WriteToServer(dataTable);
        }
    }
    // reset
    this.dataTable.Clear();
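
For step 1, a minimal sketch of loading the text file into a DataTable (the file path, delimiter, column names and types here are placeholders to adapt to your data):

    // build a DataTable from a delimited text file (schema below is illustrative)
    DataTable dataTable = new DataTable();
    dataTable.Columns.Add("Col1", typeof(string));
    dataTable.Columns.Add("Col2", typeof(string));
    dataTable.Columns.Add("Col3", typeof(string));

    foreach (string line in File.ReadLines(@"C:\data\input.txt"))
    {
        // split each line on the delimiter and add it as a row
        string[] fields = line.Split('\t');
        dataTable.Rows.Add(fields[0], fields[1], fields[2]);
    }

File.ReadLines streams the file line by line, so the file itself is never loaded at once; the DataTable still is, which is the memory cost discussed below.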
    

or

after doing step 1 above:

  1. Create XML from DataSet
  2. Pass XML to database and do bulk insert

You can check this article for details: Bulk Insertion of Data Using C# DataTable and SQL server OpenXML function.

Note that this has not been tested with 2 million records; it will work, but it consumes memory on the machine, because you have to load all 2 million records before inserting them.
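
On the client side, that approach boils down to serializing the DataSet to XML and handing it to a stored procedure; a minimal sketch (the procedure name usp_BulkInsertFromXml is hypothetical, and its OPENXML body is what the linked article describes):

    // step 1 produced a DataSet; turn it into XML and send it to the server
    string xml = dataSet.GetXml();

    using (SqlConnection connection = new SqlConnection(connString))
    using (SqlCommand command = new SqlCommand("usp_BulkInsertFromXml", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        // NVARCHAR(MAX) parameter holding the whole XML document
        command.Parameters.Add("@xml", SqlDbType.NVarChar, -1).Value = xml;

        connection.Open();
        command.ExecuteNonQuery();
    }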

Alexander
Pranay Rana
  • I know this is quite late, but for about 2 million rows (or more), if there are enough columns (25+), it is almost inevitable that the code will generate `OutOfMemoryException` at some point when filling the dataset/datatable. – Razort4x Apr 24 '15 at 09:29
  • You can set up a buffer to avoid out-of-memory exceptions. For a text file I used File.ReadLines(file).Skip(X).Take(100000).ToList(). After every 100k I reset and move through the next 100k. It works well and is very quick. – Jason Foglia Nov 13 '15 at 17:10
  • What if the source data is from a SQL Server table? Let's say the table has 30 million rows; can we still use bulk copy? Wouldn't a simple `Insert into table1 Select * from table2` be faster? – Madhav Shenoy Jul 27 '19 at 00:37
63

You can try the SqlBulkCopy class.

Lets you efficiently bulk load a SQL Server table with data from another source.

There is a cool blog post about how you can use it.
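
A minimal sketch of using it without first materializing all the rows in memory, assuming the source can be exposed as an IDataReader (here it is read from a query against another table; the connection strings and table/column names are placeholders):

    // copy rows from a source query into the destination table, streaming through an IDataReader
    using (var source = new SqlConnection(sourceConnectionString))
    using (var destination = new SqlConnection(destinationConnectionString))
    {
        source.Open();
        destination.Open();

        using (var cmd = new SqlCommand("SELECT Col1, Col2, Col3 FROM dbo.SourceTable", source))
        using (var reader = cmd.ExecuteReader())
        using (var bulkCopy = new SqlBulkCopy(destination))
        {
            bulkCopy.DestinationTableName = "dbo.DestinationTable";
            bulkCopy.BatchSize = 10000;       // commit every 10,000 rows
            bulkCopy.BulkCopyTimeout = 0;     // no timeout for large loads
            bulkCopy.EnableStreaming = true;  // pull rows from the reader as they are written

            bulkCopy.WriteToServer(reader);
        }
    }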

Nasreddine
Soner Gönül
  • Please also note (as mentioned in the [SqlBulkCopy](https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopy?redirectedfrom=MSDN&view=netframework-4.7.2) documentation as well) that `If the source and destination tables are in the same SQL Server instance, it is easier and faster to use a Transact-SQL INSERT … SELECT statement to copy the data.` – nam Apr 04 '19 at 20:17
32

Regarding the solution for SqlBulkCopy:

I used a StreamReader to convert and process the text file. The result was a list of my objects (a sketch of this step is shown after the code below).

I created a class that takes a DataTable or a List<T> and a buffer size (CommitBatchSize). It converts the list to a DataTable using an extension method (in the second class).

It works very fast. On my PC, I am able to insert more than 10 million complicated records in less than 10 seconds.

Here is the class:

using System;
using System.Collections;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace DAL
{

public class BulkUploadToSql<T>
{
    public IList<T> InternalStore { get; set; }
    public string TableName { get; set; }
    public int CommitBatchSize { get; set; }=1000;
    public string ConnectionString { get; set; }

    public void Commit()
    {
        if (InternalStore.Count > 0)
        {
            DataTable dt;
            int numberOfPages = (InternalStore.Count / CommitBatchSize) + (InternalStore.Count % CommitBatchSize == 0 ? 0 : 1);
            for (int pageIndex = 0; pageIndex < numberOfPages; pageIndex++)
            {
                dt = InternalStore.Skip(pageIndex * CommitBatchSize).Take(CommitBatchSize).ToDataTable();
                BulkInsert(dt);
            }
        }
    }

    public void BulkInsert(DataTable dt)
    {
        using (SqlConnection connection = new SqlConnection(ConnectionString))
        {
            connection.Open();

            // FireTriggers makes sure insert triggers on the destination table still run
            using (SqlBulkCopy bulkCopy = new SqlBulkCopy(
                connection,
                SqlBulkCopyOptions.TableLock |
                SqlBulkCopyOptions.FireTriggers |
                SqlBulkCopyOptions.UseInternalTransaction,
                null))
            {
                // set the destination table name
                bulkCopy.DestinationTableName = TableName;

                // write the data in the DataTable
                bulkCopy.WriteToServer(dt);
            }
        }
    }

}

public static class BulkUploadToSqlHelper
{
    public static DataTable ToDataTable<T>(this IEnumerable<T> data)
    {
        PropertyDescriptorCollection properties =
            TypeDescriptor.GetProperties(typeof(T));
        DataTable table = new DataTable();
        foreach (PropertyDescriptor prop in properties)
            table.Columns.Add(prop.Name, Nullable.GetUnderlyingType(prop.PropertyType) ?? prop.PropertyType);
        foreach (T item in data)
        {
            DataRow row = table.NewRow();
            foreach (PropertyDescriptor prop in properties)
                row[prop.Name] = prop.GetValue(item) ?? DBNull.Value;
            table.Rows.Add(row);
        }
        return table;
    }
}

}

Here is an example where I want to insert a list of my custom object, List<PuckDetection> (ListDetections):

var objBulk = new BulkUploadToSql<PuckDetection>()
{
    InternalStore = ListDetections,
    TableName = "PuckDetections",
    CommitBatchSize = 1000,
    ConnectionString = "ENTER YOUR CONNECTION STRING"
};
objBulk.Commit();

The BulkInsert method can be modified to add column mappings if required, for example when you have an identity key as the first column (this assumes that the column names in the DataTable are the same as in the database):

//ADD COLUMN MAPPING
foreach (DataColumn col in dt.Columns)
{
    bulkCopy.ColumnMappings.Add(col.ColumnName, col.ColumnName);
}
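
The list itself (ListDetections) can be built from the text file with a StreamReader, as described above; a minimal sketch (the PuckDetection properties and the tab delimiter here are hypothetical, adapt them to the real type and file format):

    var ListDetections = new List<PuckDetection>();
    using (var reader = new StreamReader(@"C:\data\detections.txt"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // split each line and map the fields onto the object (hypothetical columns)
            var fields = line.Split('\t');
            ListDetections.Add(new PuckDetection
            {
                Timestamp = DateTime.Parse(fields[0]),
                X = double.Parse(fields[1]),
                Y = double.Parse(fields[2])
            });
        }
    }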
Acanocois
Amir
  • When I try to use your code I get error: IEnumerable does not contain definition for ToDataTable – eug100 May 17 '22 at 09:00
6

I use the bcp utility (Bulk Copy Program). I load about 1.5 million text records each month. Each text record is 800 characters wide. On my server, it takes about 30 seconds to add the 1.5 million text records to a SQL Server table.

The instructions for bcp are at http://msdn.microsoft.com/en-us/library/ms162802.aspx

Bill Edmett
4

I tried this method and it significantly reduced my database insert execution time.

List<string> toinsert = new List<string>();
StringBuilder insertCmd = new StringBuilder("INSERT INTO tablename (col1, col2, col3) VALUES ");

foreach (var row in rows)
{
      // escape single quotes so the values stay quoted
      // (note: escaping alone is not a complete defense against SQL injection; prefer parameters)
      var first = row.First.Replace("'", "''");
      var second = row.Second.Replace("'", "''");
      var third = row.Third.Replace("'", "''");

      toinsert.Add(string.Format("( '{0}', '{1}', '{2}' )", first, second, third));
}
if (toinsert.Count != 0)
{
      insertCmd.Append(string.Join(",", toinsert));
      insertCmd.Append(";");
}
using (MySqlCommand myCmd = new MySqlCommand(insertCmd.ToString(), SQLconnectionObject))
{
      myCmd.CommandType = CommandType.Text;
      myCmd.ExecuteNonQuery();
}

*Create a SQL connection object and replace it where I have written SQLconnectionObject.
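
If you go this route against SQL Server, a safer variant is to send the values as parameters instead of escaping quotes into the SQL text; a minimal sketch (table and column names are illustrative, sqlConnection is an already-open SqlConnection, and SQL Server allows at most 2,100 parameters per command, so with 3 columns keep each batch under roughly 700 rows):

    var values = new List<string>();
    using (var cmd = new SqlCommand())
    {
        cmd.Connection = sqlConnection; // an open SqlConnection
        int i = 0;
        foreach (var row in rows)
        {
            // one "(@pN, @pN+1, @pN+2)" group per row, with the actual values bound as parameters
            values.Add(string.Format("(@p{0}, @p{1}, @p{2})", i, i + 1, i + 2));
            cmd.Parameters.AddWithValue("@p" + i++, row.First);
            cmd.Parameters.AddWithValue("@p" + i++, row.Second);
            cmd.Parameters.AddWithValue("@p" + i++, row.Third);
        }
        cmd.CommandText = "INSERT INTO tablename (col1, col2, col3) VALUES " + string.Join(",", values);
        cmd.ExecuteNonQuery();
    }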

mrak
Amey Vartak
  • Be careful! This can be exploited with SQL injection – Andre Soares Jul 17 '19 at 19:24
  • Downvoted for not avoiding SQL injection attacks (see https://stackoverflow.com/questions/15537368/how-can-sanitation-that-escapes-single-quotes-be-defeated-by-sql-injection-in-sq) – Andrew Hill Jan 27 '21 at 12:57
3

I ran into this scenario recently (well over 7 million rows) and ended up using sqlcmd via PowerShell (after parsing the raw data into SQL insert statements) in segments of 5,000 at a time, since SQL Server couldn't handle 7 million lines in one lump job, or even 500,000 lines for that matter, unless they were broken down into smaller 5K pieces; you can then run each 5K script one after the other. I needed to leverage the new SEQUENCE command in SQL Server 2012 Enterprise, and I couldn't find a programmatic way to insert seven million rows of data quickly and efficiently with said SEQUENCE command.

Secondly, one of the things to look out for when inserting a million rows or more of data in one sitting is CPU and memory consumption (mostly memory) during the insert process. SQL Server will eat up memory/CPU with a job of this magnitude without releasing those resources. Needless to say, if you don't have enough processing power or memory on your server, you can crash it pretty easily in a short time (which I found out the hard way). If your memory consumption gets over 70-75%, just reboot the server and the processes will be released back to normal.

I had to run a bunch of trial-and-error tests to see what the limits for my server were (given the limited CPU/memory resources to work with) before I could settle on a final execution plan. I would suggest you do the same in a test environment before rolling this out into production.

Techie Joe
  • How long did 7M rows take? I have about 30M rows to insert; right now I'm pushing them through a stored procedure and a DataTable. – Alexandre Brisebois Jan 11 '13 at 18:35
  • It took a good five to six hours running the small batches concurrently. Keep in mind that I just did straight T-SQL insert commands, as I leveraged the new SEQUENCE command in SQL 2012 and couldn't find information on how to automate this process outside of T-SQL. – Techie Joe Jan 14 '13 at 18:06
2

I needed a solution that would work with ADO.NET, Entity Framework and Dapper, so I made this lib; it generates batches in the form of:

    IEnumerable<(string SqlQuery, IEnumerable<SqlParameter> SqlParameters)>  
    IEnumerable<(string SqlQuery, DynamicParameters DapperDynamicParameters)> 

This link contains instructions. It's safe against SQL injection, because it uses parameters instead of concatenation; you can also set identity insert to ON, if needed, via an optional parameter.

Usage with ADO.NET:

using MsSqlHelpers;
// ...
var mapper = new MapperBuilder<Person>()
    .SetTableName("People")
    .AddMapping(person => person.FirstName, columnName: "Name")
    .AddMapping(person => person.LastName, columnName: "Surename")
    .AddMapping(person => person.DateOfBirth, columnName: "Birthday")
    .Build();
var people = new List<Person>()
{ 
    new Person()
    {
        FirstName = "John", 
        LastName = "Lennon", 
        DateOfBirth = new DateTime(1940, 10, 9) 
    },
    new Person()
    {
        FirstName = "Paul", 
        LastName = "McCartney", 
        DateOfBirth = new DateTime(1942, 6, 18) 
    },
};
var connectionString = "Server=SERVER_ADDRESS;Database=DATABASE_NAME;User Id=USERNAME;Password=PASSWORD;";
var sqlQueriesAndParameters = new MsSqlQueryGenerator()
    .GenerateParametrizedBulkInserts(mapper, people);

using (var sqlConnection = new SqlConnection(connectionString))
{
    sqlConnection.Open();
    
    // Default batch size: 1000 rows or (2100-1) parameters per insert.
    foreach (var (SqlQuery, SqlParameters) in sqlQueriesAndParameters)
    {
        using (SqlCommand sqlCommand = new SqlCommand(SqlQuery, sqlConnection))
        {
            sqlCommand.Parameters.AddRange(SqlParameters.ToArray());
            sqlCommand.ExecuteNonQuery();
        }
    }
}

Usage with Dapper:

using MsSqlHelpers;
// ...
var mapper = new MapperBuilder<Person>()
    .SetTableName("People")
    .AddMapping(person => person.FirstName, columnName: "Name")
    .AddMapping(person => person.LastName, columnName: "Surename")
    .AddMapping(person => person.DateOfBirth, columnName: "Birthday")
    .Build();
var people = new List<Person>()
{ 
    new Person()
    {
        FirstName = "John", 
        LastName = "Lennon", 
        DateOfBirth = new DateTime(1940, 10, 9) 
    },
    new Person()
    { 
        FirstName = "Paul", 
        LastName = "McCartney", 
        DateOfBirth = new DateTime(1942, 6, 18) 
    },
};
var connectionString = "Server=SERVER_ADDRESS;Database=DATABASE_NAME;User Id=USERNAME;Password=PASSWORD;";
var sqlQueriesAndDapperParameters = new MsSqlQueryGenerator()
    .GenerateDapperParametrizedBulkInserts(mapper, people);

using (var sqlConnection = new SqlConnection(connectionString))
{
    // Default batch size: 1000 rows or (2100-1) parameters per insert.
    foreach (var (SqlQuery, DapperDynamicParameters) in sqlQueriesAndDapperParameters)
    {
        sqlConnection.Execute(SqlQuery, DapperDynamicParameters);
    }
}

Usage with Entity Framework:

using MsSqlHelpers;
// ...
var mapper = new MapperBuilder<Person>()
    .SetTableName("People")
    .AddMapping(person => person.FirstName, columnName: "Name")
    .AddMapping(person => person.LastName, columnName: "Surename")
    .AddMapping(person => person.DateOfBirth, columnName: "Birthday")
    .Build();
var people = new List<Person>()
{ 
    new Person() 
    { 
        FirstName = "John", 
        LastName = "Lennon", 
        DateOfBirth = new DateTime(1940, 10, 9) 
    },
    new Person()
    { 
        FirstName = "Paul", 
        LastName = "McCartney", 
        DateOfBirth = new DateTime(1942, 6, 18) 
    },
};
var sqlQueriesAndParameters = new MsSqlQueryGenerator()
    .GenerateParametrizedBulkInserts(mapper, people);

// Default batch size: 1000 rows or (2100-1) parameters per insert.
foreach (var (SqlQuery, SqlParameters) in sqlQueriesAndParameters)
{
    _context.Database.ExecuteSqlRaw(SqlQuery, SqlParameters);
    // Deprecated but still works: _context.Database.ExecuteSqlCommand(SqlQuery, SqlParameters);
}
0

Another way: if the text data is in JSON format, you can use the OPENJSON function in SQL Server for bulk insert. I tested it on my local machine and was able to insert one million records in 51 seconds.

This is the stored procedure:

CREATE PROCEDURE sp_upsert_play_user_details1
(
    @array VARCHAR(MAX)
)
AS
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION

        INSERT INTO tbl_play_user_details
            (vc_first_name, vc_last_name, vc_full_name, vc_gender, vc_phone_number, vc_email, vc_pet, vc_vehicle_model, vc_vehicle_no, int_created_on, int_created_by)
        SELECT firstName, lastName, fullName, gender, phoneNumber, email, pet, vehicle, vehicleNumber, GETDATE(), createdBy
        FROM OPENJSON(@array)
        WITH (
            firstName VARCHAR(100),
            lastName VARCHAR(100),
            fullName VARCHAR(100),
            gender VARCHAR(100),
            phoneNumber VARCHAR(100),
            email VARCHAR(100),
            pet VARCHAR(100),
            vehicle VARCHAR(100),
            vehicleNumber VARCHAR(100),
            createdBy INT);

        COMMIT TRANSACTION
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION
        DECLARE @ErrorMessage NVARCHAR(4000) = ERROR_MESSAGE() + ' Please verify "' + ERROR_PROCEDURE() + '" stored procedure at the line number ' + CONVERT(NVARCHAR(20), ERROR_LINE()) + '.';
        DECLARE @ErrorSeverity INT = ERROR_SEVERITY();
        DECLARE @ErrorState INT = ERROR_STATE();
        RAISERROR (@ErrorMessage, @ErrorSeverity, @ErrorState)
    END CATCH
END
GO

Sample JSON for testing:

DECLARE @array VARCHAR(MAX);  
SET @array = '[{  
        "firstName": "Winston",  
        "lastName": "Lemke",  
        "fullName": "Winston Lemke",  
        "gender": "Male",  
        "phoneNumber": "466.780.4652 x268",  
        "email": "Josefa89@yahoo.com",  
        "pet": "Villanuco de Las Encartaciones",  
        "vehicle": "Mazda Escalade",  
        "vehicleNumber": "8CP7UC1N83MY25770",  
        "createdBy": 1  
    },  
 {  
        "firstName": "Finn",  
        "lastName": "Bartoletti",  
        "fullName": "Finn Bartoletti",  
        "gender": "Female",  
        "phoneNumber": "1-931-498-0214 x454",  
        "email": "Clair.Rodriguez@hotmail.com",  
        "pet": "Bouvier des Flandres",  
        "vehicle": "Tesla Ranchero",  
        "vehicleNumber": "MG1XVY29D0M798471",  
        "createdBy": 1  
    }]';  

EXEC sp_upsert_play_user_details1 @array;

In C#, I parsed the data from a local file and passed the string to the stored procedure:

string text = System.IO.File.ReadAllText(@"C:\Users\krish\OneDrive\Desktop\object.json");

_playdb.CommandTimeout = 3000;
_playdb.sp_upsert_play_user_details1(text);
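
If you are not using a generated context like _playdb above, a minimal ADO.NET sketch of the same call (the connection string is a placeholder):

    // call the stored procedure directly with plain ADO.NET
    using (var connection = new SqlConnection("Server=YOUR_SERVER;Database=YOUR_DB;Integrated Security=true;"))
    using (var command = new SqlCommand("sp_upsert_play_user_details1", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.CommandTimeout = 3000;
        // VARCHAR(MAX) parameter carrying the whole JSON document
        command.Parameters.Add("@array", SqlDbType.VarChar, -1).Value = text;

        connection.Open();
        command.ExecuteNonQuery();
    }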

As mentioned above, this only took 51 seconds to insert one million records, and it would probably be much faster on a faster server/work machine.

DharmanBot
krish