
I'm currently inserting 1 million records into two tables at the same time, in 90 to 100 seconds, using a stored procedure. That is not acceptable in my scenario; I want to find a way to drop the time to less than 10 seconds.

I have tried inserting the records one after the other, which was very slow - it took about an hour. Then I tried using table-valued parameters to insert all the records at once. That got the time down to 90 - 100 seconds.

This is the C# calling code:

public async Task<int> CreateGiftVoucher(IEnumerable<Gift> vouchersList)
{
    GiftStreamingSqlRecord record = new GiftStreamingSqlRecord(vouchersList);

    // Debug output only - note this enumerates the whole list once
    // before the insert even starts.
    foreach (var t in vouchersList)
    {
        Console.WriteLine($"<<<<<gfts>>> {t}");
    }

    try
    {
        var connection = Connection;

        if (connection.State == ConnectionState.Closed) connection.Open();

        string storedProcedure = "dbo.usp_CreateGiftVoucher";

        var command = new SqlCommand(storedProcedure, connection as SqlConnection);
        command.CommandType = CommandType.StoredProcedure;

        var param = new SqlParameter();
        param.ParameterName = "@tblGift";
        param.TypeName = "dbo.GiftVoucherType";
        param.SqlDbType = SqlDbType.Structured;
        param.Value = record;

        command.Parameters.Add(param);

        // Must exceed the observed 90-100 s runtime or the call times out.
        command.CommandTimeout = 300;

        // Await here: returning the task directly lets the finally block
        // close the connection before the command has finished executing.
        return await command.ExecuteNonQueryAsync();
    }
    finally
    {
        Connection.Close();
    }
}

This is the GiftStreamingSqlRecord class:

public class GiftStreamingSqlRecord : IEnumerable<SqlDataRecord>
{
    private readonly IEnumerable<Gift> _gifts;

    public GiftStreamingSqlRecord(IEnumerable<Gift> gifts) => this._gifts = gifts;

    public IEnumerator<SqlDataRecord> GetEnumerator()
    {
        // Must match dbo.GiftVoucherType column for column.
        SqlMetaData[] columnStructure = new SqlMetaData[11];
        columnStructure[0] = new SqlMetaData("VoucherId", SqlDbType.BigInt,
                    useServerDefault: false,
                    isUniqueKey: true,
                    columnSortOrder: SortOrder.Ascending, sortOrdinal: 0);
        columnStructure[1] = new SqlMetaData("Code", SqlDbType.NVarChar, maxLength: 100);
        columnStructure[2] = new SqlMetaData("VoucherType", SqlDbType.NVarChar, maxLength: 50);
        columnStructure[3] = new SqlMetaData("CreationDate", SqlDbType.DateTime);
        columnStructure[4] = new SqlMetaData("ExpiryDate", SqlDbType.DateTime);
        columnStructure[5] = new SqlMetaData("VoucherStatus", SqlDbType.NVarChar, maxLength: 10);
        columnStructure[6] = new SqlMetaData("MerchantId", SqlDbType.NVarChar, maxLength: 100);
        columnStructure[7] = new SqlMetaData("Metadata", SqlDbType.NVarChar, maxLength: 100);
        columnStructure[8] = new SqlMetaData("Description", SqlDbType.NVarChar, maxLength: 100);
        columnStructure[9] = new SqlMetaData("GiftAmount", SqlDbType.BigInt);
        columnStructure[10] = new SqlMetaData("GiftBalance", SqlDbType.BigInt);

        var columnId = 1L;

        foreach (var gift in _gifts)
        {
            var record = new SqlDataRecord(columnStructure);
            record.SetInt64(0, columnId++);
            record.SetString(1, gift.Code);
            record.SetString(2, gift.VoucherType);
            record.SetDateTime(3, gift.CreationDate);
            record.SetDateTime(4, gift.ExpiryDate);
            record.SetString(5, gift.VoucherStatus);
            record.SetString(6, gift.MerchantId);
            // Metadata and Description are NULLable in the TVP;
            // SetString throws on a null argument.
            if (gift.Metadata is null) record.SetDBNull(7);
            else record.SetString(7, gift.Metadata);
            if (gift.Description is null) record.SetDBNull(8);
            else record.SetString(8, gift.Description);
            record.SetInt64(9, gift.GiftAmount);
            record.SetInt64(10, gift.GiftBalance);
            yield return record;
        }
    }

    // SqlParameter.Value accepts any IEnumerable<SqlDataRecord>.
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

This is the stored procedure and the TVP:

CREATE TYPE [dbo].GiftVoucherType AS TABLE (
[VoucherId] [bigint] PRIMARY KEY,
[Code] [nvarchar](100) NOT NULL,
[VoucherType] [nvarchar](50) NOT NULL,
[CreationDate] [datetime] NOT NULL,
[ExpiryDate] [datetime] NOT NULL,
[VoucherStatus] [nvarchar](10) NOT NULL,
[MerchantId] [nvarchar](100) NOT NULL,
[Metadata] [nvarchar](100) NULL,
[Description] [nvarchar](100) NULL,
[GiftAmount] [bigint] NOT NULL,
[GiftBalance] [bigint] NOT NULL
)

GO

CREATE PROCEDURE [dbo].[usp_CreateGiftVoucher]

@tblGift [dbo].GiftVoucherType READONLY
AS

    DECLARE @idmap TABLE (TempId BIGINT NOT NULL PRIMARY KEY, 
                            VId BIGINT UNIQUE NOT NULL)

BEGIN TRY
    BEGIN TRANSACTION CreateGiftVoucher

        MERGE Voucher V 
        USING (SELECT [VoucherId], [Code], [VoucherType], [MerchantId], [ExpiryDate],
            [Metadata], [Description] FROM @tblGift) TB ON 1 = 0
        WHEN NOT MATCHED BY TARGET THEN
        INSERT ([Code], [VoucherType], [MerchantId], [ExpiryDate], [Metadata], [Description])
        VALUES(TB.Code, TB.VoucherType, TB.MerchantId, TB.ExpiryDate, TB.Metadata, TB.[Description])
        OUTPUT TB.VoucherId, inserted.VoucherId INTO @idmap(TempId, VId);

        -- Insert rows into table 'GiftVoucher'
        INSERT GiftVoucher
        (
        GiftAmount, GiftBalance, VoucherId
        )
        SELECT TB.GiftAmount, TB.GiftBalance, i.VId
        FROM @tblGift TB
        JOIN @idmap i ON i.TempId = TB.VoucherId

    COMMIT TRANSACTION CreateGiftVoucher
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;  -- re-raise: silently swallowing the error would hide failed inserts
END CATCH
GO

All of this gets me to 1 million rows inserted in 90 to 100 seconds. I want to do it in less than 10 seconds.

  • Buy faster Storage Drive for server? – SᴇM Feb 01 '19 at 10:22
  • Just saying "I want this" doesn't make it happen. There are plenty of things that may or may not work, ranging from massive refactoring up to buying new hardware. – MakePeaceGreatAgain Feb 01 '19 at 10:22
  • If you want lightning-fast inserts, [BULK INSERT](https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-2017) and [bcp](https://learn.microsoft.com/en-us/sql/relational-databases/import-export/import-and-export-bulk-data-by-using-the-bcp-utility-sql-server?view=sql-server-2017) are two ways of doing it. – mroach Feb 01 '19 at 10:23
  • [Sql insert speed up](https://stackoverflow.com/questions/7090243/sql-insert-speed-up) – SᴇM Feb 01 '19 at 10:29
  • Can someone help this person? Or give him something helpful? Not just mocking him and making him look bad. Too bad that StackOverflow has become a place like this! – Calin Vlasin Feb 01 '19 at 10:31
  • Upload the actual stored procedure execution plan [here](https://www.brentozar.com/pastetheplan/) and add the link to your question. – Dan Guzman Feb 01 '19 at 12:33

1 Answer


The fastest way to insert large numbers of rows is bulk insertion (SqlBulkCopy or the other bulk APIs). I can see that you are using MERGE; bulk copy cannot feed a MERGE directly, so this design forces table-valued parameters the way you are using them right now. TVPs are a bit slower because they cost more CPU. What you can try instead is bulk inserting into a temp table and then running the MERGE from there. It is my understanding that a TVP physically is a temp table anyway; there is no true streaming going on. All the data you stream into it from your C# code is simply inserted by the server into an automatically managed table.
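
For illustration, here is a minimal sketch of that temp-table variant, assuming a staging table dbo.StagingGiftVoucher and a procedure dbo.usp_CreateGiftVoucherFromStaging that runs your MERGE logic from it (both names are made up here):

using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

public static async Task BulkLoadThenMergeAsync(string connectionString, DataTable vouchers)
{
    using (var connection = new SqlConnection(connectionString))
    {
        await connection.OpenAsync();

        // 1. Stream the rows into the staging table via the bulk-load API,
        //    which is cheaper per row than a TVP.
        using (var bulk = new SqlBulkCopy(connection))
        {
            bulk.DestinationTableName = "dbo.StagingGiftVoucher"; // hypothetical staging table
            bulk.BatchSize = 100000;
            bulk.BulkCopyTimeout = 0;  // no timeout; the default 30 s is too short for 1M rows
            await bulk.WriteToServerAsync(vouchers);
        }

        // 2. Run the MERGE/insert logic reading from the staging table.
        using (var command = new SqlCommand("dbo.usp_CreateGiftVoucherFromStaging", connection)) // hypothetical
        {
            command.CommandType = CommandType.StoredProcedure;
            command.CommandTimeout = 0;
            await command.ExecuteNonQueryAsync();
        }
    }
}

WriteToServerAsync also accepts a DbDataReader, so for a million rows you can stream instead of materializing a DataTable.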

The TVP streaming you implemented with SqlMetaData is correct; in my experience it is the fastest way to transmit TVP data.

You will need to parallelize. Empirically, it is hard to exceed roughly 100k rows per second under optimal conditions for fairly simple rows; at that point a single CPU core is saturated. You can insert in parallel on multiple cores under certain conditions, which are documented: there are requirements on the index structure, and you might also encounter locking issues. A sure way to solve those is to insert into independent tables or partitions, but of course this forces you to change the other queries that run against these tables.
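
As a rough sketch of that parallel approach, assuming you have already split the rows into per-task DataTable chunks (the chunking, the target table name, and the TableLock choice are all assumptions to adapt):

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Threading.Tasks;

public static Task ParallelBulkLoadAsync(string connectionString, IReadOnlyList<DataTable> chunks)
{
    // One bulk copy per chunk, each on its own connection and core.
    var tasks = chunks.Select(async chunk =>
    {
        using (var connection = new SqlConnection(connectionString))
        {
            await connection.OpenAsync();

            // TableLock takes a BU lock: concurrent BU locks let bulk loads
            // into a heap run in parallel, but they serialize on a table
            // that has indexes - hence the index-structure requirements above.
            using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
            {
                bulk.DestinationTableName = "dbo.StagingGiftVoucher"; // hypothetical
                bulk.BulkCopyTimeout = 0;
                await bulk.WriteToServerAsync(chunk);
            }
        }
    });

    return Task.WhenAll(tasks);
}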

If you must perform complex logic when inserting, you could still insert into fresh tables and then perform the logic when querying. This is more work and error-prone, but it might allow you to meet your latency requirement.

I hope these ideas help you get on the right path. Feel free to comment.

usr
  • I've seen TVPs stream data to achieve performance similar to SqlBulkCopy. The performance issue is likely due to the sub-par performance of the `MERGE`. – Dan Guzman Feb 01 '19 at 12:32
  • @DanGuzman MERGE does not have any intrinsic performance disadvantage (despite its bad reputation). Performance is determined by the concrete query and execution plan. Here, he'll effectively get a normal insert plan that is fed by a left join of the source and target. This is essentially the same plan shape that he'd get with INSERT WHERE NOT EXISTS. – usr Feb 01 '19 at 13:39
  • Not saying `MERGE` in general is a problem, just that query and index tuning may be needed to improve performance to an acceptable level in this case. – Dan Guzman Feb 01 '19 at 13:59
  • @DanGuzman I agree. Especially locking might well kill his parallel inserts. – usr Feb 01 '19 at 15:01