
I am working on a desktop application that bulk inserts tens of thousands of rows into a SQLite database, and I would like help optimizing the bulk insert performance. It currently takes up to 50 seconds to insert 60 MB worth of data.

  • What connection string parameters could I use to improve performance? Should I change the buffer size, and is that possible via a connection string parameter? Are there any other connection string parameters that improve performance? My current connection string is:

    Data Source=Batch.db;Version=3;Pooling=True;Max Pool Size=10;Synchronous=off;FailIfMissing=True;Journal Mode=Off;

  • I am using the Dapper ORM (built by the team at Stack Overflow). Is there a faster way to bulk insert into SQLite in .NET?

  • I am using System.Data.SQLite (from http://sqlite.phxsoftware.com) to insert into SQLite. Would a specially compiled build of SQLite improve performance? Is one distribution of SQLite better than another?

  • Currently, I am wrapping inserts inside a transaction to make them faster (this made a good improvement).

  • I am inserting into 17 tables, one table at a time. Could I parallelize this across threads to make it faster?
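For reference, the transaction wrapping mentioned above looks roughly like the following minimal sketch (table and column names are hypothetical; assumes the System.Data.SQLite provider): one command is created once, and a single transaction is committed for the whole batch.

```csharp
using System.Data.SQLite;

public static class BulkInsertSketch
{
    // Inserts `rows` rows inside a single transaction, reusing one command,
    // and returns the resulting row count.
    public static long InsertBatch(int rows)
    {
        using (var cnn = new SQLiteConnection("Data Source=:memory:;Version=3;"))
        {
            cnn.Open();
            using (var cmd = cnn.CreateCommand())
            {
                // The Journal Mode=Off / Synchronous=Off settings from the
                // connection string can also be issued as PRAGMAs here.
                cmd.CommandText = "PRAGMA synchronous=OFF";
                cmd.ExecuteNonQuery();
                cmd.CommandText = "CREATE TABLE Batch (Id INTEGER, Name TEXT)";
                cmd.ExecuteNonQuery();
            }

            using (var tx = cnn.BeginTransaction())
            using (var cmd = cnn.CreateCommand())
            {
                cmd.CommandText = "INSERT INTO Batch VALUES (@Id, @Name)";
                cmd.Parameters.Add(new SQLiteParameter("@Id"));
                cmd.Parameters.Add(new SQLiteParameter("@Name"));

                for (int i = 0; i < rows; i++)
                {
                    cmd.Parameters["@Id"].Value = i;
                    cmd.Parameters["@Name"].Value = "row" + i;
                    cmd.ExecuteNonQuery();
                }
                tx.Commit(); // commit once for the whole batch, not per row
            }

            using (var cmd = cnn.CreateCommand())
            {
                cmd.CommandText = "SELECT COUNT(*) FROM Batch";
                return (long)cmd.ExecuteScalar();
            }
        }
    }
}
```

Without the explicit transaction, SQLite implicitly commits after every statement, which is what makes unbatched inserts so slow.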

Current Performance. Is this typical? Can I do better?

  • 55,000 rows into table with 19 columns: 2.25 sec to insert (24k inserts/sec)
  • 10,000 rows into table with 63 columns: 2.74 sec to insert (3.7k/sec)

I like SQLite, but I would love to make it a bit faster. Currently saving my objects to an XML file using XML serialization is faster than saving to a SQLite database, so my boss is asking: why switch to SQLite? Or should I be using MongoDB, or some other object database?

BrokeMyLegBiking
  • possible duplicate of [How do I improve the performance of SQLite?](http://stackoverflow.com/questions/1711631/how-do-i-improve-the-performance-of-sqlite) ... Although the code written in that question is C, it's what I used as a guide when writing stuff in C# with System.Data.SQLite. The big thing is getting the `PRAGMA`s right and using a transaction. How this maps to the ORM, though, I don't know. – Chris J Jan 25 '12 at 16:54
  • 1
    55,000 rows in 1.73 seconds... is difficult to be better ;) – Felice Pollano Jan 25 '12 at 17:02
  • are you using the batch interface `cnn.Execute("insert...", new[] {val1, val2, val3})`? It reuses commands, so it's a tad faster (maybe 5-10%) – Sam Saffron Jan 26 '12 at 03:18
  • 1
    @ChrisJ ... this is asking for a dapper specific spin on things, I think it is not a dupe – Sam Saffron Jan 26 '12 at 03:19

2 Answers


So I finally found a trick for high-performance bulk inserts into SQLite from .NET with System.Data.SQLite.

  • This trick improved insert performance by a factor of 4.1!
  • My total save time went from 27 seconds to 6.6 seconds. wow!

This article explains the fastest way to do bulk inserts into SQLite (archive.org link).

  • The key is reusing the same parameter objects, assigning each one a new value for every record inserted.

The time that .NET spends constructing all those DbParameter objects really adds up. For example, 100k rows × 30 columns = 3 million parameter objects that must be created. Creating and reusing just 30 parameter objects instead is much faster.

Update: new performance:

  • 55,000 rows (19 columns) in 0.53 seconds = 100k inserts/second
internal const string PeakResultsInsert =
    @"INSERT INTO PeakResult
      VALUES (@Id, @PeakID, @QuanPeakID, @ISTDRetentionTimeDiff)";

var command = cnn.CreateCommand();
command.CommandText = BatchConstants.PeakResultsInsert;

string[] parameterNames = new[]
{
    "@Id",
    "@PeakID",
    "@QuanPeakID",
    "@ISTDRetentionTimeDiff"
};

// Create one DbParameter per column, once, and attach it to the command.
DbParameter[] parameters = parameterNames.Select(pn =>
{
    DbParameter parameter = command.CreateParameter();
    parameter.ParameterName = pn;
    command.Parameters.Add(parameter);
    return parameter;
}).ToArray();

// Reuse the same parameter objects for every row; only the values change.
foreach (var peakResult in peakResults)
{
    parameters[0].Value = peakResult.Id;
    parameters[1].Value = peakResult.PeakID;
    parameters[2].Value = peakResult.QuanPeakID;
    parameters[3].Value = peakResult.ISTDRetentionTimeDiff;

    command.ExecuteNonQuery();
}

It turns out that I could not use Dapper for inserting into my large tables (for my small tables, I still use Dapper).

Some other things I found:

  • I tried using multiple threads to insert data into the same database; this made no difference. (SQLite only allows one writer at a time anyway.)

  • Upgrading from System.Data.SQLite 1.0.69 to 1.0.79 made no performance difference that I could see.

  • I am not assigning a DbType to the DbParameter; it doesn't seem to make a performance difference either way.

  • For reads, I could not improve on Dapper's performance.

surfmuggle
BrokeMyLegBiking
  • @samsaffron Thanks for the tip, I'll try out DynamicParameters. But it seems like Dapper should do this automatically: when you pass in a list of objects to insert into the DB, it should reuse the same parameter objects per row. Does it create a new array of parameter objects per row? Is this why Dapper is slow for inserts? – BrokeMyLegBiking Feb 03 '12 at 03:58
  • rereading your question, dapper should work fine here with exactly the same perf ... if you pass in an IEnumerable to insert is will generate the exact same code as above – Sam Saffron Feb 03 '12 at 04:00
  • @samsaffron my performance tests clearly show that bulk inserts with the code above are 4 times faster than passing an IEnumerable into a dapper insert. (unfortunately). My dapper code (am I missing anything?): cnn.Execute(BatchConstants.PeakResultsInsert, peakResults, trans); – BrokeMyLegBiking Feb 03 '12 at 17:23
  • can you link to a gist with your test harness? – Sam Saffron Feb 03 '12 at 21:13
  • I uploaded my ConsoleApp which I've been using. https://docs.google.com/open?id=0B4S2QMZFfsHaYjg0YjRjZmUtNGNkMy00NzA1LTgxNjAtNzJmN2VlOTM1YzAx It is a lightweight test harness which includes two methods: InsertPeakResultsDapper() and InsertPeakResultsNativeAdo() – BrokeMyLegBiking Feb 04 '12 at 11:54
  • Could you explain what part of this code keeps it from running one SQL statement per record inserted? This doesn't look like a "BULK" insert to me. – marknuzz Nov 01 '13 at 18:06
  • The article link is dead but +1 for explaining it, the answer is great even without the link. Food for thought, this could apply to other ORMs too. – CAD bloke Jan 08 '16 at 11:35

> Currently, I am wrapping inserts inside a transaction to make them faster (this made a good improvement).

The biggest gain I've seen in bulk insert speed was breaking the inserts into smaller chunks. The ideal chunk size surely varies per platform/schema/etc.; during my tests it was near 1,000 rows or so.
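A minimal sketch of that chunking idea (hypothetical table; assumes the System.Data.SQLite provider): commit a transaction every `chunkSize` rows instead of using one giant transaction for everything.

```csharp
using System;
using System.Data.SQLite;

public static class ChunkedInsertSketch
{
    // Inserts `total` rows, committing a transaction every `chunkSize` rows,
    // and returns the resulting row count.
    public static long InsertInChunks(int total, int chunkSize)
    {
        using (var cnn = new SQLiteConnection("Data Source=:memory:;Version=3;"))
        {
            cnn.Open();
            using (var ddl = cnn.CreateCommand())
            {
                ddl.CommandText = "CREATE TABLE Batch (Id INTEGER)";
                ddl.ExecuteNonQuery();
            }

            int inserted = 0;
            while (inserted < total)
            {
                using (var tx = cnn.BeginTransaction())
                using (var cmd = cnn.CreateCommand())
                {
                    cmd.CommandText = "INSERT INTO Batch VALUES (@Id)";
                    cmd.Parameters.Add(new SQLiteParameter("@Id"));

                    int end = Math.Min(inserted + chunkSize, total);
                    for (; inserted < end; inserted++)
                    {
                        cmd.Parameters["@Id"].Value = inserted;
                        cmd.ExecuteNonQuery();
                    }
                    tx.Commit(); // one commit per chunk
                }
            }

            using (var count = cnn.CreateCommand())
            {
                count.CommandText = "SELECT COUNT(*) FROM Batch";
                return (long)count.ExecuteScalar();
            }
        }
    }
}
```

Chunking bounds the size of each transaction's journal and keeps memory steady, which is why a mid-sized chunk can beat one huge transaction on some setups.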

Tom Kerr