2

I am having a performance issue with an SQLite database (.db).

I am trying to update 100,000 records in the database, which takes around 50 minutes. That is much too slow.

My code looks like this:

    for (int q = 0; q < list.Count; q++)
    {
        ArrayList castarraylist = (ArrayList)list[q];

        using (var cmd = new SQLiteCommand(con))
        using (var transaction = con.BeginTransaction())
        {
            cmd.Transaction = transaction;

            for (int y = 0; y < castarraylist.Count; y++)
            {
                cmd.CommandText = Convert.ToString(castarraylist[y]);
                cmd.ExecuteNonQuery();
            }
            transaction.Commit();
            GC.Collect();
        }
    }

Here each castarraylist contains 5,000 records, which are written to the database inside a transaction, so the outer loop runs 20 times to complete the whole update. When I measure the time manually, each iteration of 5,000 records takes longer than the previous one:

1st 5000 records processing time > 1:11 minute

2nd 5000 records processing time > 1:25 minute

3rd  5000 records processing time > 1:32 minute 

4th 5000 records processing time > 1:40 minute 

5th 5000 records processing time > 1:47 minute 

6th 5000 records processing time > 1:52 minute 

...

... 

... 

17th 5000 records processing time > 3:32 minute 

18th 5000 records processing time > 3:44 minute

19th 5000 records processing time > 4:02 minute 

20th 5000 records processing time > 4:56 minute

I can't understand why this is happening. My source code is written in C#, and my laptop is an i5 at 2.6 GHz with 4 GB RAM and a 500 GB HDD.

I create the connection like this:

    SQLiteConnection con = new SQLiteConnection("Data Source=" + fullPath + ";Version=3;Count Changes=off;Journal Mode=off;Pooling=true;Cache Size=10000;Page Size=4096;Synchronous=off");

(fullPath is my database path.)

I create the table like this:

    string sqlquery2 = "Select LINK_ID from RDF_LINK";
    string createLinkToPoly = "create table temp2 AS " + sqlquery2;

This creates the table and inserts the records returned by sqlquery2.

The following statement loads the SpatiaLite extension into SQLite:

    ExecuteStatement("select load_extension('spatialite.dll')", con);

My UPDATE statements look like this:

    UPDATE temp2 SET GEOM = Transform(LineStringFromText('LINESTRING(4.38368 51.18109,4.38427 51.18165)',4326),32632) WHERE LINK_ID = 53841546

100,000 statements of this kind are built in different threads and inserted into the list. Finally, the UPDATE statements are executed by the code above (now using the code Larry suggested).

Hardik
    What are the actual SQL commands that get executed? – CL. Jul 04 '14 at 07:01
  • @CL. actual commands are in **castarraylist** – Hardik Jul 04 '14 at 10:17
  • @Larry .. I supposed memory is the issue here so I used GC.Collect() – Hardik Jul 04 '14 at 10:18
  • I did not ask *where* they are stored but *what* they are. The increasing times show that you have some non-indexed lookup in there. Show some SQL example. – CL. Jul 04 '14 at 10:56
  • Is it better now, without GC.Collect, with the Transaction that encloses the whole processing ? – Larry Jul 04 '14 at 11:39
  • @CL. .. sorry, my SQLite database is extended to a SpatiaLite database, and then I run UPDATE statements like: "UPDATE temp2 SET GEOM = Transform(LineStringFromText('LINESTRING(4.38368 51.18109,4.38427 51.18165)',4326),32632) WHERE LINK_ID = 53841546" – Hardik Jul 04 '14 at 12:42
  • @CL. .. yes, you were right, I am not creating an index or primary key on that table. I am creating this temp table like "create table temp2 AS " + selectSQLquery2; where selectSQLquery2 = "Select LINK_ID from RDF_LINK". Can I make a primary key while creating the table? – Hardik Jul 04 '14 at 12:45
  • @Larry .. Yes, your code is working more efficiently than before, but it's still taking 35 minutes to update 100,000 records, so I am testing it further... – Hardik Jul 04 '14 at 12:47
  • Then the issue is the queries themselves. Could you please update your question to show the queries you perform? – Larry Jul 04 '14 at 13:20
  • @Larry... I have added more details.. thanks for your kind help – Hardik Jul 04 '14 at 13:39
    I think the WHERE statement condition is taking time. Generally speaking, a database requires an index or a primary key for fields that are used in WHERE clauses. This other [question](http://stackoverflow.com/questions/946011/sqlite-add-primary-key) will guide you better than I can to add a primary key. You can do it in the SQL statement that creates the table. – Larry Jul 04 '14 at 16:13

4 Answers

3

First, you should try using a prepared statement for better performance. Take a look at the System.Data.SQLite documentation; you can use SQLiteParameter and set the parameter values in the loop.
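
A minimal sketch of that idea, assuming the geometry text and LINK_ID are available as values rather than as pre-built SQL strings (the `rows` collection and its `Wkt`/`LinkId` members are hypothetical; table and column names are taken from the question):

    // Prepare the UPDATE once; only the parameter values change per row.
    using (var transaction = con.BeginTransaction())
    using (var cmd = new SQLiteCommand(
        "UPDATE temp2 SET GEOM = Transform(LineStringFromText(@wkt, 4326), 32632) " +
        "WHERE LINK_ID = @id", con, transaction))
    {
        var wkt = cmd.Parameters.Add("@wkt", DbType.String);
        var id = cmd.Parameters.Add("@id", DbType.Int64);

        foreach (var row in rows)        // rows: your 100,000 (wkt, linkId) pairs
        {
            wkt.Value = row.Wkt;
            id.Value = row.LinkId;
            cmd.ExecuteNonQuery();       // reuses the already-prepared statement
        }
        transaction.Commit();
    }

This avoids re-parsing 100,000 distinct SQL strings; SQLite only compiles the statement once.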

Second, ArrayList is slower than List&lt;T&gt; or an array because of boxing and casts. Maybe changing that can help.

Third, there may be some PRAGMA commands you can use.

Edit: I see you have already turned off synchronous and journal_mode; I'm not sure there is any other pragma you should use. In some cases, locking_mode = EXCLUSIVE and temp_store = MEMORY can be helpful.
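
For reference, those two pragmas would be issued once after opening the connection (a sketch; whether they help depends on your workload):

    PRAGMA locking_mode = EXCLUSIVE;  -- hold the database lock for the whole session
    PRAGMA temp_store = MEMORY;       -- keep temporary tables and indices in RAM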

cshu
  • @Griddor... I am not passing any parameter values. I am preparing the statements directly in another thread. I tried to use `LIST` but it didn't improve performance much. Yes, I used mostly all the known PRAGMA commands while making the connection to the database – Hardik Jul 04 '14 at 12:50
  • 1
    @Hardik Actually, a prepared statement is a vital part of using SQLite; only change the parameter values in the loop. Executing the query directly means preparing the statement anew for each individual query. Besides, as in Larry's answer, committing the transaction only once after all the loops should be faster than multiple commits. – cshu Jul 04 '14 at 13:13
3

Currently, a new transaction is created and committed for every batch, which makes no sense.

Enclose your whole main loop in a single transaction, and remove the GC.Collect() call.

EDIT:

As I understand it, you don't want the whole update to be rolled back in case of an error, so I changed the code a bit.

Additionally, I am not sure that the command object can be reused by changing CommandText and running the query again. That's why I suggest creating it every time.

using (var transaction = con.BeginTransaction()) 
{ 
    for (int q = 0; q < list.Count; q++) 
    { 
        var castarraylist = (ArrayList)(list[q]); 

        for (int y = 0; y < castarraylist.Count; y++) 
        { 
            using (var cmd = new SQLiteCommand(con)) 
            {
                cmd.Transaction = transaction; 
                cmd.CommandText = Convert.ToString(castarraylist[y]);
                try
                {
                    cmd.ExecuteNonQuery();
                }
                catch(Exception ex)
                {
                    // Log the update problem
                    Console.WriteLine("Update problem " + cmd.CommandText + " - Reason: " + ex.Message);
                }
            }
        }
    }

    transaction.Commit();
}
Larry
  • @Larry... I used your code and I also take all value in one list. so my code is now ... `using (var transaction = con.BeginTransaction()){try{using (var cmd = new SQLiteCommand(con)) {cmd.Transaction = transaction; for (int q = 0; q < latlongquery1output.Count; q++) {cmd.CommandText = latlongquery1output[q]; cmd.ExecuteNonQuery();}} transaction.Commit();} catch{transaction.Rollback();throw;}}` – Hardik Jul 08 '14 at 14:21
  • @Larry... currently this list contains 1 million records... and with this code, if a single record fails, the whole transaction fails. Is there a simpler way? – Hardik Jul 08 '14 at 14:27
  • @Hardik Oh, I see. I adapted the code accordingly so it will not rollback the whole changes if a record refuses to update for whatever reasons. I also changed the way the SQLLiteCommand object is used so it is not re-used anymore. Let me know if it is better or not. – Larry Jul 08 '14 at 16:39
  • @Larry... yes, your code and the creation of the index helped me a lot and improved my performance (now the task completes in **only 5 minutes**)... Can we change the code so that it does not roll back all the changes if a record refuses to update for whatever reason? – Hardik Jul 09 '14 at 12:47
  • Glad it helps ! :) Actually, the code in the answer is designed to NOT rollback changes to the database if a single record refuses to update. Instead, you will have to implement something in the catch section to log if an update fails (in a file or anything else), and it will continue until everything is done. Then all updates are eventually committed. Have you tried to read and insert records in a new table instead of updating an existing one ? It might be even faster. – Larry Jul 09 '14 at 15:38
  • I am learning how to implement a mechanism that can continue if any update fails... I guess I will have to rebuild the UPDATE statement, but I will rethink it. – Hardik Jul 11 '14 at 06:46
  • I updated the code slightly to put something in the catch block that shows in the console what went wrong if an update fails. Normally, it should continue until the end of the loop. I do not know if SQLite automatically rolls back a pending transaction if an error occurs :( – Larry Jul 11 '14 at 07:01
  • Yes, SQLite does that... it rolls back the transaction. So suppose we are inserting 1000 records: if everything is good up to 999 and the last statement fails, then it rolls back all the updated records. – Hardik Jul 11 '14 at 07:14
  • Argh. This SQLite automatic rollback thing may be asked in another question. I unfortunately do not know how to disable this. – Larry Jul 13 '14 at 16:32
2

You're probably not having a performance issue with SQLite; you're almost certainly having a performance issue with your own code:

  • Calling GC.Collect() at all is almost certainly not necessary. What you're doing here shouldn't be causing any significant memory pressure, and if it were, I would strongly recommend letting the garbage collector do its own thing rather than forcing the issue. Worse, you're calling GC.Collect() on every single iteration of the loop. Don't do this!

  • Is it really necessary for each individual update to be made in its own transaction? You do realise that if your code fails and throws an exception halfway through this loop, the first half of the updates will have been committed but you won't have any way of picking up where you left off? You won't even have an easy way of knowing where you left off.

  • Is there any particular reason you're using an ArrayList rather than a List&lt;T&gt;? This forces you to perform a cast and call Convert.ToString in your inner loop, which shouldn't be necessary (unless you have a very, very good reason for using ArrayList).
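
For example, a typed List&lt;string&gt; removes both the cast and the Convert.ToString call from the inner loop (a sketch, assuming the batches really hold SQL strings; `BuildStatements` is a hypothetical name for your existing statement builder):

    // Typed lists instead of ArrayList-of-ArrayList.
    List<List<string>> batches = BuildStatements();

    foreach (List<string> batch in batches)
    {
        foreach (string sql in batch)
        {
            cmd.CommandText = sql;    // no cast, no Convert.ToString
            cmd.ExecuteNonQuery();
        }
    }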

Chris
2

The UPDATE statements are slow because the database has to scan all records in the table to find any matching LINK_ID values. You need an index on the LINK_ID column.

Either create it manually before doing the updates:

CREATE INDEX temp2_linkid ON temp2(LINK_ID);

Or create the index when you are creating the table (which requires that the table is created explicitly):

CREATE TABLE temp2 ( LINK_ID INTEGER PRIMARY KEY );
INSERT INTO temp2(LINK_ID) SELECT LINK_ID FROM RDF_LINK;
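
You can check that the index is actually used with EXPLAIN QUERY PLAN (a sketch; the exact output wording varies between SQLite versions):

    EXPLAIN QUERY PLAN
    UPDATE temp2 SET GEOM = NULL WHERE LINK_ID = 53841546;
    -- with the index, the plan reports a SEARCH using temp2_linkid (or the primary key);
    -- without it, it reports a full SCAN of temp2
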
CL.
  • @CL... yes, I have created the INDEX and it's really very effective. – Hardik Jul 06 '14 at 06:20
  • But I think I can't create a primary key, because I sometimes insert more than one column and they are not predefined. – Hardik Jul 06 '14 at 06:31
  • Currently facing a problem... I am creating the UPDATE statements in different threads, and it's giving an OutOfMemoryException on string and array. – Hardik Jul 06 '14 at 06:33