
I am using System.Data.SQLite and SQLiteDataReader in my C# project. I am facing performance issues when getting the results of a query with attached databases.

Here is an example of a query that searches for text in two databases:

ATTACH "db2.db" as db2;

SELECT MainRecord.RecordID,
((LENGTH(MainRecord.Value) - LENGTH(REPLACE(UPPER(MainRecord.Value), UPPER("FirstValueToSearch"), ""))) / 18) AS "FirstResultNumber",
((LENGTH(DB2Record.Value) - LENGTH(REPLACE(UPPER(DB2Record.Value), UPPER("SecondValueToSearch"), ""))) / 19) AS "SecondResultNumber"
FROM main.Record MainRecord
JOIN db2.Record DB2Record ON DB2Record.RecordID BETWEEN (MainRecord.PositionMin) AND (MainRecord.PositionMax)
WHERE FirstResultNumber > 0 AND SecondResultNumber > 0;

DETACH db2;

When executing this query with SQLiteStudio or SQLiteAdmin, this works fine, I am getting the results in a few seconds (the Record table can contain hundreds of thousands of records, the query returns 36000 records).

When executing this query in my C# project, the execution takes a few seconds too, but it takes hours to run through all the results.

Here is my code :

// Attach databases

SQLiteDataReader data = null;

using (SQLiteCommand command = this.m_connection.CreateCommand())
{
    command.CommandText = "SELECT...";
    data = command.ExecuteReader();
}

if (data.HasRows)
{
    while (data.Read())
    {
        // Do nothing, just iterate all results
    }
}

data.Close();

// Detach databases

Calling the Read method of the SQLiteDataReader once can take more than 10 seconds! I guess this is because the SQLiteDataReader is lazily loaded (so it doesn't fetch the whole rowset before you start reading the results), am I right?

EDIT 1 :

I don't know if this has something to do with lazy loading, like I said initially, but all I want is to be able to get ALL the results as soon as the query has finished. Isn't that possible? In my opinion, it is really strange that it takes hours to iterate the results of a query that executes in a few seconds...
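For reference, materializing all rows eagerly would look like this (a sketch reusing the `m_connection` field from the code above; note that `DataTable.Load` still calls `Read` internally once per row, so if each `Read` is slow this is just as slow in total):

```csharp
// Sketch: eagerly materialize the whole result set into a DataTable.
// DataTable.Load() drains the reader completely before returning,
// so it cannot be faster overall than iterating the reader yourself.
DataTable results = new DataTable();

using (SQLiteCommand command = this.m_connection.CreateCommand())
{
    command.CommandText = "SELECT...";
    using (SQLiteDataReader reader = command.ExecuteReader())
    {
        results.Load(reader);
    }
}
// results.Rows now holds every row returned by the query.
```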

EDIT 2 :

I just added a COUNT(*) to my SELECT query to see whether I could get the total number of results on the first data.Read(), just to be sure that it was only the iteration of the results that was taking so long. And I was wrong: this new query executes in a few seconds in SQLiteAdmin / SQLiteStudio, but takes hours to execute in my C# project. Any idea why the same query takes so much longer to execute in my C# project?

EDIT 3 :

Thanks to EXPLAIN QUERY PLAN, I noticed that there is a slight difference in the execution plan for the same query between SQLiteAdmin / SQLiteStudio and my C# project. In the latter case, it uses an AUTOMATIC PARTIAL COVERING INDEX on DB2Record instead of the primary key index. Is there a way to ignore / disable the use of automatic partial covering indexes? I know they are meant to speed up queries, but in my case, the opposite happens...
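For reference, SQLite has a pragma that disables these automatic (transient) indexes for the current connection; I have not yet tested whether issuing it through System.Data.SQLite changes the plan:

```sql
-- Disable automatic (transient) indexes for this connection.
-- Must be issued on the same connection, before the SELECT runs.
PRAGMA automatic_index = OFF;
```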

Thank you.

Morgan M.
  • `SQLiteDataReader` implements `IDisposable`, why don't you use `using` just like your `SQLiteCommand`? – Sébastien Sevrin May 20 '15 at 10:22
  • Well, I have a `ExecuteSingleQuery` method, returning a `SQLiteDataReader`. I just simplified my code here. – Morgan M. May 20 '15 at 10:23
  • Any reason why you're disposing the command before you're done with the reader object? And when you say "lazy loaded", where have you read this and what exactly do you mean? – Lasse V. Karlsen May 20 '15 at 11:11
  • Which SQLite provider are you using by the way, are you using the nuget package that comes with binary libraries for the SQLite access, or are you using one of the pure .NET implementations? – Lasse V. Karlsen May 20 '15 at 11:12
  • Like I said in my first comment, I have a `ExecuteSingleQuery` method returning a `SQLiteDataReader` object. That's why I dispose the command before I'm done with the reader. Is it wrong to do so? – Morgan M. May 20 '15 at 11:15
  • About lazy loading, I saw this answer : http://stackoverflow.com/questions/962032/how-do-i-count-the-number-of-rows-returned-in-my-sqlite-reader-in-c/962053#962053 "The DataReader runs lazily, so it doesn't pick up the entirety of the rowset before beginning." I guess this is the reason why it takes so much time to run through the results, but I'm not certain. – Morgan M. May 20 '15 at 11:16
  • I am using the nuget package. – Morgan M. May 20 '15 at 11:18
  • Please post a real full reproducing code. This code in your question is obviously too simple to fail, and is broken (the using is closed before the data is used). – Simon Mourier May 23 '15 at 05:46
  • @SimonMourier : The code is not broken. The command is disposed before the reader is used, not the reader. Please tell me if doing so is wrong. But it works. – Morgan M. May 26 '15 at 08:07
  • Generally speaking, consider using `using` instead of a manual call to `Close()`, to ensure that `Close()` will be called even if an exception occurs. – Eldritch Conundrum May 27 '15 at 11:40
  • Yes, I generally call `Close` in the `finally` clause of the `try` / `catch` surrounding my code... But I agree with you that using using is still better. – Morgan M. May 27 '15 at 13:40

3 Answers


Besides finding matching records, it seems that you're also counting the number of times the strings matched. The result of this count is also used in the WHERE clause.

You want the number of matches, but in the WHERE clause you only test whether that number is non-zero - the actual count does not matter there - so you could try changing the WHERE clause to:

WHERE MainRecord.Value LIKE '%FirstValueToSearch%' AND DB2Record.Value LIKE '%SecondValueToSearch%'

It might not make any difference though - especially if there's no index on the Value columns - but it's worth a shot. Indexes on text columns require a lot of space, so I wouldn't blindly recommend one.

If you haven't done so yet, place an index on the DB2's RecordID column.

You can use EXPLAIN QUERY PLAN SELECT ... to make SQLite spit out what it does to try to make your query perform, the output of that might help diagnose the problem.
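For example, prefixing a simplified form of the query from the question (the BETWEEN join condition is the part most likely to trigger an automatic index):

```sql
-- Show the plan SQLite picks; an "AUTOMATIC" entry in the output
-- indicates a transient index built just for this query.
EXPLAIN QUERY PLAN
SELECT MainRecord.RecordID
FROM main.Record MainRecord
JOIN db2.Record DB2Record
  ON DB2Record.RecordID BETWEEN MainRecord.PositionMin AND MainRecord.PositionMax;
```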

C.Evenhuis
  • Yes, the number of matches is not used in the `WHERE` clause, but my intention was to use it in my code, once the query executed. I tried your suggestion anyway, it's faster indeed, but its not solving the problem. No, I haven't placed an index on the DB2's RecordID column, because it's the primary key, and I believe that SQLite creates automatically indexes for primary keys, right? I'm going to try the `EXPLAIN QUERY PLAN SELECT`. Thank you :) – Morgan M. May 22 '15 at 14:29
  • @MorganM. yes, primary key is an index. – C.Evenhuis May 22 '15 at 15:06
  • Thanks to `EXPLAIN QUERY PLAN`, I noticed that there was a slight difference in the execution plan for the same query between SQLiteAdmin / SQLiteStudio and my C# project. In the second case, it is using an automatic partial covering index on DB2Record instead of using the primary key index. Is there a way to ignore / disable the use of automatic partial covering indexes? I know it is used to speed up the queries, but in my case, it's rather the opposite that happens... – Morgan M. May 26 '15 at 09:34
  • @MorganM. I'm afraid I wouldn't know how to instruct SQLite to change its plans, other than to delete an index altogether. Afaik "partial covering" means it cannot get all requested data from the index, it has to find the actual record in DB2Record. – C.Evenhuis May 26 '15 at 10:06
  • According to this post http://stackoverflow.com/questions/20524686/what-is-an-automatic-covering-index "Automatic" means that SQLite creates a temporary index that is used only for this query, and deleted afterwards. So I'm afraid I can't delete it. – Morgan M. May 26 '15 at 10:11

Are you sure you are using the same version of SQLite in System.Data.SQLite, SQLiteStudio and SQLiteAdmin? There can be huge differences between versions.

tafia
  • I believe that SQLiteAdmin and SQLiteStudio use sqlite3.dll directly, while my C# project uses System.Data.SQLite (which depends on sqlite3.dll). – Morgan M. May 27 '15 at 09:56
  • I mean which version of sqlite3.dll. The query planners give very different results. I was stuck for a long time when doing such comparisons... until I decided to build my own GUI with the same dll as my application. Try running `select sqlite_version();` on all 3 of them. – tafia May 27 '15 at 10:14
  • You were right, they use different versions of sqlite3.dll. The version used in my C# project (3.8) is newer than the one used in SQLiteAdmin and SQLiteStudio (3.7). – Morgan M. May 27 '15 at 10:22
  • Can you try adding an index corresponding to the automatically generated index? ... and/or try running `analyze` on both databases ([here](https://www.sqlite.org/lang_analyze.html) for more info) – tafia May 27 '15 at 10:25
  • I didn't know the `analyze` command. I will surely give it a try. Thank you. – Morgan M. May 28 '15 at 07:26
  • In my experience it _can_ help, but not every time. Anyway, to revert back to before the `analyze` command, you can just drop the sqlite_stat tables. – tafia May 28 '15 at 07:48

One more typical reason why a SQL query can take a different amount of time when executed with ADO.NET than from a native utility (like SQLiteAdmin) is command parameters used in the CommandText (it is not clear from your code whether parameters are used or not). Depending on the ADO.NET provider implementation, the following identical CommandText values:

SELECT * FROM sometable WHERE somefield = ?   // assume parameter is '2'

and

SELECT * FROM sometable WHERE somefield='2'

may lead to completely different execution plans and query performance.

Another suggestion: you may disable the journal (by specifying "Journal Mode=off;" in the connection string) and synchronous mode ("Synchronous=off;"), as these options may also affect query performance in some cases.
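For example, a connection string with both options disabled might look like this (a sketch; `db1.db` is a placeholder path, and note that turning off the journal and synchronous writes trades crash safety for speed):

```csharp
// Sketch: SQLite connection with journal and synchronous mode disabled.
// Warning: with these settings, a crash mid-write can corrupt the database.
using (var connection = new SQLiteConnection(
    "Data Source=db1.db;Journal Mode=Off;Synchronous=Off;"))
{
    connection.Open();
    // ... run queries on this connection ...
}
```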

Vitaliy Fedorchenko
  • Thank you for your answer, but I am not using parameters anymore. They were indeed slowing down the execution of the query (especially when I was trying to maximize my INSERT-per-second performance). – Morgan M. May 27 '15 at 11:00
  • (a bit offtopic) regarding insert performance: I have good experience in inserting huge amount of records with some kind of "bulk load" using sqlite3.exe utility executed from C# code ( .import command) - it allows to insert records much faster than individual INSERT commands. – Vitaliy Fedorchenko May 27 '15 at 11:20