17

I have a mobile application. My client has a large data set, ~100,000 records, and it is updated frequently. When we sync, we need to copy data from one database to the other.

I've attached the second database to the main one and run an INSERT INTO table SELECT * FROM sync.table.
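Roughly, the statements look like this (the file name and table name here are just examples from my setup):

ATTACH DATABASE 'sync.sqlite' AS sync;
INSERT INTO inventory SELECT * FROM sync.inventory;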

This is extremely slow; it takes about 10 minutes, I think. I noticed that the journal file grows step by step.

How can I speed this up?

EDIT 1

I have indexes turned off, and I have the journal turned off. Using

INSERT INTO table SELECT * FROM sync.table

it still takes 10 minutes.

EDIT 2

If I run a query like

SELECT id, invitem, invid, cost FROM inventory WHERE itemtype = 1
ORDER BY invitem LIMIT 50

it takes 15-20 seconds.

The table schema is:

CREATE TABLE inventory  
('id' INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
 'serverid' INTEGER NOT NULL DEFAULT 0,
 'itemtype' INTEGER NOT NULL DEFAULT 0,
 'invitem' VARCHAR,
 'instock' FLOAT  NOT NULL DEFAULT 0,
 'cost' FLOAT NOT NULL DEFAULT 0,
 'invid' VARCHAR,
 'categoryid' INTEGER  DEFAULT 0,
 'pdacategoryid' INTEGER DEFAULT 0,
 'notes' VARCHAR,
 'threshold' INTEGER  NOT NULL DEFAULT 0,
 'ordered' INTEGER  NOT NULL DEFAULT 0,
 'supplier' VARCHAR,
 'markup' FLOAT NOT NULL DEFAULT 0,
 'taxfree' INTEGER NOT NULL DEFAULT 0,
 'dirty' INTEGER NOT NULL DEFAULT 1,
 'username' VARCHAR,
 'version' INTEGER NOT NULL DEFAULT 15
)

The indexes are created like this:

CREATE INDEX idx_inventory_categoryid ON inventory (pdacategoryid);
CREATE INDEX idx_inventory_invitem ON inventory (invitem);
CREATE INDEX idx_inventory_itemtype ON inventory (itemtype);

I am wondering: isn't INSERT INTO ... SELECT * FROM the fastest built-in way to do a massive data copy?

EDIT 3

SQLite is serverless, so please stop voting for a particular answer; I'm sure that is not the answer.

Pentium10
  • Sounds like a continuation of your existing question? http://stackoverflow.com/questions/2121336/insert-takes-too-long-code-optimization-needed – AdaTheDev Jan 23 '10 at 22:26
  • It is. I didn't know how to reply to my own thread; I can only comment on posts or answer it. But I want to raise another question... – Pentium10 Jan 23 '10 at 22:53
  • What does each row look like? 100,000 records isn't a lot if the rows are small. We run a data import regularly with 300,000 records and it runs in a few seconds (which we consider slow). – Samuel Neff Jan 26 '10 at 02:39
  • See my second edit; I've added the table schema. Do you run those large inserts on the Compact Framework, on a handheld device? – Pentium10 Jan 26 '10 at 09:49
  • When I launch the sqlite3 command-line tool, create two databases with your inventory schema, insert 100,000 records with data in every field into one, and then copy from one database to the other, each operation returns in less than a second. I can copy the same 100,000 records over and over. Each time the journal only grows to less than 10 KB and the query runs in less than a second. I'm doing this on a PC, not a device, but sub-second versus 10 minutes still seems extreme. – Samuel Neff Jan 26 '10 at 15:07
  • That's a PC; there everything is fine. On a mobile device with a 350 MHz processor and limited memory (e.g. 60 MB operating memory, 10 MB of free space), where a 12 MB disk write takes 20 seconds, the process is awful. I don't even know if there is a command-line mode of SQLite on the Compact Framework. – Pentium10 Jan 26 '10 at 16:40
  • Regarding EDITED2: You should be able to speed up the query by creating an index on (itemtype, invitem): CREATE INDEX idx_inventory_typeandinv ON inventory (itemtype, invitem); – Christian Schwarz Jan 26 '10 at 19:00

8 Answers

9

If the target is some version of MS SQL Server, SqlBulkCopy offers an efficient insert for large data sets; it is similar to the bcp command.

You can also disable/remove the non-clustered indexes before inserting, and re-create them after.
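The same drop-and-recreate pattern works in SQLite; a sketch using one of the indexes from the question (the final statement just mirrors the original definition):

DROP INDEX idx_inventory_invitem;
-- ... bulk insert here ...
CREATE INDEX idx_inventory_invitem ON inventory (invitem);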

In SQLite, these are usually pretty fast:

.dump ?TABLE? ...      Dump the database in an SQL text format
.import FILE TABLE     Import data from FILE into TABLE
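On a desktop shell, for example, a whole table can be copied by piping a dump into the target database (paths illustrative; note that .dump also emits the CREATE TABLE statement, so the table should not already exist in the target):

sqlite3 sync.sqlite ".dump inventory" | sqlite3 main.sqlite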

Also try: PRAGMA journal_mode = OFF
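PRAGMA synchronous = OFF is a companion setting that also trades durability for speed (the C example in a later answer uses a similar combination). With both off, a crash mid-copy can corrupt or lose data:

PRAGMA journal_mode = OFF;
PRAGMA synchronous = OFF;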

FYI, you should be able to run the command line utility on Windows Mobile if you include it in your package.

jspcal
  • As stated in the tags, it's SQLite. I am already using SQLite's bulk copy, by attaching two databases and using an INSERT INTO ... SELECT * FROM method to copy the data. My indexes, triggers, and several pragmas are already turned off. – Pentium10 Jan 23 '10 at 23:24
  • For SQLite, try the `.dump/.import` commands rather than SELECT INTO (your tag didn't specify that both DBs were SQLite). – jspcal Jan 23 '10 at 23:34
  • I'm not sure I can run the `.dump/.import` commands on mobile; it's serverless. (As for not mentioning that both DBs are SQLite: I missed it, but you could deduce it, knowing that you cannot ATTACH a different type of DB in SQL.) – Pentium10 Jan 23 '10 at 23:38
  • @Pentium10: you can; linked servers with ODBC. – jspcal Jan 23 '10 at 23:43
  • Yeah, it's pretty useful if you need it. – jspcal Jan 24 '10 at 00:07
  • Do you have the Transaction-Commit facility activated? – Tristan Jan 24 '10 at 00:21
  • @jspcal Can you tell me which command-line utility I have to include, and how to use it? – Pentium10 Jan 24 '10 at 22:47
  • @Tristan Yes, I have, but I tried it turned off too and it's still slow. It's probably the slow writing to disk; the DB file on mobile is 12 megabytes, and that takes some time to write. – Pentium10 Jan 24 '10 at 22:49
  • @Pentium10: shell.c, from the tarball, contains the command-line exe; it compiles fine for CF. Some compile tips here: http://www.sqlite.org/howtocompile.html – jspcal Jan 24 '10 at 23:33
  • `compiles fine on CF`: what do you mean? Do I have to compile it on Windows Mobile? – Pentium10 Jan 24 '10 at 23:52
  • No, you just have to link against the right framework, like any other executable. – jspcal Jan 25 '10 at 00:57
6

I don't think that attaching the two databases and running INSERT INTO foo SELECT * FROM bar is necessarily the fastest way to do this. If you are syncing between a handheld device and a server (or another device), could the transport mechanism be the bottleneck? Or are the two database files already on the same filesystem? If the filesystem on the device is slow flash memory, could that be the bottleneck?

Are you able to compile and run the raw SQLite C code on your device? (I think the raw SQLite3 amalgamation should compile for WinCE/Mobile.) If so, and you are willing:

  • to write some C code (using the SQLite C API), and
  • to increase the risk of data loss by turning off disk journaling,

then it should be possible to write a small stand-alone executable that copies/synchronizes the 100K records between the two databases extremely quickly.

I've posted some of what I learned about optimizing SQLite inserts here: Improve INSERT-per-second performance of SQLite?


Edit: Tried this out with real code...

I don't know all the steps involved in building a Windows Mobile executable, but the SQLite3 amalgamation should compile out of the box using Visual Studio. Here is a sample main.c program that opens two SQLite databases (both have to have the same schema; see the #define TABLE statement), executes a SELECT statement, and then binds the resulting rows to an INSERT statement:

/*************************************************************
** The author disclaims copyright to this source code.  In place of
** a legal notice, here is a blessing:
**
**    May you do good and not evil.
**    May you find forgiveness for yourself and forgive others.
**    May you share freely, never taking more than you give.
**************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include "sqlite3.h"

#define SOURCEDB "C:\\source.sqlite"
#define DESTDB "c:\\dest.sqlite"

#define TABLE "CREATE TABLE IF NOT EXISTS TTC (id INTEGER PRIMARY KEY, Route_ID TEXT, Branch_Code TEXT, Version INTEGER, Stop INTEGER, Vehicle_Index INTEGER, Day Integer, Time TEXT)"
#define BUFFER_SIZE 256

int main(int argc, char **argv) {

    sqlite3 * sourceDB;
    sqlite3 * destDB;

    sqlite3_stmt * insertStmt;
    sqlite3_stmt * selectStmt;

    char * insertTail = 0;
    char * selectTail = 0;

    int n = 0;
    int result = 0;
    char * sErrMsg = 0;
    clock_t cStartClock;

    char sInsertSQL [BUFFER_SIZE] = "\0";
    char sSelectSQL [BUFFER_SIZE] = "\0";

    /* Open the Source and Destination databases */
    sqlite3_open(SOURCEDB, &sourceDB);
    sqlite3_open(DESTDB, &destDB);

    /* Risky - but improves performance */
    sqlite3_exec(destDB, "PRAGMA synchronous = OFF", NULL, NULL, &sErrMsg);
    sqlite3_exec(destDB, "PRAGMA journal_mode = MEMORY", NULL, NULL, &sErrMsg);

    cStartClock = clock(); /* Keep track of how long this took*/

    /* Prepared statements are much faster */
    /* Compile the Insert statement */
    sprintf(sInsertSQL, "INSERT INTO TTC VALUES (NULL, @RT, @BR, @VR, @ST, @VI, @DT, @TM)");
    sqlite3_prepare_v2(destDB, sInsertSQL, BUFFER_SIZE, &insertStmt, &insertTail);

    /* Compile the Select statement */
    sprintf(sSelectSQL, "SELECT * FROM TTC LIMIT 100000");
    sqlite3_prepare_v2(sourceDB, sSelectSQL, BUFFER_SIZE, &selectStmt, &selectTail);

    /* Transaction on the destination database */
    sqlite3_exec(destDB, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);

    /* Execute the Select Statement.  Step through the returned rows and bind
    each value to the prepared insert statement.  Obviously this is much simpler
    if the columns in the select statement are in the same order as the columns
    in the insert statement */
    result = sqlite3_step(selectStmt);
    while (result == SQLITE_ROW)
    {

        sqlite3_bind_text(insertStmt, 1, (const char *)sqlite3_column_text(selectStmt, 1), -1, SQLITE_TRANSIENT); /* Get Route */
        sqlite3_bind_text(insertStmt, 2, (const char *)sqlite3_column_text(selectStmt, 2), -1, SQLITE_TRANSIENT); /* Get Branch */
        sqlite3_bind_text(insertStmt, 3, (const char *)sqlite3_column_text(selectStmt, 3), -1, SQLITE_TRANSIENT); /* Get Version */
        sqlite3_bind_text(insertStmt, 4, (const char *)sqlite3_column_text(selectStmt, 4), -1, SQLITE_TRANSIENT); /* Get Stop Number */
        sqlite3_bind_text(insertStmt, 5, (const char *)sqlite3_column_text(selectStmt, 5), -1, SQLITE_TRANSIENT); /* Get Vehicle */
        sqlite3_bind_text(insertStmt, 6, (const char *)sqlite3_column_text(selectStmt, 6), -1, SQLITE_TRANSIENT); /* Get Day */
        sqlite3_bind_text(insertStmt, 7, (const char *)sqlite3_column_text(selectStmt, 7), -1, SQLITE_TRANSIENT); /* Get Time */

        sqlite3_step(insertStmt);       /* Execute the SQL Insert Statement (Destination Database)*/
        sqlite3_clear_bindings(insertStmt); /* Clear bindings */
        sqlite3_reset(insertStmt);      /* Reset VDBE */

        n++;

        /* Fetch the next row from the source database */
        result = sqlite3_step(selectStmt);

    }

    sqlite3_exec(destDB, "END TRANSACTION", NULL, NULL, &sErrMsg);

    printf("Transfered %d records in %4.2f seconds\n", n, (clock() - cStartClock) / (double)CLOCKS_PER_SEC);

    sqlite3_finalize(selectStmt);
    sqlite3_finalize(insertStmt);

    /* Close both databases */
    sqlite3_close(destDB);
    sqlite3_close(sourceDB);

    return 0;
}

On my Windows desktop machine this code copies 100k records from source.sqlite to dest.sqlite in 1.20 seconds. I don't know exactly what kind of performance you'll see on a mobile device with flash memory (but I am curious).

Mike Willekes
  • The two databases are already on the same filesystem. The filesystem is slow flash memory and can be a bottleneck. I don't know how to compile/run the raw SQLite C code on my device; if you point me in the right direction, I will give it a try. – Pentium10 Feb 05 '10 at 20:06
  • If you can use Visual Studio to build a Windows Mobile C++ executable, this should be simple. The SQLite3 amalgamation contains all of the SQLite code in one `.c` file and one `.h` file. – Mike Willekes Feb 05 '10 at 21:51
4

I'm on mobile right now, so I can't post a very detailed answer, but this might be worth reading:

http://sqlite.org/cvstrac/wiki?p=SpeedComparison

As you can see, INSERT performance in SQLite 3 depends heavily on transactions and indexes. Also, INSERT ... SELECT doesn't seem to be SQLite's strong point.

Alix Axel
  • So you're saying that doing a SELECT, stepping through the results, and executing a prepared INSERT statement for each row is, all summed up, faster than INSERT ... SELECT? – Pentium10 Feb 12 '10 at 11:40
1

INSERT INTO ... SELECT * FROM an attached database is the fastest option available in SQLite. A few things to look into:

  1. Transactions. Make sure the entire thing runs inside a transaction; this is really critical (see the sketch at the end of this answer). If it's only one SQL statement then it's not important, but you said the journal grows "step by step", which indicates it's more than one statement.

  2. Triggers. Do you have triggers running? Those obviously could affect performance.

  3. Constraints. Do you have unnecessary constraints? You can't disable them or remove/re-add them, so if they're necessary there isn't much you can do about them, but it's something to consider.

You already mentioned turning off indexes.
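As referenced in point 1, a minimal sketch of the copy as one explicit transaction (the alias and table name are taken from the question); this matters once the sync runs more than a single statement:

BEGIN;
INSERT INTO inventory SELECT * FROM sync.inventory;
COMMIT;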

Samuel Neff
  • I also mentioned that I have journal mode off, which means transactions are off; it takes too much time to write the transaction file to disk, so it didn't help and I turned it off. I have neither constraints nor triggers for insert. – Pentium10 Jan 26 '10 at 09:53
1

Do all 100,000 records change very often, or is it just a subset that changes?

If it's a subset, you should consider adding an updated_since_last_sync column that gets flagged when an update is made. During the next sync you then copy only the records that have actually changed; once the records are copied over, you set the flag column back to zero.
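A sketch of that delta copy; note that the question's schema already has a dirty column that could serve as the flag. Whether id values line up across the two databases, as the INSERT OR REPLACE here assumes, would need to be verified:

-- copy only the rows flagged as changed
INSERT OR REPLACE INTO inventory SELECT * FROM sync.inventory WHERE dirty = 1;
-- reset the flags once the rows are copied over
UPDATE sync.inventory SET dirty = 0 WHERE dirty = 1;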

Tom van Enckevort
0

Send only deltas. That is, send only diffs: only what has changed.

Viktor Klang
  • Each row will have at least one modified column, and I don't know which one. It would be a pain to detect that and run an update on those rows. Think of my question as a refresh operation. – Pentium10 Jan 23 '10 at 23:25
0

What about storing the sync.table database table in a separate file? That way you would just need to copy that file in order to sync. I bet that's way faster than syncing via SQL.

  • I don't understand your idea. There are already two separate files, maindb.sqlite and sync.sqlite. In SQLite there are no separate files for each table; a file is a database. – Pentium10 Jan 26 '10 at 19:28
  • I suppose there are a bunch of tables in maindb.sqlite and you just want to sync one of those tables in one direction (= backup?). If that's the case, you could introduce a new database file (maindb-syncable.sqlite, for instance) that contains only the table you need to synchronize. Then just create a copy of that file whenever you need to sync. – Christian Schwarz Jan 26 '10 at 19:41
  • This is a full sync process. There is a main central database, and the product has a Desktop version too. There is a sync process in which data from the central database is synced to the mobile device. The problem appeared after a client came on board with a large inventory, so not only is the inventory table large; sooner or later the invoices, expenses, and parts tables can be large too. I want to insert or replace **all records** from sync (the new result from the server) into main (local on the mobile). The sync.sqlite DB is downloaded by the sync process. – Pentium10 Jan 26 '10 at 22:37
0

If you haven't already, you need to wrap it in a transaction. It makes a significant speed difference.

Jay