
Situation

I'm using multiple storage databases as attachments to one central "manager" DB.

  • The storage tables share one pseudo-AUTOINCREMENT index across all storage databases.
  • I need to iterate over the shared index frequently.
  • The final number and names of storage tables are not known on storage DB creation.
  • On some signal, a then-given range of entries will be deleted.
  • It is vital that no insertion fails and no entry gets deleted before its signal.
  • A power outage is possible; data loss in that case is hardly, if ever, tolerable. Any solution that risks this (in-memory databases etc.) is not viable.
  • Database access is currently controlled using strands. This takes care of sequential access.
  • Due to the high frequency of INSERT transactions, I must trigger WAL checkpoints manually. I've seen journals of up to 2GB in size otherwise.
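The manual-checkpoint setup above can be sketched as follows (a minimal sketch in Python's stdlib `sqlite3`; the table and column names here are illustrative, not taken from the original post):

```python
import sqlite3, tempfile, os

# WAL mode with a manually triggered checkpoint after a batch of inserts,
# to keep the -wal file from growing unboundedly between checkpoints.
db = os.path.join(tempfile.mkdtemp(), "storage.db")
con = sqlite3.connect(db)
con.execute("PRAGMA journal_mode=WAL")
con.execute("CREATE TABLE datatable(idx INTEGER PRIMARY KEY, payload)")

with con:  # one transaction for the whole batch
    con.executemany("INSERT INTO datatable VALUES (?, ?)",
                    ((i, "x" * 100) for i in range(1000)))

# TRUNCATE checkpoints all frames and resets the -wal file to zero bytes;
# the returned row is (busy, wal_frames, checkpointed_frames).
busy, log, ckpt = con.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
print(busy == 0 and log == ckpt)
con.close()
```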

Current solution

I'm inserting datasets using parameter binding to a prepared statement.

INSERT INTO datatable VALUES (:idx, ...);

While doing that, I remember the start and end index. Next, I bind them to an INSERT statement on the registry table:

INSERT INTO regtable VALUES (:idx, datatable);

My query determines the datasets to return like this:

SELECT MIN(rowid), MAX(rowid), tablename
FROM (SELECT rowid,tablename FROM entryreg LIMIT 30000)
GROUP BY tablename;

After that, I query

SELECT * FROM datatable WHERE rowid >= :minid AND rowid <= :maxid;

where I use predefined statements for each datatable and bind both variables to the first query's results.
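The whole flow can be sketched in a few lines (a minimal sketch using Python's stdlib `sqlite3`; an in-memory database is used here only to keep the demo self-contained, the `payload` column and the registry layout are assumptions):

```python
import sqlite3

# Insert a batch into the data table, record the covered index range in
# the registry, then recover per-table (min, max) ranges from the registry.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE datatable(idx INTEGER PRIMARY KEY, payload)")
con.execute("CREATE TABLE regtable(idx INTEGER, tablename TEXT)")

with con:  # one transaction per batch
    for i in range(1, 101):
        con.execute("INSERT INTO datatable VALUES (?, ?)", (i, b"..."))
    con.executemany("INSERT INTO regtable VALUES (?, ?)",
                    ((i, "datatable") for i in range(1, 101)))

# First query: collapse the registry into one (min, max) range per table.
ranges = con.execute(
    "SELECT MIN(idx), MAX(idx), tablename"
    " FROM (SELECT idx, tablename FROM regtable LIMIT 30000)"
    " GROUP BY tablename").fetchall()
print(ranges)           # [(1, 100, 'datatable')]

# Second query: fetch the datasets covered by each range.
minid, maxid, _ = ranges[0]
rows = con.execute(
    "SELECT * FROM datatable WHERE idx >= ? AND idx <= ?",
    (minid, maxid)).fetchall()
print(len(rows))        # 100
```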

This is too slow. As soon as I create the registry table, my insertions slow down so much I can't meet benchmark speed.

Possible Solutions

There are several other ways I can imagine doing this:

  1. Create a view of all indices as a UNION or OUTER JOIN of all table indices. This can't be done persistently on attached databases.

  2. Create triggers for INSERT/REMOVE on table creation that fill a registry table. This can't be done persistently on attached databases.

  3. Create a trigger for CREATE TABLE on database creation that will create the triggers described above. Requires user functions.
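Option 2 can be sketched as follows (a minimal sketch, assuming the table names from the post and using an in-memory database only to keep the demo self-contained). In SQLite the trigger body runs as part of the triggering statement's transaction, so a batch of N inserts stays one transaction:

```python
import sqlite3

# An AFTER INSERT trigger keeps the registry current without a second
# explicit INSERT from application code.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE datatable(idx INTEGER PRIMARY KEY, payload);
    CREATE TABLE regtable(idx INTEGER, tablename TEXT);
    CREATE TRIGGER datatable_reg AFTER INSERT ON datatable
    BEGIN
        INSERT INTO regtable VALUES (NEW.idx, 'datatable');
    END;
""")

with con:  # one transaction; the trigger fires once per inserted row
    con.executemany("INSERT INTO datatable VALUES (?, ?)",
                    ((i, None) for i in range(5)))

count = con.execute("SELECT COUNT(*) FROM regtable").fetchone()[0]
print(count)   # 5
```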

Questions

Now, before I go and add user functions (something I've never done before), I'd like some advice on whether this has any chance of solving my performance issues.

  • Assuming I create the databases using a separate connection before attaching them: can I create views and/or triggers on the database (as the main schema) that will still work later when I connect to the database via ATTACH?
  • From what it looks like, a trigger AFTER INSERT fires once per inserted row. If it inserts into another table, does that mean I'm increasing my number of transactions from 2 to 1+N? Or is there a mechanism that speeds up triggered interaction? The first case would slow things down horribly.
  • Is there any chance that a FULL OUTER JOIN (I know SQLite doesn't support it directly, so I'd have to emulate it with other JOIN commands) is faster than filling a registry with insertion transactions every time? We're talking roughly ten transactions per second averaging 1000 elements each (insert) vs. one query of 30000 rows every two seconds (query).
starturtle

1 Answer


Open the sqlite3 databases in multi-threading mode and handle the insert/update/query/delete functions in separate threads. I prefer to transfer the query results to an STL container for processing.
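One way to read this suggestion (a sketch of the idea in Python's stdlib `sqlite3`, not the answerer's code; names are illustrative): a dedicated writer thread with its own connection drains a queue of insert batches, while readers use a separate connection, which WAL mode permits alongside the single writer.

```python
import sqlite3, threading, queue, os, tempfile

path = os.path.join(tempfile.mkdtemp(), "storage.db")
init = sqlite3.connect(path)
init.execute("PRAGMA journal_mode=WAL")
init.execute("CREATE TABLE datatable(idx INTEGER PRIMARY KEY)")
init.close()

jobs = queue.Queue()

def writer():
    con = sqlite3.connect(path)   # connection owned by this thread only
    while True:
        batch = jobs.get()
        if batch is None:         # shutdown sentinel
            break
        with con:                 # one transaction per batch
            con.executemany("INSERT INTO datatable VALUES (?)", batch)
    con.close()

t = threading.Thread(target=writer)
t.start()
jobs.put([(i,) for i in range(100)])
jobs.put(None)
t.join()

reader = sqlite3.connect(path)    # separate reader connection
count = reader.execute("SELECT COUNT(*) FROM datatable").fetchone()[0]
print(count)   # 100
```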

seccpur
  • Thanks for the hint. It might be possible to separate insert from query/delete (separating those two is not thread safe in my case according to [multithread doc](https://www.sqlite.org/threadsafe.html)). Template containers had to be removed for compile-time performance (improvement factor 10-15). But you're planting an idea there... I'll test tomorrow when I get my hands back on the code. – starturtle Sep 18 '16 at 07:42
  • After testing I've come to the conclusion that there is no really useful multi-connection setup for my use case. To prevent simultaneous write access to the same DB, I must make `DELETE`, `INSERT` and `wal_checkpoint` routines mutually exclusive. Also, I must prevent `DELETE` and my main `SELECT` scenario from mixing execution time, otherwise I risk data loss. That's effectively close to the single-strand solution I was using beforehand. Speeding up all routines (especially `SELECT`) for themselves seems to be the way to go. Thanks anyway for the hint! – starturtle Sep 20 '16 at 14:49