
Let me begin by saying that I'm somewhat new to SQL (though I've been writing Python for a long while), and I've had trouble finding a good solution to this problem online.

The problem:

I have an unknown number (although probably fewer than 100) of identically structured SQLite databases that I need to query and merge results from. The databases themselves are not particularly large.

I've been looking at the ATTACH command and following along with this tutorial: http://souptonuts.sourceforge.net/readme_sqlite_tutorial.html

import sqlite3 as lite

Database = "library.db"
con = lite.connect(Database)
con.row_factory = lite.Row
cur = con.cursor()

# Attach each database under its own schema name
cur.execute("ATTACH DATABASE 'library.db' AS db1")
cur.execute("ATTACH DATABASE 'library2.db' AS db2")

# Query the same table in both databases and merge the results
cur.execute("""
            SELECT 'db1', * FROM db1.MyTable
            UNION
            SELECT 'db2', * FROM db2.MyTable
            """)

but it seems like there should be a better way than explicitly spelling out each database in the execute call. Also, it looks like there's a limit on the number of databases that can be attached at once: https://sqlite.org/limits.html
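The obvious improvement would be to generate the statements from a list of database files instead of hard-coding them. A rough sketch of what I mean (the file paths and the table name MyTable are just placeholders for my real files and table):

import sqlite3 as lite

db_paths = ["library.db", "library2.db", "library3.db"]  # placeholder paths

con = lite.connect(":memory:")
con.row_factory = lite.Row
cur = con.cursor()

# Attach every database under a generated schema name: db1, db2, ...
for i, path in enumerate(db_paths, start=1):
    cur.execute("ATTACH DATABASE ? AS db{}".format(i), (path,))

# Build one UNION query over all of the attached copies of the table
union_query = " UNION ".join(
    "SELECT 'db{0}', * FROM db{0}.MyTable".format(i)
    for i in range(1, len(db_paths) + 1)
)
rows = cur.execute(union_query).fetchall()

Even generating the statements like this still runs into the attach limit, though, so it feels like I'm missing a better approach.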

I've also looked at merging them into one large database (How can I merge many SQLite databases?), but it seems inefficient to re-merge the databases every time a query needs to be made or one of the many individual databases changes.

Before I continue too far down one path, are there any better options for this type of thing that I'm not aware of?


Other potentially useful information about the project:

  • There are two main tables.
  • The tables can contain duplicate rows from one database to the next.
  • I need to be able to get the distinct values in columns across the "merged" databases.
  • Knowing which database the data came from is not essential.
  • The individual databases are updated frequently.
  • Queries against the "merged" database are made frequently.
Radical Edward
  • Seems like you have lots of data in these databases and you keep them separate because you want to avoid multiple processes fighting to get write access to the file. Maybe it's time to move to a fully featured DBMS like PostgreSQL and have one database which can handle multiple writing clients? – ElmoVanKielmo May 06 '14 at 20:07
  • The only reason for SQLite is that it's the default format the DBs are in. They can be changed to Postgres without much trouble, but all databases have to remain as individual databases (which are maintained by individuals in different locations). I don't have any control over that, so unfortunately all I can really do is access them (copy, query, etc.) – Radical Edward May 06 '14 at 21:02

1 Answer


You could avoid spelling out all the databases in every query by creating a view over the attached tables (it must be a TEMP view, because an ordinary view stored in one database cannot reference objects in another attached database):

CREATE TEMP VIEW MyTable_all AS
SELECT 'db1', * FROM db1.MyTable
UNION ALL
SELECT 'db2', * FROM db2.MyTable
...
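If the set of attached databases varies, the view can be generated programmatically. A rough sketch in Python (the file names and the table name MyTable are placeholders; PRAGMA database_list is used to discover the attached schema names):

import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Attach the source databases as in the question (placeholder file names)
for i, path in enumerate(["library.db", "library2.db"], start=1):
    cur.execute("ATTACH DATABASE ? AS db{}".format(i), (path,))

# Discover the attached schema names and build the TEMP view over them
schemas = [row[1] for row in cur.execute("PRAGMA database_list")
           if row[1] not in ("main", "temp")]
selects = " UNION ALL ".join(
    "SELECT '{0}', * FROM {0}.MyTable".format(s) for s in schemas
)
cur.execute("CREATE TEMP VIEW MyTable_all AS " + selects)

# One query now covers every attached database (SomeColumn is a placeholder)
unique_values = cur.execute(
    "SELECT DISTINCT SomeColumn FROM MyTable_all").fetchall()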

However, if there are more databases than the ATTACH limit allows, you cannot attach them all at once. In that case, you have to merge all the databases together.
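A rough sketch of such a merge in Python without ATTACH, copying rows from each source file into one merged database (the file names and MyTable are placeholders, and it assumes merged.db already contains a MyTable with the same columns):

import sqlite3

source_paths = ["library.db", "library2.db"]  # placeholder paths
merged = sqlite3.connect("merged.db")         # placeholder target database

for path in source_paths:
    src = sqlite3.connect(path)
    rows = src.execute("SELECT * FROM MyTable").fetchall()
    src.close()
    if rows:
        # Build "?, ?, ..." to match the number of columns in the source rows
        placeholders = ", ".join("?" * len(rows[0]))
        merged.executemany(
            "INSERT INTO MyTable VALUES ({})".format(placeholders), rows)

merged.commit()
merged.close()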

If doing this every time for all databases is too slow, you can synchronize a single database at a time by keeping the source for each record:

DELETE FROM MyTable WHERE SourceDB = 1;
INSERT INTO MyTable SELECT 1, * FROM db1.MyTable;
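A rough sketch of wrapping that synchronization in Python, re-syncing one changed source database at a time (the file names, numeric source IDs, and MyTable are placeholders; it assumes the merged table's first column is SourceDB):

import sqlite3

def sync_source(merged_path, source_id, source_path):
    # Replace all rows in the merged database that came from this source
    con = sqlite3.connect(merged_path)
    con.execute("ATTACH DATABASE ? AS src", (source_path,))
    with con:  # one transaction for the delete + re-insert
        con.execute("DELETE FROM MyTable WHERE SourceDB = ?", (source_id,))
        con.execute("INSERT INTO MyTable SELECT ?, * FROM src.MyTable",
                    (source_id,))
    con.execute("DETACH DATABASE src")
    con.close()

# Re-sync only the databases that have changed (placeholder IDs and paths)
sync_source("merged.db", 1, "library.db")
sync_source("merged.db", 2, "library2.db")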
CL.