
I have multiple database files in multiple locations, all with identical structure. I understand that ATTACH can be used to connect multiple files to one database connection; however, this treats them as separate databases. I want to do something like:

SELECT uid, name FROM ALL_DATABASES.Users;

Also,

SELECT uid, name FROM DB1.Users UNION SELECT uid, name FROM DB2.Users ;

is NOT a valid answer because I have an arbitrary number of database files that I need to merge. Lastly, the database files must stay separate. Does anyone know how to accomplish this?

EDIT: An answer gave me an idea: would it be possible to create a view that combines all the different tables? Is it possible to query for all attached database files and the names they are 'mounted' under, and then use that inside the view query to create the 'master table'?

chacham15
  • Why are you not willing to combine the table objects (not the DB files) logically in the client in a query? If you are willing to specify the database files to be merged (the number of which might be 5 this week but 7 or 3 next week), why can't you do the same in a query? – Tim Feb 06 '11 at 13:01
  • The reason is that the other files are on remote servers which may or may not be up. I already have an abstraction that makes them appear to be local files. Furthermore, they contain different, but not required, data compared to the local database. Essentially, I'm pooling all the databases to form a distributed database (although I'm hesitant to use that term, since the databases are used in a completely different way than the name suggests, and I also don't want a distributed database solution, since the files are already local and it just complicates things). – chacham15 Feb 06 '11 at 14:07

2 Answers


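Because SQLite imposes a limit on the number of databases that can be attached at one time (10 by default, and at most 125 if the library is compiled with a higher SQLITE_MAX_ATTACHED), there is no way to do what you want in a single query for a truly arbitrary number of files.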

If the number can be guaranteed to be within SQLite's limit (which violates the definition of "arbitrary"), there's nothing that prevents you from generating a query with the right set of UNIONs at the time you need to execute it.
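As a rough illustration of generating the query at run time, here is a minimal Python sketch using the standard `sqlite3` module. It assumes every file has a `Users (uid, name)` table as in the question; the function names, the aliases `db1`, `db2`, ..., the extra `source_db` column, and the view name `all_users` are all hypothetical. The view is created as a TEMP view because an ordinary SQLite view cannot reference tables in other attached databases.

```python
import sqlite3

def attach_all(conn, paths):
    """Attach each file under a generated alias (db1, db2, ...); return the aliases."""
    aliases = []
    for i, path in enumerate(paths, start=1):
        alias = f"db{i}"  # generated by us, so safe to interpolate into SQL
        conn.execute(f"ATTACH DATABASE ? AS {alias}", (path,))
        aliases.append(alias)
    return aliases

def attached_aliases(conn):
    """Ask SQLite which databases are attached, skipping the built-in main and temp."""
    rows = conn.execute("PRAGMA database_list").fetchall()  # rows are (seq, name, file)
    return [name for _, name, _ in rows if name not in ("main", "temp")]

def union_query(aliases, table="Users", columns=("uid", "name")):
    """Build one SELECT per attached database and join them with UNION ALL."""
    if not aliases:
        raise ValueError("no attached databases to combine")
    cols = ", ".join(columns)
    # source_db records where each row came from, since uids may collide across files
    parts = [f"SELECT '{a}' AS source_db, {cols} FROM {a}.{table}" for a in aliases]
    return " UNION ALL ".join(parts)

def create_all_users_view(conn):
    """(Re)create a TEMP view over whatever is attached right now."""
    conn.execute("DROP VIEW IF EXISTS temp.all_users")
    conn.execute("CREATE TEMP VIEW all_users AS " + union_query(attached_aliases(conn)))
```

After calling `attach_all` and `create_all_users_view`, the rest of the program can run `SELECT uid, name FROM all_users` without knowing how many files are involved. The view has to be recreated whenever the set of attached databases changes.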

To support truly arbitrary numbers of tables, your only real option is to create a table in an unrelated database and repeatedly INSERT rows from each candidate:

ATTACH DATABASE '/path/to/candidate/database' AS candidate;
INSERT INTO some_table (uid, name) SELECT uid, name FROM candidate.Users;
DETACH DATABASE candidate;
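
The three statements above could be driven from a loop in the host program. The following is only a sketch in Python's `sqlite3` module, assuming the source table is named `Users` as in the question; `merge_users` and the schema given to `some_table` are hypothetical.

```python
import sqlite3

def merge_users(target_path, candidate_paths):
    """Copy Users rows from every candidate file into some_table, one file at a time."""
    conn = sqlite3.connect(target_path)
    # Assumed schema for the combined table; adjust to match the real Users table.
    conn.execute("CREATE TABLE IF NOT EXISTS some_table (uid INTEGER, name TEXT)")
    for path in candidate_paths:
        conn.execute("ATTACH DATABASE ? AS candidate", (path,))
        conn.execute(
            "INSERT INTO some_table (uid, name) SELECT uid, name FROM candidate.Users"
        )
        # Commit first: SQLite refuses to DETACH while a transaction is still open.
        conn.commit()
        conn.execute("DETACH DATABASE candidate")
    return conn
```

Because only one candidate is attached at any moment, the attach limit never comes into play, at the cost of copying the data.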
Blrfl
  • Barring the limit on attached databases (what I meant by arbitrary is a number varying over time between 1 and the maximum number of attached databases), is there a way to make the edit possible? – chacham15 Feb 06 '11 at 14:17
  • Sure. Attach all of the databases and generate a multi-`UNION` query. There's no law that says queries have to be hard-coded into your program ahead of time. If you were going to do this with a view, you'd have to generate the `CREATE VIEW` the same way. – Blrfl Feb 06 '11 at 15:02
  • You're saying that the code would need to query the number of attached database files and then adjust its queries accordingly. That can be very annoying for a user program, since the UNION can't always be done in an obvious way. For example, since the two databases are separate, primary keys increment separately, meaning that there can be two different records with the same primary key. Therefore, I was thinking that all of that complexity could be hidden behind the view, leaving the code with much simpler queries since it need not worry about the multi-file abstraction. – chacham15 Feb 08 '11 at 10:56
  • If you have overlap in the primary keys, you still have to come up with a new way to uniquely identify each row in the combined version. That's a different problem. At any rate, there's no way to ask SQLite to automagically combine table `x` from every attached database. At some point you have to know what databases you've attached under what names and generate the query to combine them. Not even Oracle does that. – Blrfl Feb 08 '11 at 14:21

Some cleverness in the schema would take care of this.

You will generally have 2 types of tables: reference tables, and dynamic tables. Reference tables have the same content across all databases, for example country codes, department codes, etc.

Dynamic tables hold data that is unique to each DB, for example time series, sales statistics, etc.

The reference data should be maintained in a master DB, and replicated to the dynamic databases after changes.

The dynamic tables should all have a DB_ID column that is part of a compound primary key; for example, your time series might use (db_id, measurement_id, time_stamp). Alternatively, you could use a hash of DB_ID to generate primary keys, using the same key generator for all tables in a DB. Either way, the rows remain unique when tables from different DBs are merged.
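
A minimal sketch of the compound-key variant, written with Python's `sqlite3` module. The table name `time_series`, the `reading` column, and both helper functions are assumptions for illustration; only the three key columns come from the description above.

```python
import sqlite3

def create_time_series(conn):
    """Create a dynamic table whose primary key includes the originating DB's id."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS time_series (
            db_id          INTEGER NOT NULL,  -- which database produced the row
            measurement_id INTEGER NOT NULL,
            time_stamp     TEXT    NOT NULL,
            reading        REAL,              -- hypothetical payload column
            PRIMARY KEY (db_id, measurement_id, time_stamp)
        )
        """
    )

def add_measurement(conn, db_id, measurement_id, time_stamp, reading):
    """Insert one row; raises sqlite3.IntegrityError if the full key already exists."""
    conn.execute(
        "INSERT INTO time_series (db_id, measurement_id, time_stamp, reading) "
        "VALUES (?, ?, ?, ?)",
        (db_id, measurement_id, time_stamp, reading),
    )
```

With this key, two databases can each record measurement 100 at the same time stamp and both rows survive a merge, because their `db_id` values differ.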

So you will have 3 types of databases:

  • Reference master -> replicated to all others

  • individual dynamic -> replicated to full dynamic

  • full dynamic -> replicated from reference master and all individual dynamic.

It is then up to you how to do this replication: pseudo-realtime, or by brute force, truncating and rebuilding the full dynamic database every day or as needed.