
I have many different .sqlite database files (around 1000, possibly more), all with the same schema, and I need to merge all of them into two SQLite files, as explained below.

The file names are "con_cookies_{domain.TLD}.sqlite" and "sin_cookies_{domain.TLD}.sqlite", so I would like to end up with "con_cookies.sqlite" and "sin_cookies.sqlite".

What would be the best way to do this? I don't have much knowledge of SQLite, so a bash script would be appreciated.

EDIT:

I've shared a screenshot of the DB structure. The primary key is not important as long as the order of the data is preserved. Each file has around 40-50 rows.

[Screenshot: DB structure from the source files]
[Screenshot: some example SELECT {...} FROM rows from a source file]

UPDATE: DB schema in text format

CREATE TABLE moz_cookies (
id INTEGER PRIMARY KEY,
baseDomain TEXT, 
originAttributes TEXT NOT NULL DEFAULT '', 
name TEXT, 
value TEXT, 
host TEXT, 
path TEXT, 
expiry INTEGER, 
lastAccessed INTEGER, 
creationTime INTEGER, 
isSecure INTEGER, 
isHttpOnly INTEGER, 
inBrowserElement INTEGER DEFAULT 0, 
sameSite INTEGER DEFAULT 0, 
Domain TEXT, 
DurationDays INTEGER, 
WebCategory TEXT, 
CookieCategory TEXT, 
Action TEXT, 
TLD TEXT, 
CMP_VENDOR TEXT, 
CMP_IAB_V1 TEXT, 
CMP_IAB_V2 TEXT, 
Country TEXT, 
CookiesByDefault, 
CONSTRAINT moz_uniqueid UNIQUE (name, host, path, originAttributes));
CREATE INDEX moz_basedomain ON moz_cookies (baseDomain, originAttributes);

Also, if it's easier, a mass export of all the DBs to CSV would also be fine, instead of joining all of them.
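
Something like this per-file export is what I have in mind (just a sketch; I'm assuming the sqlite3 CLI is available and all the files are in one directory):

# A sketch only: dump the moz_cookies table of every file to a matching CSV.
for f in con_cookies_*.sqlite sin_cookies_*.sqlite; do
    sqlite3 -header -csv "$f" "select * from moz_cookies;" > "${f%.sqlite}.csv"
done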

CrisoNikon
  • We'd need to see the schema. The complication is whether you need to preserve the primary keys and other per-database unique values. – Schwern May 10 '20 at 18:35
  • That's one. You mentioned two tables. – Schwern May 10 '20 at 18:58
  • @Schwern All sqlite files have two tables but I just need one of them, called moz_cookies, which is common to all of them – CrisoNikon May 10 '20 at 19:01
  • Your tables don't have primary keys? – Shawn May 10 '20 at 22:44
  • And you should include the table definition as text (a CREATE TABLE statement), not an image. – Shawn May 10 '20 at 22:46
  • @CrisoNikon Please post table definition as text, so that it will be possible to post a complete solution. – dash-o May 11 '20 at 06:19
  • @dash-o Done. On the other hand, if it's easier to do a mass export of all the DBs to CSV, that would also be fine, instead of joining all of them, because I will exploit the data in CSV format. Thanks! – CrisoNikon May 12 '20 at 07:40

2 Answers


There's no way to preserve the id. If anything depends on the id, this won't work; you'd have to first add a globally unique id, switch everything over to it, and then migrate.

The simplest thing to do is to create a new database with an auto-incrementing primary key and import each database into it, skipping the ID. In SQLite, an INTEGER PRIMARY KEY column is an alias for the rowid, which is enough to make it auto-increment. Normally you'd do this by dumping the table minus the ID column, but SQLite's built-in dump and load facilities are anemic.
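
If you'd rather stay close to plain SQL, the same skip-the-ID copy can be written per file with ATTACH, at the cost of spelling out every column. A bash-wrapped sketch, assuming the master schema already exists in con_cookies.sqlite (the source file name is a placeholder):

# A sketch only: copy one source file into the master, skipping the ID.
# Inserting NULL into an INTEGER PRIMARY KEY makes SQLite assign a fresh id.
sqlite3 con_cookies.sqlite <<'__SQL__'
attach database 'con_cookies_example.com.sqlite' as src;
insert into moz_cookies
select NULL, baseDomain, originAttributes, name, value, host, path,
       expiry, lastAccessed, creationTime, isSecure, isHttpOnly,
       inBrowserElement, sameSite, Domain, DurationDays, WebCategory,
       CookieCategory, Action, TLD, CMP_VENDOR, CMP_IAB_V1, CMP_IAB_V2,
       Country, CookiesByDefault
from src.moz_cookies;
__SQL__

Note the moz_uniqueid UNIQUE constraint still applies, so a duplicate (name, host, path, originAttributes) across files will abort the insert; INSERT OR IGNORE is an option if dropping duplicates is acceptable.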

Instead, we can write an application which connects to each database, loads each row skipping the ID, and inserts it into the master database; a new ID is assigned on insert. This isn't the fastest approach, but it should be fine for a one-off.

-- Recreate the table in your new master database.
create table moz_cookies (
  id integer primary key, -- this will autoincrement
  ...and all the rest...
);

Then a program loops through each file, selects each row, and inserts everything but the ID into the new master table.

Here's a sample program in Ruby using sqlite3-ruby.

require "sqlite3"
require 'pry'

# The file you're merging to.
MASTER_FILE = 'con_cookies.sqlite'.freeze

# The pattern to match the files you're pulling from.
DIR_GLOB = 'con_cookies_*.sqlite'.freeze

# All the columns except ID (shortened to two here; list all of them in practice).
COLS = ['baseDomain', 'originAttributes'].freeze

# A bind string for all the columns
BINDS = COLS.map { "?" }.join(",").freeze

# The table you're transferring.
TABLE = 'moz_cookies'.freeze

# Connect to the master database.
master = SQLite3::Database.new(MASTER_FILE)

# Prepare the insert statement.
# insert into moz_cookies ('baseDomain', 'originAttributes') values (?,?)
master.prepare("insert into #{TABLE} (#{COLS.join(",")}) values (#{BINDS})") do |insert_stmt|
  # Find each database file
  Dir.glob(DIR_GLOB) do |file|
    puts "Connecting to #{file}"

    # Connect to it
    db = SQLite3::Database.new(file)

    # select baseDomain, originAttributes from moz_cookies
    db.prepare("select #{COLS.join(",")} from #{TABLE}") do |select_stmt|
      # Fetch each row...
      rows = select_stmt.execute
      rows.each do |row|
        # Insert into the master table. It will fill in a new ID.
        insert_stmt.execute(row)
      end
    end
  end
end
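
To produce both merged files, the script can be run twice, changing MASTER_FILE and DIR_GLOB to the sin_cookies names for the second pass, and filling COLS with the full column list from the schema above.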

Note that the performance of any query which selects by TLD may suffer. If you do this a lot, you may need composite indexes which include TLD, or you can migrate to a more powerful database that supports table partitioning.
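
If it comes to that, adding the index is a one-liner with the sqlite3 CLI; a sketch, with an illustrative index name and column pair:

# A sketch: index the merged database for TLD lookups (name is illustrative).
sqlite3 con_cookies.sqlite "create index idx_moz_cookies_tld on moz_cookies (TLD, baseDomain);"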

Schwern

As an alternative, consider a bash-based solution. It iterates through the files and appends the content of the input table to a new table called 'result' in a new database called 'new.sqlite'.

Invoke with bash SCRIPT con_cookies_*.sqlite

#! /bin/sh
in_table=moz_cookies
new_db=new.sqlite

# Start by copying data from the first file into table 'result'
rm -f $new_db
sqlite3 $new_db <<__SQL__
attach database '$1' as data ;
create table result as select * from data.$in_table ;
__SQL__

shift
for file ; do
    echo "Loading $file"
    sqlite3 $new_db <<__SQL__
    attach database '$file' as data ;
    insert into result select * from data.$in_table ;
__SQL__
done
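
For example, assuming the script above is saved as merge.sh (an illustrative name), both merged files can be produced like this:

bash merge.sh con_cookies_*.sqlite && mv new.sqlite con_cookies.sqlite
bash merge.sh sin_cookies_*.sqlite && mv new.sqlite sin_cookies.sqlite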
dash-o