0

I'm trying to create a large SQLite database from around 500 smaller databases (each 50-200MB) to put into Django, and would like to speed up this process. I'm doing this via a custom command.

This answer helped me a lot, in reducing the speed to around a minute each in processing a smaller database. However it's still quite a long time.

The one thing I haven't done in that answer is to disable database indexing in Django and re-create them. I think this matters for me as my database has few tables with many rows.

Is there a way to do that in Django when it's running live? If not in Django then perhaps there's some SQLite query to remove all the indexes and re-create them after I insert my records?

simonzack
  • 19,729
  • 13
  • 73
  • 118

1 Answers1

1

I just used raw SQL to remove the indexes and re-create them. This improved the speed of creating a big database from 2 of my small databases from 1:46 to 1:30, so quite significant. It also reduced the size from 341.7MB to 321.1MB.

# Delete all indexes for faster database creation
with connection.cursor() as cursor:
    cursor.execute(f'SELECT name, sql FROM sqlite_master WHERE name LIKE "{app_label}_%" AND type == "index"')
    indexes = cursor.fetchall()
    names, create_sqls = zip(*indexes)
    for name in names:
        cursor.execute(f'DROP INDEX {name}')

After I create the databases re-create the index:

# Re-create indexes
with connection.cursor() as cursor:
    for create_sql in create_sqls:
        cursor.execute(create_sql)
simonzack
  • 19,729
  • 13
  • 73
  • 118
  • Hi @simonzack. Did you mean that your insert time went down from 1 hr 46 min to 1 hr 30 min? That does not seem like a huge improvement (although your requirements may have been different). Do you mid sharing what the approximate improvement was using methods 1 and 2 from the answer you linked to? Thank you, – PiperWarrior Mar 15 '21 at 03:07
  • Well I think what I meant was it goes down from 1 min 46 sec to 1 min 30 sec. So if you multiply that by 500 / 2 (for 500 small databases), that's 7:21:40 to 6:15:00, so saves 1 hour. For the answer I linked to, from my past info, `PRAGMA journal_mode = MEMORY` had the time go down from 2 min 58 sec to 2 min 34 sec. Doing `PRAGMA cache_size = 100000` speeds it up to 1 min 51 sec. Using a database in memory instead of a temporary table speeds it up to 1 min 46 sec. – simonzack Mar 15 '21 at 05:41