1

I have a very large sqlite database with a single table with two text columns (about 2.3 billion rows, 98GB) that I'm trying to create an index on using the sqlite3 cli tool on Ubuntu 20.04.

The command I'm trying to run is:

CREATE INDEX col1_col2_x ON tablename(col1 COLLATE NO CASE,col2);

The goal is to also create the opposite index to be able to do very fast case-insensitive searches on either column.

Every time I try, it runs for about an hour and then the process exits with just the message "Killed" and exit code 137, which I don't see listed in the sqlite3 documentation.

My first thought was running out of memory, so I tried setting the pragma temp_store_directory as well as the TEMP_DIR environment variable to same directory as the database file, which has about 8TB of free space, so I'm not sure what's going wrong.

Is sqlite not meant for databases of this size? Creating the index before insert doesn't seem to be a viable option as it's looking like it's going to take months. I should also note that I was able to create the exact same indexes successfully with a 36GB table that has the same schema so I'm wondering if I'm running into an undocumented limitation?

I'm also open to other database solutions if sqlite isn't the right solution, although preliminary tests of postgres didn't seem to be any better.

  • It might be worth checking the Linux logs in `/var/log/messages` or `/var/log/syslog` to see if it's Linux doing it (and why). https://stackoverflow.com/questions/726690/what-killed-my-process-and-why – GregHNZ May 31 '21 at 03:38
  • you might find this interesting https://stackoverflow.com/questions/30944355/bash-error-code-137-vs-1-when-out-of-memory#30963384. It appears to say that error code 137 is the OOM killer and hence killed. – MikeT May 31 '21 at 09:23

1 Answers1

0

Have you considered setting any of the various PRAGMA statements for the database?

Even with a 2Gb database of only 5 million rows the following were helpful.

PRAGMA page_size = 4096;
PRAGMA cache_size = 10000;
PRAGMA synchronous = OFF;
PRAGMA auto_vacuum = FULL;
PRAGMA automatic_index = FALSE;
PRAGMA journal_mode = OFF;

page_size

Query or set the page size of the database. The page size must be a power of two between 512 and 65536 inclusive.

cache_size

Query or change the suggested maximum number of database disk pages that SQLite will hold in memory at once per open database file.

synchronous

With synchronous OFF (0), SQLite continues without syncing as soon as it has handed data off to the operating system.

automatic_vaccum

When the auto-vacuum mode is 1 or "full", the freelist pages are moved to the end of the database file and the database file is truncated to remove the freelist pages at every transaction commit.

automatic_index

Set Automatic Indexes

journal_mode

The OFF journaling mode disables the rollback journal completely. No rollback journal is ever created and hence there is never a rollback journal to delete. The OFF journaling mode disables the atomic commit and rollback capabilities of SQLite.

Dave Anderson
  • 11,836
  • 3
  • 58
  • 79