
The DELETE query is simple:

DELETE FROM pages WHERE status = 0

It takes around 15 minutes to complete (removing ~20K rows). It's a ~500 MB database that maps a local file system and contains around 3 million records.

The structure:

  • pages - just a few records
  • files - around 230K records, contains a foreign key constraint with ON DELETE CASCADE that references a column from pages
  • meta - around 3 million records, contains foreign key constraints with ON DELETE CASCADE that reference columns from files and pages
  • search - an FTS4 table, an almost exact duplicate of meta. The integrity of this table is maintained with triggers.

CREATE TABLE pages(
  id       INTEGER PRIMARY KEY AUTOINCREMENT,
  slug     TEXT,                           
  name     TEXT NOT NULL,
  type     INTEGER NOT NULL DEFAULT 1,     
  data     TEXT,
  parent   INTEGER,                        
  status   INTEGER DEFAULT 1,              
  comments INTEGER DEFAULT 1,              
  priority INTEGER DEFAULT 0,              
  UNIQUE(slug),
  FOREIGN KEY(parent) REFERENCES pages(id) ON DELETE CASCADE   
);

CREATE INDEX "pageParent" ON "pages"("parent");


CREATE TABLE files(
  id        INTEGER PRIMARY KEY AUTOINCREMENT,
  gallery   INTEGER NOT NULL,                      
  type      INTEGER NOT NULL DEFAULT 1,            
  sPath     TEXT,                                  
  rPath     TEXT,                                  
  parent    INTEGER,  
  hero      INTEGER,
  hidden    INTEGER DEFAULT 0,                     
  createdAt DATETIME,                              
  mTime     TEXT,                                  
  UNIQUE(sPath),
  FOREIGN KEY(gallery) REFERENCES pages(id) ON DELETE CASCADE,
  FOREIGN KEY(parent)  REFERENCES files(id) ON DELETE CASCADE,
  FOREIGN KEY(hero)    REFERENCES files(id) ON DELETE SET NULL
);

CREATE INDEX "fileGallery" ON "files"("gallery");
CREATE INDEX "fileType"    ON "files"("type");
CREATE INDEX "fileParent"  ON "files"("parent");
CREATE INDEX "fileRPathNS" ON "files"("rPath" COLLATE NATSORT);


CREATE TABLE thumbs(          
  hash    TEXT,  
  image   INTEGER,
  width   INTEGER,
  height  INTEGER,            
  FOREIGN KEY(image) REFERENCES files(id) ON DELETE CASCADE,
  PRIMARY KEY(hash, image) ON CONFLICT REPLACE
);

CREATE INDEX "thumbImage" ON "thumbs"("image");


CREATE TABLE meta(
  id      INTEGER PRIMARY KEY AUTOINCREMENT,
  file    INTEGER NOT NULL,
  key     TEXT NOT NULL,
  value   TEXT,         
  extra   TEXT,         
  gallery INTEGER,
  FOREIGN KEY(gallery) REFERENCES pages(id) ON DELETE CASCADE,  
  FOREIGN KEY(file) REFERENCES files(id) ON DELETE CASCADE
);

CREATE INDEX "metaFileId" ON "meta"("file"); 
CREATE INDEX "metaKey"    ON "meta"("key"); 
CREATE INDEX "metaExtra"  ON "meta"("extra"); 


CREATE VIRTUAL TABLE search USING fts4(file, key, value, gallery);

CREATE TRIGGER metaBeforeUpd BEFORE UPDATE ON meta BEGIN
  DELETE FROM search WHERE docid = OLD.rowid;
END;

CREATE TRIGGER metaBeforeDel BEFORE DELETE ON meta BEGIN
  DELETE FROM search WHERE docid = OLD.rowid;
END;

CREATE TRIGGER metaAfterUpd AFTER UPDATE ON meta BEGIN
  INSERT INTO search(docid, file, key, value, gallery) VALUES(NEW.rowid, NEW.file, NEW.key, NEW.value, NEW.gallery);
END;

CREATE TRIGGER metaAfterIns AFTER INSERT ON meta BEGIN
  INSERT INTO search(docid, file, key, value, gallery) VALUES(NEW.rowid, NEW.file, NEW.key, NEW.value, NEW.gallery);
END;

The problem is that not only is it slow, but it also locks these tables, so I cannot make any changes to them during this time.

I tried every suggestion from this question and its answers, but without significant improvement. Setting the journal mode to MEMORY and turning off synchronous writes made it run a little faster, but it's too risky.
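For reference, these are the pragmas in question (both trade durability for speed, which is what makes them risky):

-- Keep the rollback journal in memory and skip fsync calls;
-- a crash mid-transaction can corrupt the database in this mode.
PRAGMA journal_mode = MEMORY;
PRAGMA synchronous = OFF;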

To avoid long write locks, I tried deleting records step by step, 40 at a time with a 0.5 s delay in between, but this slows down the entire process by as much as 10x.
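The batched variant looked roughly like this (a sketch; the loop and the delay live in application code, which re-runs the statement until no rows are affected):

-- Delete a small batch of dead pages; repeat until the change count is 0
DELETE FROM pages
WHERE id IN (SELECT id FROM pages WHERE status = 0 LIMIT 40);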

Is there any other way I can improve the speed and/or avoid locks?


PS: What baffles me is that INSERTs are much faster. It takes 2 minutes to insert the same number of records I was deleting, and that time includes some heavy file processing (reading Exif data from lots of images). Why is removing records so much slower than inserting them?

  • Suggestion: use soft deletes rather than hard deletes. The downside is code changes and maybe increased storage. – Rippo Mar 29 '14 at 07:05
  • Database schema (including indexes)? – CL. Mar 29 '14 at 08:37
  • @CL: [the schema](http://sqlfiddle.com/#!5/39c70), but I had to comment out an index and the FTS table because they don't seem to work on sqlfiddle. By soft deletes, do you mean updating the records with a value that suggests the record should not be taken into account in SELECT queries? This is what I'm doing, but I still want to remove dead records in a background process. The problem is that the process locks some tables and is taking too long for what it does. – nice ass Mar 29 '14 at 09:27

1 Answer


Deleting from pages is slow because there is no index on the gallery column in the meta table. Whenever a pages record is actually deleted, the database has to search for any meta records that match the ON DELETE CASCADE constraint; this results in a full table scan for every deleted record.
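A minimal fix is to add that index (the name metaGallery is just a suggestion, following the naming scheme of the existing indexes):

-- index name is a suggestion, matching the schema's existing convention
CREATE INDEX "metaGallery" ON "meta"("gallery");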

(The INSERTs are faster because no such check needs to be made.)

SQLite is not designed for concurrency; it is never possible to have multiple writers at the same time. However, to allow multiple readers at the same time as a writer, consider enabling write-ahead logging.
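Enabling WAL is a single pragma, and the mode persists in the database file across connections:

-- switch to write-ahead logging; readers no longer block on the writer
PRAGMA journal_mode = WAL;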

  • I thought it would use the `metaFileId` index. Anyway, I ran an EXPLAIN QUERY PLAN before I added an index for the gallery column, and it was indeed doing a SCAN on meta. Now EXPLAIN QUERY PLAN tells me it is using the new index, but the query is just as slow as before :( – nice ass Mar 29 '14 at 17:55
  • ok, the issue was a missing index for the `hero` column, which also has a constraint (see the sketch below) :) Adding an index on the `meta.gallery` column didn't make much difference, because it only does a single scan per deleted page record; that's like +2 seconds for my db, I can live with that. But you were right, thanks for the help! I wish EXPLAIN QUERY PLAN would give more info, like search/scan operations on cascading deletes... – nice ass Mar 29 '14 at 22:09
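For completeness, the index that resolved the remaining slowdown would look something like this (the name fileHero is hypothetical, following the schema's naming convention). Without it, every cascaded delete in files forces a scan of files to satisfy the ON DELETE SET NULL constraint on hero:

-- index name is hypothetical; covers the self-referencing hero FK
CREATE INDEX "fileHero" ON "files"("hero");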