
What I'm trying to implement here is a condition wherein a SQLite database holds only the most recent 1000 records. Each record has a timestamp. One inefficient approach that comes to mind right away is to check the total number of records: if they exceed 1000, simply delete the ones that fall outside the window.

However, I would have to do this check with each INSERT, which makes things highly inefficient.

What would be a better approach? Can we do something with triggers?
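For reference, the naive count-and-trim approach described above might look like the following. This is a minimal sketch in Python's built-in `sqlite3` module purely for illustration (the question's app uses Flex); the `records` table, its columns, and the helper name `insert_with_cap` are placeholders, not from the question.

```python
import sqlite3

# Placeholder schema; the real table and columns come from your application.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (ts INTEGER, data TEXT)")

def insert_with_cap(ts, data, cap=1000):
    """Naive approach: insert, then count and trim on every single call."""
    conn.execute("INSERT INTO records (ts, data) VALUES (?, ?)", (ts, data))
    (count,) = conn.execute("SELECT COUNT(*) FROM records").fetchone()
    if count > cap:
        # Keep only the `cap` newest timestamps; delete everything older.
        conn.execute(
            "DELETE FROM records WHERE ts < (SELECT MIN(ts) FROM "
            "(SELECT ts FROM records ORDER BY ts DESC LIMIT ?))",
            (cap,),
        )
    conn.commit()
```

The COUNT plus DELETE on every insert is exactly the overhead the question is trying to avoid.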

Some related questions that follow the same logic I thought of are posted on SO:

Delete oldest records from database

SQL Query to delete records older than two years

Rohan
  • Did you think of the scenario where the table reaches 1000 records? On every insert you have to perform a delete and eventually a COUNT(someId), which is not cheap. It is better to let the records grow and periodically clean out old ones. – danisius Sep 07 '13 at 20:38
  • Along these same lines, a scheduled job that maintains a small table size might meet your requirements. Also consider moving records to an archive table rather than deleting them. You just never know when you might have cause to look at them. – Dan Bracuk Sep 07 '13 at 20:54
  • @danisius - the scenario you've proposed will definitely occur all the time, because once the number of records reaches 1000, this situation arises with every insert. – Rohan Sep 08 '13 at 08:46
  • @DanBracuk I can definitely think of archiving and that's a good idea too, but I am struggling to find a way to do that in the database rather than through the application that connects to the DB. I'm using Flex, so that will make things worse if I have to run timer-based scheduled jobs to archive records. – Rohan Sep 08 '13 at 08:48

1 Answer


You can use the implicit rowid column for that.

Assuming you don't delete rows manually in different ways:

DELETE FROM yourtable WHERE rowid < (last_row_id - 1000)

You can obtain the last rowid via the sqlite3_last_insert_rowid() API function or as MAX(rowid).

If you don't need to have exactly 1000 records (e.g. you just want to clean up old records), it is not necessary to do it on each insert. Add a counter in your program and execute the cleanup, for instance, once every 100 inserts.
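A sketch of that counter approach, in illustrative Python using the built-in `sqlite3` module. The table name `yourtable`, the cap, the cleanup interval, and the `insert` helper are all placeholders; `cursor.lastrowid` plays the role of sqlite3_last_insert_rowid().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (data TEXT)")  # uses the implicit rowid

CAP = 1000
CLEANUP_EVERY = 100  # run the DELETE only once per 100 inserts
inserts_since_cleanup = 0

def insert(data):
    global inserts_since_cleanup
    cur = conn.execute("INSERT INTO yourtable (data) VALUES (?)", (data,))
    inserts_since_cleanup += 1
    if inserts_since_cleanup >= CLEANUP_EVERY:
        last_row_id = cur.lastrowid  # same value sqlite3_last_insert_rowid() returns
        # Keep the CAP newest rowids; between cleanups the table may briefly
        # hold up to CAP + CLEANUP_EVERY - 1 rows.
        conn.execute("DELETE FROM yourtable WHERE rowid <= ?",
                     (last_row_id - CAP,))
        inserts_since_cleanup = 0
    conn.commit()
```

Note this only works when rowids grow monotonically, i.e. rows are inserted in order and never deleted out of band.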

UPDATE:

Either way, you pay a performance cost on each INSERT or on each SELECT, so the choice depends on which you have more of: INSERTs or SELECTs.

If you don't have enough INSERTs for their performance to matter, you can use the following trigger to keep no more than 1000 records:

CREATE TRIGGER triggername AFTER INSERT ON tablename BEGIN
     DELETE FROM tablename WHERE timestamp <
         (SELECT MIN(timestamp)
            FROM (SELECT timestamp FROM tablename
                  ORDER BY timestamp DESC LIMIT 1000));
END;

Creating a unique index on the timestamp column should be a good idea too (in case it isn't the PK already). Also note that SQLite supports only FOR EACH ROW triggers, so when you bulk-insert many records it is worth temporarily disabling the trigger.
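To see the trigger in action, here is a scaled-down demonstration using Python's built-in `sqlite3` module: it keeps 5 rows instead of 1000 so the effect is easy to observe, and the table, index, and trigger names are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (timestamp INTEGER)")
conn.execute("CREATE UNIQUE INDEX idx_ts ON tablename (timestamp)")
conn.execute("""
    CREATE TRIGGER triggername AFTER INSERT ON tablename BEGIN
        DELETE FROM tablename WHERE timestamp <
            (SELECT MIN(timestamp)
               FROM (SELECT timestamp FROM tablename
                     ORDER BY timestamp DESC LIMIT 5));
    END
""")

# Insert 20 rows; the trigger trims the table back to 5 after each insert.
for ts in range(20):
    conn.execute("INSERT INTO tablename (timestamp) VALUES (?)", (ts,))
```

After the loop, only the 5 newest timestamps (15 through 19) remain.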

If there are too many INSERTs, there isn't much you can do on the database side. You can make the trigger fire less often by adding a condition such as AFTER INSERT ON tablename WHEN NEW.rowid % 100 = 0. For selects, just use LIMIT 1000 (or create an appropriate view).
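A scaled-down sketch of that WHEN variant, again in illustrative Python: the modulus 10 and cap of 5 stand in for 100 and 1000, and the names are placeholders. Because the cleanup is skipped on most inserts, the row count temporarily exceeds the cap, so reads that must see exactly the cap use LIMIT instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (timestamp INTEGER)")
conn.execute("""
    CREATE TRIGGER cleanup AFTER INSERT ON tablename
    WHEN NEW.rowid % 10 = 0 BEGIN
        DELETE FROM tablename WHERE timestamp <
            (SELECT MIN(timestamp)
               FROM (SELECT timestamp FROM tablename
                     ORDER BY timestamp DESC LIMIT 5));
    END
""")

# The trigger only fires on every 10th rowid, so the table grows past 5
# rows between cleanups.
for ts in range(25):
    conn.execute("INSERT INTO tablename (timestamp) VALUES (?)", (ts,))

# On the read side, LIMIT guarantees at most 5 rows regardless of when the
# last cleanup ran.
rows = conn.execute(
    "SELECT timestamp FROM tablename ORDER BY timestamp DESC LIMIT 5"
).fetchall()
```

This relies on rowids being assigned sequentially; see the comments below about when that assumption holds.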

I can't predict how much faster that would be; the best way is simply to measure how much performance you gain in your particular case.

nevermind
  • thanks for the suggestion. The problem right now is that the records are not inserted in order initially, so I can't take rowid as a reference - I have to go with the timestamp. I'd need to keep exactly 1000 records out of over a million. I'm just looking at ways to keep away from searching and looking up 1000 of 1-5 million records every time there is a search/lookup operation. – Rohan Sep 08 '13 at 08:52
  • @Rohan I've updated the answer with possible solutions for this case – nevermind Sep 08 '13 at 12:05
  • @nevermind: Nice trigger, why not `... id IN (SELECT id FROM ... WHERE clause)`? I want to say something about `rowid`: its assignment behavior changes depending on whether the table has an [auto-incremented] primary key, so don't blindly trust it. Source: http://www.sqlite.org/autoinc.html. – Alix Axel Sep 09 '13 at 01:11
  • @Alix Axel `... id IN (SELECT id FROM ... WHERE clause)` should also work fine. The only difference might be in performance, because the query plan is different. I prefer to avoid `IN (...)` statements whenever possible. – nevermind Sep 09 '13 at 07:17
  • 1
    @Alix Axel Regarding `WHEN` statement: your notice is correct, this will work correctly only for AUTOINCREMENT PK. Generally, idea is to set trigger's `WHEN` condition to reduce number of trigger calls. This could also be smth. like `... AFTER INSERT WHEN RANDOM() % 100 = 0 ON ...`. But again, this is optimization step, that should be done only after you made sure that performance of the 'raw' trigger (without `WHEN` statement) is not sufficient for your task. – nevermind Sep 09 '13 at 07:20
  • Is there a way to set `timestamp` to now? – Eray Erdin Jul 03 '23 at 12:42