
I was wondering if it's possible to create an SQLite database where I can select rows by rowid in O(1).

I started using an SQLite database in one of my projects and discovered that selecting rows from bigger databases takes longer than selecting rows from smaller ones. I searched online and stumbled upon this article. Apparently, when selecting by rowid, instead of jumping straight to the rowid, SQLite performs a binary search to reach the requested row. This is a very logical design, because rows can be deleted from the database, and in that case going straight to the rowid won't work. But in my case I have an "immutable" database: after creating it I never change it, so all rowids are present and in the correct order.

So I was wondering if it's possible to either create a special database or use a specific query command that tells SQLite to select by accessing the rowid directly, without any binary search.

If there are other alternatives to SQLite that could perform better for my case, please let me know about them (though in my project I can't load the db into memory, and access to different dbs simultaneously should be near-instantaneous).

Thanks.

artembus
  • Have you tried a WITHOUT ROWID table? - https://www.sqlite.org/withoutrowid.html – GIS-Jonathan Dec 17 '18 at 10:59
  • "I have an "immutable" database, after creating the database I'm not changing it"... Yes, but how should the database (SQLite) know that and optimize for it? It's not only SQLite: MySQL also uses more [seeks](https://dev.mysql.com/doc/refman/8.0/en/estimating-performance.html) when the data in the indexes gets larger (using a B-tree). "If there are other alternatives to SQLite that can perform better for my case" — it's pretty likely other RDBMSs also work that way when using B-tree indexes... – Raymond Nijland Dec 17 '18 at 11:49
  • "If there are other alternatives to SQLite that can perform better for my case" — for which case? You didn't provide table structures, example data, or a query you are running. I advise you to read [this](https://meta.stackoverflow.com/questions/333952/why-should-i-provide-an-mcve-for-what-seems-to-me-to-be-a-very-simple-sql-query) and provide us with text-formatted example data, a `CREATE TABLE` statement, and text-formatted expected results. You should also tell us how many users are connected to the SQLite database and, more or less, how the Python application works. – Raymond Nijland Dec 17 '18 at 11:54
  • @RaymondNijland, I chose SQLite based on the suggestions I got in this thread https://stackoverflow.com/questions/53483493/instant-access-to-line-from-a-large-file-without-loading-the-file I needed to get instant access to a line from a file without loading it. Now that I got a suggestion about dbm, I might investigate this option. – artembus Dec 17 '18 at 12:14
  • SQLite rowid tables use B*-trees and thus have O(log n) lookup times. You can't change this. – Shawn Dec 17 '18 at 15:54

2 Answers


If you do not need the full power of SQLite, you could use a simple hashing scheme via the dbm module. It uses hashing and could perform better than an ISAM index, but you will lose ordering (among other features, like SQL...).
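As a minimal sketch of that suggestion (the file name and keys are illustrative, and this assumes a simple line-number-to-line mapping), Python's standard-library dbm module gives average-case O(1) key lookups via hashing:

```python
import dbm

# One-time build: keys and values must be bytes.
with dbm.open("lines.db", "c") as db:
    for i, line in enumerate(["first", "second", "third"]):
        db[str(i).encode()] = line.encode()

# Later: hash-based lookup by line number, no tree traversal.
with dbm.open("lines.db", "r") as db:
    line = db[b"1"].decode()  # the second line
```

Note that the concrete backend (gdbm, ndbm, or the pure-Python dbm.dumb fallback) depends on the platform, so performance characteristics can vary.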

Serge Ballesta
  • All I need is a file with instant access to lines by line number without storing anything in memory... maybe I will check dbm. In my previous post I asked a similar question and by popular opinion SQLite won. – artembus Dec 17 '18 at 12:11
  • @artembus: SQLite is very fast and nicely interfaced from Python, so it is often a natural choice. But if you really need high performance, you could even consider a dedicated direct-access file, which would ensure O(1) access times... – Serge Ballesta Dec 17 '18 at 14:03
  • @artembus: after reading your previous post, I really think that what you need is to build an index file with the offset + length of each line. – Serge Ballesta Dec 17 '18 at 14:09
  • Thanks for the suggestions. I thought about creating an index file but I couldn't find any good documentation about that, probably because I searched the wrong terminology. Can you give me a few hints for what packages/frameworks/methods I should look into? I don't want to rebuild the wheel from 0 and I'm sure someone already created some code/repo I can use as a base. – artembus Dec 18 '18 at 16:21
  • @artembus: I tried to google for it but did not find anything. It is not such a common use case, because databases are the standard. Maybe you could have a look at NoSQL solutions that use direct key/value hashing. Not as efficient as direct line_number/line access, but O(1) anyway. – Serge Ballesta Dec 19 '18 at 08:13
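The offset-index idea from the comments above could be sketched roughly like this (the file name and sample lines are illustrative): record each line's byte offset in a single pass, then seek directly to any line.

```python
# Create a sample file with variable-length lines (illustrative).
with open("data.txt", "wb") as f:
    f.write(b"short\n")
    f.write(b"a much longer line\n")
    f.write(b"mid\n")

# One-time pass: record the byte offset where each line starts.
offsets = []
with open("data.txt", "rb") as f:
    pos = 0
    for line in f:
        offsets.append(pos)
        pos += len(line)

# Random access to line k: one seek, one read, no scanning.
with open("data.txt", "rb") as f:
    f.seek(offsets[2])
    third = f.readline().decode().rstrip()
```

In practice the offsets list would itself be persisted (e.g. as fixed-width binary records) so it never has to be rebuilt or held fully in memory.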

I ended up using mmap. Because I had millions of lines of the same length, I just saved those lines to a binary file and accessed it with mmap. Then to access line k I simply asked mmap to read starting at offset k * line_length.

I used the code snippet from the answer here to test the solution quickly, though I believe it can be optimized further than this simple code.
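A minimal sketch of that approach (the record length and file name are illustrative; it assumes every line is padded to the same fixed length):

```python
import mmap

LINE_LEN = 16  # fixed record length in bytes, including the trailing newline

# Write fixed-length records to a binary file.
with open("records.bin", "wb") as f:
    for word in ["alpha", "beta", "gamma"]:
        f.write(word.encode().ljust(LINE_LEN - 1) + b"\n")

# Read line k with a single O(1) offset computation: no search at all.
with open("records.bin", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    k = 2
    record = mm[k * LINE_LEN:(k + 1) * LINE_LEN].rstrip().decode()
    mm.close()
```

Since the OS pages the file in on demand, nothing needs to be loaded into memory up front, which matches the constraint in the question.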

artembus