1

I would like to query a SQLite database, downloaded from a remote sever, in Python, without having ever written it to disk.

I see you can start a blank SQLite database in memory using the magic name :memory: https://docs.python.org/3/library/sqlite3.html. And I see you can copy from a disk-backed SQLite DB to an in-memory one, say using iterdump. But... I would like to populate a :memory: database with the contents of a bytes instance that contains the database, without ever having written those bytes to disk fist.

My reason: it seems unnecessary to have it all in memory, have to write it to disk, and then read it back to memory, so it's a general "keeping the number of steps down" in my process.

Michal Charemza
  • 25,940
  • 14
  • 98
  • 165
  • It would be neat to be able to do it with [mmap](https://docs.python.org/3/library/mmap.html), but `sqlite3.connect` only accepts path-like objects, not file-likes. Maybe there's a way to do it at the C level? – snakecharmerb Jun 20 '21 at 09:11
  • [This answer](https://stackoverflow.com/a/47745836/5320906) points out that sqlite has a native mmap capability. – snakecharmerb Jun 20 '21 at 09:32
  • 1
    @snakecharmerb I'm not that familiar with mmap, but doesn't it need the file to be written to disk first? – Michal Charemza Jun 20 '21 at 09:42
  • 1
    Ah yes, I thought you could pass it bytes, but apparently not. – snakecharmerb Jun 20 '21 at 09:48
  • Sqlite has a serialization API that does what you want, but the python sqlite library is very minimalist and doesn't provide access to it. – Shawn Jun 20 '21 at 11:07
  • 1
    @snakecharmerb Indeed there is a way in C: https://www.sqlite.org/c3ref/serialize.html and https://www.sqlite.org/c3ref/deserialize.html – Shawn Jun 20 '21 at 11:10

1 Answers1

0

It doesn't support queries, but you can use https://github.com/uktrade/stream-sqlite to access the contents of a downloaded/downloading SQLite file without having it written to disk.

Taking the example from its README:

from stream_sqlite import stream_sqlite
import httpx

def sqlite_bytes():
    with httpx.stream('GET', 'http://www.parlgov.org/static/stable/2020/parlgov-stable.db') as r:
        yield from r.iter_bytes(chunk_size=65_536)

for table_name, pragma_table_info, rows in stream_sqlite(sqlite_bytes(), max_buffer_size=1_048_576):
    for row in rows:
        print(row)

(Full disclosure: I was heavily involved in the development of stream-sqlite)

Michal Charemza
  • 25,940
  • 14
  • 98
  • 165