
Python's sqlite3 :memory: option provides speedier queries and updates than the equivalent on-disk database. How can I load a disk-based database into memory, do fast operations on it, and then write the updated version back to disk?

The question How to browse an in memory sqlite database in python seems related, but it focuses on how to use a disk-based browsing tool on an in-memory database. The question How can I copy an in-memory SQLite database to another in-memory SQLite database in Python? is also related, but it is specific to Django.

My current solution reads all of the tables, one at a time, from the disk-based database into lists of tuples, then manually recreates the entire database schema for the in-memory database, and then loads the data from the lists of tuples into it. After operating on the data, the process is reversed.
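
For concreteness, here is the shape of that round-trip for a hypothetical one-table database (the file name, table, and columns are invented purely for illustration):

import sqlite3

# Hypothetical one-table database ('app.db' with an 'items' table), used
# only to illustrate the manual round-trip described above.
disk = sqlite3.connect('app.db')
disk.execute('CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)')
disk.commit()

rows = disk.execute('SELECT id, name FROM items').fetchall()  # dump each table to tuples
disk.close()

mem = sqlite3.connect(':memory:')
mem.execute('CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)')  # recreate schema by hand
mem.executemany('INSERT INTO items VALUES (?, ?)', rows)               # reload the tuples
mem.commit()
# ... fast operations on mem ... then the whole dance in reverse to write back.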

There must be a better way!

Raymond Hettinger
  • http://stackoverflow.com/questions/3850022/python-sqlite3-load-existing-db-file-to-memory looks like exactly your case. – pavel_form Jul 03 '15 at 10:30
  • http://stackoverflow.com/questions/9759589/get-sqlite3-database-from-disk-load-it-to-memory-and-save-it-to-disk || http://stackoverflow.com/questions/28167991/i-wolud-like-to-copy-sqlite3-db-from-memory-to-hard-drive-how-can-i-do?lq=1 – Ciro Santilli OurBigBook.com Nov 20 '15 at 21:38

2 Answers


The answer at How to load existing db file to memory in Python sqlite3? provided the important clues. Building on that answer, here is a simplification and generalization of its code.

It eliminates the unnecessary use of StringIO and is packaged in reusable form, for both reading a database into memory and writing it back out.

import sqlite3

def copy_database(source_connection, dest_dbname=':memory:'):
    '''Return a connection to a new copy of an existing database.
       Raises an sqlite3.OperationalError if the destination already exists.
    '''
    # Serialize the source database to a single SQL script, then
    # replay that script on a fresh connection to the destination.
    script = ''.join(source_connection.iterdump())
    dest_conn = sqlite3.connect(dest_dbname)
    dest_conn.executescript(script)
    return dest_conn

if __name__ == '__main__':
    from contextlib import closing

    # Load the disk-based database into memory.
    with closing(sqlite3.connect('pepsearch.db')) as disk_db:
        mem_db = copy_database(disk_db)

    # Operate on the in-memory copy at RAM speed.
    mem_db.execute('DELETE FROM documents WHERE uri="pep-3154"')
    mem_db.commit()

    # Write the updated database back to disk.
    copy_database(mem_db, 'changed.db').close()
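
On newer Pythons (3.7+, which postdates this answer), the standard library also offers sqlite3.Connection.backup(), which copies a database page by page without round-tripping through a SQL dump. A minimal sketch of the same disk-to-memory-to-disk cycle:

import sqlite3

# Connection.backup() copies from the connection it is called on into
# the target connection, page by page.
disk_db = sqlite3.connect('pepsearch.db')
mem_db = sqlite3.connect(':memory:')
disk_db.backup(mem_db)            # disk -> memory
disk_db.close()

# ... fast operations on mem_db ...

dest_db = sqlite3.connect('changed.db')
mem_db.backup(dest_db)            # memory -> disk
dest_db.close()
mem_db.close()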
Raymond Hettinger

Frankly, I wouldn't fool around too much with in-memory databases, unless you really do need an indexed structure that you know will always fit entirely within available memory. SQLite is extremely smart about its I/O, especially when you wrap everything (including reads) into transactions, as you should. It will very efficiently keep things in memory while it manipulates data structures that fundamentally live on external storage, and yet it will never exhaust memory (nor take too much of it). I think that RAM really does work better as "a buffer" instead of being the primary place where data is stored, especially in a virtual-storage environment, where everything must be considered as "backed by external storage anyway."
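
For example, the "wrap reads in transactions" advice looks like this with the sqlite3 module. This is only a sketch, assuming the pepsearch.db file and documents table from the other answer exist; isolation_level=None disables the module's implicit transaction handling so the boundaries are explicit:

import sqlite3

# isolation_level=None puts the connection in autocommit mode, so we
# control the transaction boundaries ourselves.
conn = sqlite3.connect('pepsearch.db', isolation_level=None)

conn.execute('BEGIN')
for (uri,) in conn.execute('SELECT uri FROM documents'):
    pass  # many reads inside one transaction share locks and hot cache pages
conn.execute('COMMIT')
conn.close()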

Mike Robinson
  • My use case is a long-running process (a web service) with the data completely in RAM (so there is no disk I/O at all during operation). The data is rarely mutated, but I do run frequent queries. The on-disk version is around only so that the data survives a system restart. – Raymond Hettinger Jul 03 '15 at 19:22