
Is there a built-in way in SQLite (or similar) to keep the best of both the SQL and NoSQL worlds for small projects, i.e.:

  • stored in a (flat) file like SQLite (no client/server setup, no server to install; more precisely: nothing to install other than pip install <package>)
  • the ability to store rows as dicts that don't all share a common structure, as in NoSQL databases
  • support for simple queries

Example:

db = NoSQLite('test.db')
db.addrow({'name': 'john doe', 'balance': 1000, 'data': [1, 73.23, 18]})
db.addrow({'name': 'alice', 'balance': 2000, 'email': 'a@b.com'})
for row in db.find('balance > 1500'):
    print(row)

# {'id': 'f565a9fd3a', 'name': 'alice', 'balance': 2000, 'email': 'a@b.com'}   # id was auto-generated

Note: over the years I have constantly been amazed by how many interesting features are possible with SQLite in a few lines of code, which is why I'm asking whether what I describe here could be achieved simply with a few SQLite core features.

PS: shelve could look like a solution, but it's just a persistent key/value store without query/find functions; also, bsddb (BerkeleyDB for Python) looks deprecated and offers no query feature with a similar API.


2 Answers


SQLite

  • JSON1 extension and json_extract (see accepted answer). Example:

    import sqlite3, json  # tested with precompiled Windows binaries from https://www.sqlite.org/download.html (sqlite3.dll copied in C:\Python37\DLLs)

    class sqlitenosql:
        def __init__(self, f):
            self.db = sqlite3.connect(f)
            self.db.execute('CREATE TABLE test(data TEXT);')

        def close(self):
            self.db.commit()
            self.db.close()

        def addrow(self, d):
            # store the whole dict as one JSON string in a single TEXT column
            self.db.execute("INSERT INTO test VALUES (?);", (json.dumps(d),))

        def find(self, query):
            # bind the values with ? placeholders rather than interpolating them,
            # so quotes inside string values can't break the generated SQL
            q = ' AND '.join(f"json_extract(data, '$.{k}') = ?" for k in query)
            for r in self.db.execute(f"SELECT data FROM test WHERE {q};", list(query.values())):
                yield r[0]
    
    db = sqlitenosql(':memory:')
    db.addrow({'name': 'john', 'balance': 1000, 'data': [1, 73.23, 18], 'abc': 'hello'})
    db.addrow({'name': 'alice', 'balance': 2000, 'email': 'a@b.com'})
    db.addrow({'name': 'bob', 'balance': 1000})
    db.addrow({'name': 'richard', 'balance': 1000, 'abc': 'hello'})
    for r in db.find({'balance': 1000, 'abc': 'hello'}):
        print(r)
    # {"name": "john", "balance": 1000, "data": [1, 73.23, 18], "abc": "hello"}
    # {"name": "richard", "balance": 1000, "abc": "hello"}    
    db.close()
    
  • sqlitedict, as mentioned in Key: value store in Python for possibly 100 GB of data, without client/server and Use SQLite as a key:value store, with:

    key = an ID

    value = the dict we want to store, e.g. {'name': 'alice', 'balance': 2000, 'email': 'a@b.com'}

    (a minimal usage sketch is given after this list)

  • Further reading on using SQLite with JSON: https://community.esri.com/groups/appstudio/blog/2018/08/21/working-with-json-in-sqlite-databases
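
For the sqlitedict option, here is a minimal sketch (assuming pip install sqlitedict; the test.sqlite filename and the uuid-based IDs are illustrative choices, and since sqlitedict has no query language, the find-style filtering is done in plain Python):

    from sqlitedict import SqliteDict
    import uuid

    db = SqliteDict('test.sqlite', autocommit=True)  # values are pickled, so plain dicts work

    # key = an auto-generated ID, value = the dict we want to store
    db[uuid.uuid4().hex] = {'name': 'john doe', 'balance': 1000, 'data': [1, 73.23, 18]}
    db[uuid.uuid4().hex] = {'name': 'alice', 'balance': 2000, 'email': 'a@b.com'}

    # no built-in find(): iterate over the stored rows and filter in Python
    for key, row in db.items():
        if row.get('balance', 0) > 1500:
            print(key, row)
    # <auto-generated id> {'name': 'alice', 'balance': 2000, 'email': 'a@b.com'}

    db.close()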

TinyDB

TinyDB looks like a good solution:

>>> from tinydb import TinyDB, Query
>>> db = TinyDB('path/to/db.json')
>>> User = Query()
>>> db.insert({'name': 'John', 'age': 22})
>>> db.search(User.name == 'John')
[{'name': 'John', 'age': 22}]

However, the documentation mentions that it's not the right tool if we need:

  • access from multiple processes or threads,
  • creating indexes for tables,
  • an HTTP server,
  • managing relationships between tables or similar,
  • ACID guarantees

So it's a half solution :)

Other solutions

WhiteDB also seems interesting.

  • Your answer helped me no end with the quick solution I needed. I adapted your code and used jsonpickle to handle encoding, which returns the class type on querying. As the question is closed, I will post my solution as a gist here to help future devs out: https://gist.github.com/ahancock1/72e459e294875988f4e6138d348122de – Adam H Jun 24 '22 at 19:43

Yes, it's possible by using the JSON1 extension to query JSON data stored in a column:

sqlite> CREATE TABLE test(data TEXT);
sqlite> INSERT INTO test VALUES ('{"name":"john doe","balance":1000,"data":[1,73.23,18]}');
sqlite> INSERT INTO test VALUES ('{"name":"alice","balance":2000,"email":"a@b.com"}');
sqlite> SELECT * FROM test WHERE json_extract(data, '$.balance') > 1500;
data
--------------------------------------------------
{"name":"alice","balance":2000,"email":"a@b.com"}

If you're going to be querying the same field a lot, you can make it more efficient by adding an index on the expression:

CREATE INDEX test_idx_balance ON test(json_extract(data, '$.balance'));

SQLite will then use that index for the above query instead of scanning every single row.
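
The query itself stays exactly the same: as long as the WHERE clause contains the same json_extract(data, '$.balance') expression the index was built on, the planner uses the index automatically. You can check with EXPLAIN QUERY PLAN (a sketch; the exact output varies between SQLite versions):

sqlite> EXPLAIN QUERY PLAN SELECT * FROM test WHERE json_extract(data, '$.balance') > 1500;
QUERY PLAN
`--SEARCH test USING INDEX test_idx_balance (<expr>>?)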

  • Wow, nice solution! Then I guess it would loop over *the whole* DB to do `find('name="john doe"')`? I imagine it's like this because SQLite has to `json_extract` all rows to test if the name matches, is that right? – Basj Apr 08 '20 at 07:40
  • @Basj You *might* be able to do something with an [index on an expression](https://www.sqlite.org/expridx.html), but I haven't tried it. Otherwise, yes, it'd have to look at every row - note I didn't say doing this was a *good* idea, just that it can be done. :) – Shawn Apr 08 '20 at 13:34
  • Okay, an index on an expression will work with `json_extract()`. – Shawn Apr 08 '20 at 16:05
  • Thanks! Could you include an example of how the query should be done, after having created an index with `CREATE INDEX test_idx_balance ON test(json_extract(data, '$.balance'));`? – Basj Apr 11 '20 at 21:56