
How to save (and also restore, and add elements to) a set of strings in a Sqlite3 database?

This does not work because sets are not JSON-serializable:

import sqlite3, json
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE t(id TEXT, myset TEXT);')
s = {'a', 'b', 'c'}
db.execute("INSERT INTO t VALUES (?, ?);", ('1', json.dumps(s)))  
# Error: Object of type set is not JSON serializable

so we can use a list, or a dict with dummy values:

s = list(s)
# or s = {'a':0, 'b':0, 'c': 0}
db.execute("INSERT INTO t VALUES (?, ?);", ('1', json.dumps(s)))  

# RETRIEVE A SET FROM DB
r = db.execute("SELECT myset FROM t WHERE id = '1'").fetchone()
if r is not None:
    s = set(json.loads(r[0]))
    print(s)

Then adding a string element to a set already stored in the DB is not very elegant:

  • SELECT the row,
  • retrieve the value as a string,
  • parse the JSON with json.loads,
  • convert the list to a set,
  • add the element to the set,
  • convert the set back to a list (or, instead of the last three steps: check whether the element is already present in the list, and append it only if it isn't),
  • serialise it again with json.dumps,
  • UPDATE the row.
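Put together, one round-trip update looks like this (a sketch, reusing the table from above):

```python
import sqlite3, json

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE t(id TEXT, myset TEXT);')
db.execute("INSERT INTO t VALUES (?, ?);", ('1', json.dumps(['a', 'b', 'c'])))

# SELECT, parse the JSON, convert to set, add, convert back, dump, UPDATE
row = db.execute("SELECT myset FROM t WHERE id = '1'").fetchone()
s = set(json.loads(row[0]))
s.add('d')
db.execute("UPDATE t SET myset = ? WHERE id = '1'", (json.dumps(list(s)),))
```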

Is there a more pythonic way to work with sets in a Sqlite database?

Basj
    You could store each set item together with its id as separate row in a DB table. With a unique index you can ensure that the same item can't be added twice. – Michael Butscher Mar 02 '22 at 12:12
  • I think you can refer to https://stackoverflow.com/questions/8230315/how-to-json-serialize-sets – gftea Mar 17 '22 at 12:38
  • @gftea Thanks but the main answer from this post is basically about converting a set `s` to a list with `list(s)` before JSONifying it, which is what I already do in my original post. – Basj Mar 17 '22 at 13:07
  • Yes, but that post's answer says you have to convert to a list first. – gftea Mar 17 '22 at 13:16
  • How about YAML, as many of the comments in that question suggest? – aaron Mar 17 '22 at 13:20

4 Answers


You can register adapter and converter functions with sqlite3 that will automatically perform the desired conversions.

import json
import sqlite3

def adapt_set(value):
    return json.dumps(list(value))

def convert_set(value):
    return set(json.loads(value))

sqlite3.register_adapter(set, adapt_set)
sqlite3.register_converter('set_type', convert_set)

Once these functions have been registered, pass detect_types to the connection factory to tell sqlite how to use them.

Passing sqlite3.PARSE_DECLTYPES will make the connection use the declared column type to look up the adapter/converter.

db = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)

# Declare the myset column as type "set_type". 
db.execute('CREATE TABLE t(id TEXT, myset set_type);')

db.execute("INSERT INTO t VALUES (?, ?);", ('1', {1, 2, 3})) 
r = db.execute("""SELECT myset FROM t WHERE id = '1'""").fetchone()
print(r[0])   # <- r[0] is a set.

Passing sqlite3.PARSE_COLNAMES will cause the column name in the cursor description to be searched for the type name enclosed in square brackets.

db = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_COLNAMES)
# The type is not declared in the created table statement.
db.execute('CREATE TABLE t(id TEXT, myset TEXT);')
db.execute("INSERT INTO t VALUES (?, ?);", ('1', {1, 2, 3})) 

# Include the type in the column label.
r = db.execute("""SELECT myset AS "myset [set_type]" FROM t WHERE id = '1'""").fetchone()
print(r[0])   # <- r[0] is a set
snakecharmerb
  • Thanks! It will still perform set <-> list <-> json conversion internally (which is not optimal in terms of performance), but at least it will be done internally, so we don't need to do it manually after each query, good idea! – Basj Mar 19 '22 at 15:02
  • @Basj For a simple set of strings, all `json.dumps(list(value))` effectively does is replace the curly brackets in the string representation of the set to square brackets (and vice-versa for `set(json.loads(value))` if duplicates are already removed). Have you considered the solutions in [my answer](https://stackoverflow.com/a/71530117/9504155)? Neither of them requires those conversions, which should be better for performance. – Jonathan Feenstra Mar 19 '22 at 16:59
  • Well, the question was for Pythonicness and elegance, not performance :). The representation is an implementation detail; the list json set dance can be swapped out for pickle.loads/dumps as suggested in [Jonathan](https://stackoverflow.com/users/9504155/jonathan-feenstra)'s [answer](https://stackoverflow.com/a/71530117/5320906), or any other representation scheme (that works). – snakecharmerb Mar 19 '22 at 17:26
  • Adapters and converters are certainly more elegant than repeating the conversion code every time. I personally still prefer my second solution though, since it keeps the database [1NF](https://en.wikipedia.org/wiki/First_normal_form) and requires no (de)serialisations/conversions whatsoever (except for the set conversion when retrieving the data). – Jonathan Feenstra Mar 20 '22 at 08:47

I would register a "set adapter" that converts a set into a byte string by encoding the set's string representation, and a "set converter" for a user-defined column type named "set_type" (pick any name you wish) that decodes the byte string back into a Unicode string and then applies eval to it (note that eval is only safe here if you trust whatever is stored in the database):

import sqlite3
from decimal import Decimal


db = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)

def set_adapter(the_set):
    return str(the_set).encode('utf-8')

def set_converter(s):
    return eval(s.decode('utf-8'))

sqlite3.register_adapter(set, set_adapter)
sqlite3.register_converter('set_type', set_converter)


# Define the columns with type set_type:
db.execute('CREATE TABLE t(id TEXT, myset set_type);')
s = {'a', 'b', 'c', (1, 2, 3), True, False, None, b'abcde', Decimal('12.345')}
# Our adapter will store s as a byte string:
db.execute("INSERT INTO t VALUES (?, ?);", ('1', s))
cursor = db.cursor()
# Our converter will convert column type set_type from bytes to set:
cursor.execute('select myset from t')
row = cursor.fetchone()
s = row[0]
print(type(s), s)
db.close()

Prints:

<class 'set'> {False, True, 'b', None, (1, 2, 3), Decimal('12.345'), b'abcde', 'a', 'c'}

As you can see, this also handles more data types than JSON can (even after converting the set to a list). JSON could handle as many types, but only if you write custom JSON encoders/decoders, which usually means mapping those types onto one of the supported types such as strings.

Defining the adapter and converter to use pickle, as in the answer offered by Jonathan Feenstra, will give a speed improvement but may use more storage. Either way, I would use the technique outlined above, i.e. adapter and converter functions with a special user-defined column type:

import pickle

def set_adapter(the_set):
    return pickle.dumps(the_set, pickle.HIGHEST_PROTOCOL)

def set_converter(s):
    return pickle.loads(s)
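Wired up the same way as above, the pickle-based variant looks like this (a self-contained sketch, reusing the same set_type declared type):

```python
import pickle
import sqlite3

def set_adapter(the_set):
    # Serialise the whole set to bytes for storage
    return pickle.dumps(the_set, pickle.HIGHEST_PROTOCOL)

def set_converter(s):
    # Deserialise the stored bytes straight back into a set
    return pickle.loads(s)

sqlite3.register_adapter(set, set_adapter)
sqlite3.register_converter('set_type', set_converter)

db = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)
db.execute('CREATE TABLE t(id TEXT, myset set_type);')
db.execute("INSERT INTO t VALUES (?, ?);", ('1', {'a', 'b', 'c'}))
row = db.execute("SELECT myset FROM t WHERE id = '1'").fetchone()
print(type(row[0]), row[0])  # row[0] is a set again
```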
Booboo

A simpler way to store the set in the SQLite database is to use the BLOB datatype for the myset column and serialise it to bytes using pickle.dumps:

import sqlite3
import pickle

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(id TEXT, myset BLOB)")

s = {"a", "b", "c"}
db.execute(
    "INSERT INTO t VALUES (?, ?)",
    ("1", sqlite3.Binary(pickle.dumps(s, pickle.HIGHEST_PROTOCOL))),
)

r = db.execute("SELECT myset FROM t WHERE id = '1'").fetchone()
if r is not None:
    s = pickle.loads(r[0])
    print(s)

To add new elements to a set in the database, the serialisation and deserialisation steps are still required, but no more conversion to/from a list or checking for elements that are already present in the set.
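For example, adding an element with this approach is a plain read-modify-write (a sketch, using the same table as above):

```python
import pickle
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(id TEXT, myset BLOB)")
db.execute(
    "INSERT INTO t VALUES (?, ?)",
    ("1", pickle.dumps({"a", "b", "c"}, pickle.HIGHEST_PROTOCOL)),
)

# Read, deserialise, add the element, serialise, write back
row = db.execute("SELECT myset FROM t WHERE id = '1'").fetchone()
s = pickle.loads(row[0])
s.add("d")
db.execute(
    "UPDATE t SET myset = ? WHERE id = '1'",
    (pickle.dumps(s, pickle.HIGHEST_PROTOCOL),),
)
```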

Alternatively, you could ensure the uniqueness of the ID-element combination at database-level, for example using a composite primary key:

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(id TEXT, element TEXT, PRIMARY KEY(id, element))")

s = {"a", "b", "c"}
for element in s:
    db.execute("INSERT INTO t VALUES(?, ?)", ("1", element))

r = db.execute("SELECT element FROM t WHERE id = '1'").fetchall()
s = set(row[0] for row in r)
print(s)
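With this schema, adding an element to the stored set becomes a single idempotent statement, for example using SQLite's INSERT OR IGNORE so that duplicates are silently skipped:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(id TEXT, element TEXT, PRIMARY KEY(id, element))")
for element in {"a", "b", "c"}:
    db.execute("INSERT INTO t VALUES(?, ?)", ("1", element))

# Adding 'd' twice is harmless: the second insert hits the
# composite primary key and is ignored
db.execute("INSERT OR IGNORE INTO t VALUES(?, ?)", ("1", "d"))
db.execute("INSERT OR IGNORE INTO t VALUES(?, ?)", ("1", "d"))

s = {row[0] for row in db.execute("SELECT element FROM t WHERE id = '1'")}
print(s)
```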
Jonathan Feenstra

You can use aiosqlitedict

Here is what it can do:

  1. Easy conversion between sqlite table and Python dictionary and vice-versa.
  2. Get values of a certain column in a Python list.
  3. Order your list ascending or descending.
  4. Insert any number of columns to your dict.

Getting Started

We start by connecting to the database and specifying the reference column:

from aiosqlitedict.database import Connect

countriesDB = Connect("database.db", "user_id")

Make a dictionary

The dictionary should be inside an async function.

async def some_func():
    countries_data = await countriesDB.to_dict("my_table_name", 123, "col1_name", "col2_name", ...)

You can pass any number of columns, or get them all by specifying the column name as '*':

    countries_data = await countriesDB.to_dict("my_table_name", 123, "*")

Say you have now made some changes to your dictionary and want to export it back to SQL:

Convert dict to sqlite table

async def some_func():
    ...
    await countriesDB.to_sql("my_table_name", 123, countries_data)

But what if you want a list of values for a specific column?

Select method

You can get a list of all values of a certain column:

country_names = await countriesDB.select("my_table_name", "col1_name")

To limit your selection, use the limit parameter:

country_names = await countriesDB.select("my_table_name", "col1_name", limit=10)

You can also sort the list with the ascending and/or order_by parameters, specifying a column to order the list by:

country_names = await countriesDB.select("my_table_name", "col1_name", order_by="col2_name", ascending=False)
Abdo Sabry