Dynamic Datasets and SQLAlchemy

Question

I am refactoring some old SQLite3 SQL statements in Python into SQLAlchemy. In our framework, we have the following SQL statements that takes in a dict with certain known keys and potentially any number of unexpected keys and values (depending what information was provided).

import sqlite3
import sys

def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d


def Create_DB(db):
    #    Delete the database
    from os import remove
    remove(db)

#   Recreate it and format it as needed
    with sqlite3.connect(db) as conn:
        conn.row_factory = dict_factory
        conn.text_factory = str

        cursor = conn.cursor()

        cursor.execute("CREATE TABLE [Listings] ([ID] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL UNIQUE, [timestamp] REAL NOT NULL DEFAULT(( datetime ( 'now' , 'localtime' ) )), [make] VARCHAR, [model] VARCHAR, [year] INTEGER);")


def Add_Record(db, data):
    with sqlite3.connect(db) as conn:
        conn.row_factory = dict_factory
        conn.text_factory = str

        cursor = conn.cursor()

        #get column names already in table
        cursor.execute("SELECT * FROM 'Listings'")
        col_names = list(map(lambda x: x[0], cursor.description))

        #check if column doesn't exist in table, then add it
        for i in data.keys():
            if i not in col_names:
                cursor.execute("ALTER TABLE 'Listings' ADD COLUMN '{col}' {type}".format(col=i, type='INT' if type(data[i]) is int else 'VARCHAR'))

        #Insert record into table
        cursor.execute("INSERT INTO Listings({cols}) VALUES({vals});".format(cols = str(data.keys()).strip('[]'), 
                    vals=str([data[i] for i in data]).strip('[]')
                    ))

#Database filename
db = 'test.db'

Create_DB(db)

data = {'make': 'Chevy',
    'model' : 'Corvette',
    'year' : 1964,
    'price' : 50000,
    'color' : 'blue',
    'doors' : 2}
Add_Record(db, data)

data = {'make': 'Chevy',
    'model' : 'Camaro',
    'year' : 1967,
    'price' : 62500,
    'condition' : 'excellent'}
Add_Record(db, data)

This level of dynamicism is necessary because there's no way we can know what additional information will be provided, but, regardless, it's important that we store all information provided to us. This has never been a problem because in our framework, as we've never expected an unwieldy number of columns in our tables.

While the above code works, it's obvious that it's not a clean implementation and thus why I'm trying to refactor it into SQLAlchemy's cleaner, more robust ORM paradigm. I started going through SQLAlchemy's official tutorials and various examples and have arrived at the following code:

from sqlalchemy import Column, String, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Listing(Base):
    __tablename__ = 'Listings'
    id = Column(Integer, primary_key=True)
    make = Column(String)
    model = Column(String)
    year = Column(Integer)

engine = create_engine('sqlite:///')

session = sessionmaker()
session.configure(bind=engine)
Base.metadata.create_all(engine)

data = {'make':'Chevy',
    'model' : 'Corvette',
    'year' : 1964}

record = Listing(**data)

s = session()
s.add(record)
s.commit()
s.close()

and it works beautifully with that data dict. Now, when I add a new keyword, such as

data = {'make':'Chevy',
'model' : 'Corvette',
'year' : 1964,
'price' : 50000}

I get a TypeError: 'price' is an invalid keyword argument for Listing error. To try and solve the issue, I modified the class to be dynamic, too:

class Listing(Base):
    __tablename__ = 'Listings'
    id = Column(Integer, primary_key=True)
    make = Column(String)
    model = Column(String)
    year = Column(Integer)

    def __checker__(self, data):
        for i in data.keys():
            if i not in [a for a in dir(self) if not a.startswith('__')]:
                if type(i) is int:
                    setattr(self, i, Column(Integer))
                else:
                    setattr(self, i, Column(String))
            else:
                self[i] = data[i]

But I quickly realized this would not work at all for several reasons, e.g. the class was already initialized, the data dict cannot be fed into the class without reinitializing it, it's a hack more than anything, et al.). The more I think about it, the less obvious the solution using SQLAlchemy seems to me. So, my main question is, how do I implement this level of dynamicism using SQLAlchemy?

I've researched a bit to see if anyone has a similar issue. The closest I've found was Dynamic Class Creation in SQLAlchemy but it only talks about the constant attributes ("tablename" et al.). I believe the unanswered https://stackoverflow.com/questions/29105206/sqlalchemy-dynamic-attribute-change may be asking the same question. While Python is not my forte, I consider myself a highly skilled programmer (C++ and JavaScript are my strongest languages) in the context scientific/engineering applications, so I may not hitting the correct Python-specific keywords in my searches.

I welcome any and all help.

you should really be using something like mongodb for this ... sql is not meant for weird random column entries after table creation ... stuff like mongodb or redis are ... — Joran Beasley, Jul 09 '15 at 18:56
I'm not against using a different databasing system if it meets my needs: portable and easy to implement -- portable because I cannot install a server (neither local nor networked) on our client's machines, so that ruled out MySQL and other server-based DBs; a simple Python package or two is fine. This is why we went with SQLite. We also don't expect to enter too many columns after a few column (~5-10) insertions. We also don't need to flex crazy RDBMS muscle apart from 3-5 linked tables, so we don't really need a big data oriented DB. Does that help clarify why I went with SQLite? — Manuel J. Diaz, Jul 09 '15 at 19:19

Joran Beasley · Accepted Answer · 2015-07-09T19:36:53.193

class Listing(Base):
    __tablename__ = 'Listings'
    id = Column(Integer, primary_key=True)
    make = Column(String)
    model = Column(String)
    year = Column(Integer)
    def __init__(self,**kwargs):
       for k,v in kwargs.items():
           if hasattr(self,k):
              setattr(self,k,v)
           else:
              engine.execute("ALTER TABLE %s AD COLUMN %s"%(self.__tablename__,k)
              setattr(self.__class__,Column(k, String))
              setattr(self,k,v)

might work ... maybe ... I am not entirely sure I did not test it

a better solution would be to use a relational table

class Attribs(Base):
    listing_id = Column(Integer,ForeignKey("Listing"))
    name = Column(String)
    val = Column(String)

class Listing(Base):
    id = Column(Integer,primary_key = True)
    attributes = relationship("Attribs",backref="listing")
    def __init__(self,**kwargs):
        for k,v in kwargs.items():
            Attribs(listing_id=self.id,name=k,value=v)
    def __str__(self):
        return "\n".join(["A LISTING",] + ["%s:%s"%(a.name,a.val) for a in self.attribs])

another solution would be to store json

class Listing(Base):
    __tablename__ = 'Listings'
    id = Column(Integer, primary_key=True)
    data = Column(String)
    def __init__(self,**kwargs):
       self.data = json.dumps(kwargs)
       self.data_dict = kwargs

the best solution would be to use a no-sql key,value store (maybe even just a simple json file? or perhaps shelve? or even pickle I guess)

Woah! Awesome, @joran. It looks like I wasn't too far off indeed, but that double `setattr` seems to do the trick and worked perfectly (I just needed to modify the first setattr in the else clause as `setattr(self.__class__, k, Column(k, String))` ). I will look into the relational snippet you provided, too. Thanks again! — Manuel J. Diaz, Jul 09 '15 at 20:11

Dynamic Datasets and SQLAlchemy

1 Answers1

Linked