
Backstory

I have a questionnaire that asks sensitive questions, most of which are true/false. The majority of the time the values are false, which poses a challenge when keeping the data private at rest. If each question is encrypted into a separate column, it is easy to tell which values are true and which are false with a bit of guessing. To combat this, the questions and answers are put into a dictionary object along with some salt (nonsense that changes randomly), and the whole dictionary is then encrypted. This makes it impossible to know what the answers were without the key.
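To make the guessing risk concrete, here is a minimal sketch. It uses SHA-256 from the standard library purely as a stand-in for any deterministic transformation (an assumption for illustration; it is not the encryption used in the model below), and `fingerprint` and `make_salt` are hypothetical helper names.

```python
import hashlib
import json
import secrets


def fingerprint(payload: dict) -> str:
    """Deterministic stand-in for encryption: same input, same output."""
    serialized = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(serialized).hexdigest()


def make_salt() -> str:
    """Hypothetical helper: random text that differs for every record."""
    return secrets.token_hex(25)


# Two users who gave identical answers.
answers_a = {"sensitive_question": False}
answers_b = {"sensitive_question": False}

# Without salt the stored values match, so equal answers are visible.
unsalted_match = fingerprint(answers_a) == fingerprint(answers_b)

# With a per-record salt the stored values differ even for equal answers.
salted_a = fingerprint({**answers_a, "_salt": make_salt()})
salted_b = fingerprint({**answers_b, "_salt": make_salt()})
salted_match = salted_a == salted_b
```

Under these assumptions, identical unsalted payloads produce identical stored values, while salted payloads do not, which is the property the dictionary-plus-salt design relies on.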

Method

Below is an example of the model used to encrypt the salted data at rest, so that looking at the stored data does not reveal its contents.

import random
from typing import Optional

import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy_utils.types import JSONType
from sqlalchemy_utils.types.encrypted.encrypted_type import StringEncryptedType, AesEngine


Base = declarative_base()

class SensitiveQuestionnaire(Base):
    # Table name assumed for illustration; declarative models need one
    __tablename__ = 'sensitive_questionnaire'

    user_id = sa.Column(sa.Integer, primary_key=True, autoincrement=True)
    _data: dict = sa.Column(StringEncryptedType(JSONType, 'secret', AesEngine, 'pkcs5'),
        nullable=False, default=lambda: {'_salt': salt_shaker()})

    # values are viewed using a python property to look into the `_data` dict
    @property
    def sensitive_question(self) -> Optional[bool]:
        return self._data.get('sensitive_question')

    # values are set into the `_data` dict
    @sensitive_question.setter
    def sensitive_question(self, value: bool) -> None:
        self._data['sensitive_question'] = value

    # in a real example there would be 20+ properties that map to questions

    def __init__(self, **kwargs):
        # SQLAlchemy does not call __init__ when loading objects from the database,
        # so we are free to set object defaults here
        self._data = {'_salt': salt_shaker()}
        for key in kwargs:
            setattr(self, key, kwargs[key])

    @property
    def _salt(self) -> str:
        return self._data['_salt']


def salt_shaker():
    # Build a 50-character salt by picking random characters from the alphabet
    return ''.join(random.choice('hldjs..') for i in range(50))
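As a side note, `random` is not designed for security-sensitive values. A possible variant, sketched here under assumptions, draws the salt from the standard library's `secrets` module instead. The name `secure_salt_shaker` and the `ALPHABET` constant are hypothetical, and the alphabet is an assumed one because the original character set is elided.

```python
import secrets
import string

# Assumed alphabet for illustration; the original character set is elided.
ALPHABET = string.ascii_letters + string.digits


def secure_salt_shaker(length: int = 50) -> str:
    """Return `length` characters drawn from a cryptographically strong source."""
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))
```

It could be swapped in wherever `salt_shaker()` is called, since it also returns a 50-character string by default.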

The Problem

After the SensitiveQuestionnaire object is initialized and saved, none of the later changes to it are persisted in the database.

# GIVEN a questionnaire 
questionnaire = model.SensitiveQuestionnaire(user_id=1)
db.session.add(questionnaire)
db.session.commit()

# WHEN updating the questionnaire and saving it to the database
questionnaire.sensitive_question = True
db.session.commit()

# THEN we get the questionnaire from the database
db_questionnaire = model.SensitiveQuestionnaire.query\
                   .filter(model.SensitiveQuestionnaire.user_id == 1).first()

# THEN the sensitive_question value is persisted
assert db_questionnaire.sensitive_question is True

The value of db_questionnaire.sensitive_question is None when it should be True.

Daniel Butler

1 Answer


After spending the better part of a day figuring this out, I found that the cause of the issue is how SQLAlchemy learns that there has been a change. The short version is that a plain dict mutated in place never notifies SQLAlchemy, whereas SQLAlchemy's MutableDict overrides Python's __setitem__ to call its changed() method, which lets SQLAlchemy know there was a change. More info can be found in SQLAlchemy's docs.

The answer is to wrap the StringEncryptedType in MutableDict.as_mutable().

Mutation Tracking

"Provide support for tracking of in-place changes to scalar values, which are propagated into ORM change events on owning parent objects."

From SQLAlchemy's docs: https://docs.sqlalchemy.org/en/13/orm/extensions/mutable.html
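The idea behind mutation tracking can be sketched in plain Python. This is a simplified illustration of the mechanism, not SQLAlchemy's actual implementation; `Parent`, `TrackedDict`, and `flag_changed` are hypothetical names.

```python
class Parent:
    """Stand-in for an ORM object that records which attributes changed."""

    def __init__(self):
        self.dirty = set()

    def flag_changed(self, attribute: str) -> None:
        self.dirty.add(attribute)


class TrackedDict(dict):
    """Dict that reports in-place changes to its owning parent."""

    def __init__(self, parent: Parent, attribute: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._parent = parent
        self._attribute = attribute

    def changed(self) -> None:
        # Tell the parent that the attribute holding this dict was modified.
        self._parent.flag_changed(self._attribute)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.changed()

    def __delitem__(self, key):
        super().__delitem__(key)
        self.changed()


# A plain dict changes silently: the parent is never told.
plain_parent = Parent()
plain = {"_salt": "abc"}
plain["sensitive_question"] = True

# A tracked dict notifies its parent on every item assignment.
tracked_parent = Parent()
tracked = TrackedDict(tracked_parent, "_data", {"_salt": "abc"})
tracked["sensitive_question"] = True
```

In this sketch `plain_parent.dirty` stays empty while `tracked_parent.dirty` contains `"_data"`, which mirrors why the in-place update in the question was never written back.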

Solution

Condensed version: wrap the StringEncryptedType in MutableDict.as_mutable()

_data: dict = sa.Column(
        MutableDict.as_mutable(StringEncryptedType(JSONType, 'secret', AesEngine, 'pkcs5')),
        nullable=False, default=lambda: {'_salt': salt_shaker()})

Full version from the question above

import random
from typing import Optional

import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.mutable import MutableDict
from sqlalchemy_utils.types import JSONType
from sqlalchemy_utils.types.encrypted.encrypted_type import StringEncryptedType, AesEngine


Base = declarative_base()

class SensitiveQuestionnaire(Base):
    # Table name assumed for illustration; declarative models need one
    __tablename__ = 'sensitive_questionnaire'

    user_id: int = sa.Column(sa.Integer, primary_key=True, autoincrement=True)

    # The MutableDict.as_mutable below is what changed!
    _data: dict = sa.Column(
        MutableDict.as_mutable(StringEncryptedType(JSONType, 'secret', AesEngine, 'pkcs5')),
        nullable=False, default=lambda: {'_salt': salt_shaker()})

    @property
    def sensitive_question(self) -> Optional[bool]:
        return self._data.get('sensitive_question')

    # values are set into the `_data` dict
    @sensitive_question.setter
    def sensitive_question(self, value: bool) -> None:
        self._data['sensitive_question'] = value

    # in a real example there would be 20+ properties that map to questions

    def __init__(self, **kwargs):
        self._data = {'_salt': salt_shaker()}
        for key in kwargs:
            setattr(self, key, kwargs[key])

    @property
    def _salt(self) -> str:
        return self._data['_salt']


def salt_shaker():
    # Build a 50-character salt by picking random characters from the alphabet
    return ''.join(random.choice('hldjs..') for i in range(50))

Daniel Butler