216

I want to get an object from the database if it already exists (based on provided parameters) or create it if it does not.

Django's get_or_create (or source) does this. Is there an equivalent shortcut in SQLAlchemy?

I'm currently writing it out explicitly like this:

def get_or_create_instrument(session, serial_number):
    instrument = session.query(Instrument).filter_by(serial_number=serial_number).first()
    if instrument:
        return instrument
    else:
        instrument = Instrument(serial_number)
        session.add(instrument)
        return instrument
Jens
  • 8,423
  • 9
  • 58
  • 78
FogleBird
  • 74,300
  • 25
  • 125
  • 131
  • 9
    For those who just want to add object if it doesn't exist yet, see `session.merge`: https://stackoverflow.com/questions/12297156/fastest-way-to-insert-object-if-it-doesnt-exist-with-sqlalchemy/12298306#12298306 – Anton Tarasenko Oct 25 '17 at 15:43

11 Answers11

148

Following the solution of @WoLpH, this is the code that worked for me (simple version):

def get_or_create(session, model, **kwargs):
    instance = session.query(model).filter_by(**kwargs).first()
    if instance:
        return instance
    else:
        instance = model(**kwargs)
        session.add(instance)
        session.commit()
        return instance

With this, I'm able to get_or_create any object of my model.

Suppose my model object is :

class Country(Base):
    __tablename__ = 'countries'
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)

To get or create my object I write :

myCountry = get_or_create(session, Country, name=countryName)
skytreader
  • 11,467
  • 7
  • 43
  • 61
Kevin.
  • 2,820
  • 2
  • 23
  • 21
  • 4
    For those of you searching like me, this is the proper solution to create a row if it does not already exist. – Spencer Rathbun Feb 02 '12 at 19:11
  • 4
    Don't you need to add the new instance to the session? Otherwise if you issue a session.commit() in the calling code, nothing will happen as the new instance isn't added to the session. – CadentOrange May 22 '13 at 10:40
  • 1
    Thank you for this. I have found this so useful that I created a gist of it for future use. https://gist.github.com/jangeador/e7221fc3b5ebeeac9a08 – jangeador Aug 28 '14 at 23:23
  • 18
    Given that you pass the session as argument, it might be better to avoid the `commit` (or at least use only a `flush` instead). This leaves session control to the caller of this method and won't risk issuing a premature commit. Also, using `one_or_none()` instead of `first()` might be slightly safer. – exhuma Oct 30 '18 at 08:36
  • I agree with @exhuma – committing in the function could lead to unintended conflicts and/or performance issues. – Yaakov Bressler Dec 15 '20 at 15:59
143

That's basically the way to do it, there is no shortcut readily available AFAIK.

You could generalize it ofcourse:

def get_or_create(session, model, defaults=None, **kwargs):
    instance = session.query(model).filter_by(**kwargs).one_or_none()
    if instance:
        return instance, False
    else:
        params = {k: v for k, v in kwargs.items() if not isinstance(v, ClauseElement)}
        params.update(defaults or {})
        instance = model(**params)
        try:
            session.add(instance)
            session.commit()
        except Exception:  # The actual exception depends on the specific database so we catch all exceptions. This is similar to the official documentation: https://docs.sqlalchemy.org/en/latest/orm/session_transaction.html
            session.rollback()
            instance = session.query(model).filter_by(**kwargs).one()
            return instance, False
        else:
            return instance, True

2020 update (Python 3.9+ ONLY)

Here is a cleaner version with Python 3.9's the new dict union operator (|=)

def get_or_create(session, model, defaults=None, **kwargs):
    instance = session.query(model).filter_by(**kwargs).one_or_none()
    if instance:
        return instance, False
    else:
        kwargs |= defaults or {}
        instance = model(**kwargs)
        try:
            session.add(instance)
            session.commit()
        except Exception:  # The actual exception depends on the specific database so we catch all exceptions. This is similar to the official documentation: https://docs.sqlalchemy.org/en/latest/orm/session_transaction.html
            session.rollback()
            instance = session.query(model).filter_by(**kwargs).one()
            return instance, False
        else:
            return instance, True

Note:

Similar to the Django version this will catch duplicate key constraints and similar errors. If your get or create is not guaranteed to return a single result it can still result in race conditions.

To alleviate some of that issue you would need to add another one_or_none() style fetch right after the session.commit(). This still is no 100% guarantee against race conditions unless you also use a with_for_update() or serializable transaction mode.

swimmer
  • 1,971
  • 2
  • 17
  • 28
Wolph
  • 78,177
  • 11
  • 137
  • 148
  • 2
    I think that where you read "session.Query(model.filter_by(**kwargs).first()", you should read "session.Query(model.filter_by(**kwargs)).first()". – pkoch Jan 12 '11 at 15:53
  • 3
    Should there be a lock around this so that another thread doesn't create an instance before this thread has a chance to? – EoghanM May 22 '11 at 21:34
  • 2
    @EoghanM: Normally your session would be threadlocal so this won't matter. The SQLAlchemy session is not meant to be thread-safe. – Wolph May 23 '11 at 01:18
  • 6
    @WolpH it can be another process trying to create the same record simultaneously. Look at Django's implementation of get_or_create. It checks for integrity error, and relies upon proper use of unique constraints. – Ivan Virabyan May 21 '12 at 06:17
  • 1
    @IvanVirabyan: I assumed @EoghanM was talking about the session instance. In that case there should be a `try...except IntegrityError: instance = session.Query(...)` around the `session.add` block. – Wolph May 22 '12 at 09:26
  • Shouldn't there be a closing parenthesis after `Query(model`? – Elrond Feb 25 '13 at 23:42
  • @Elrond: I don't think so, the closing parenthesis is after the `filter_by(**kwargs)` part which generates the filter clause for you. – Wolph Feb 26 '13 at 09:32
  • @WoLpH: Looking at http://docs.sqlalchemy.org/en/latest/orm/tutorial.html#adding-new-objects (scroll a bit down, search for `filter_by`) it seems it should be `session.query(model).filter_by(**kwargs).first()` (note also the lower q on query). – Elrond Feb 26 '13 at 09:48
  • @Elrond: Right you are, I could swear this code once worked. But since it was written about 3 years ago it's hard to test on that version of SQLAlchemy ;) – Wolph Feb 26 '13 at 09:53
  • shouldn't this query `.one()` not `.first()`? Django will also raise exception if multiple entries are found, so nesting in single `try except` with expecting `NoResultsFound` and then creating new model IMHO is better solution. – JackLeo Mar 27 '13 at 14:50
  • `.first()` is correct as SQLAlchemy will add a `TOP 1` clause to the query. Thus if there is more than 1 it will return the first and if there are no items it will return `None`. – CadentOrange May 22 '13 at 10:41
  • @Wolph Doesn't the `flush` or `commit` need to be called in order for `IntegrityError` to be thrown? – Kar Jan 26 '14 at 06:15
  • @Kate: it depends on the kind of error, but even within a transaction you can already have `IntegrityError`s. Also, it depends on autocommit behaviour of course. – Wolph Jan 27 '14 at 00:25
  • It is 2016 now. Is this included in sqlalchemy yet, or not? – Euphorbium May 09 '16 at 13:41
  • @Euphorbium: No. Probably because it's not a very common pattern in well designed applications. This is a nice fire-and-forget type of method but generally I would recommend explicitly creating objects instead. – Wolph May 11 '16 at 13:56
  • Love this solution. @Wolph - is there a way to do this in bulk? – ethanenglish Aug 22 '17 at 13:48
  • @ethanenglish there is no truly practical way to do it in bulk because an update executes across all selected rows. The result is that you would update all rows to the same values or you would need to execute a huge amount of update statements. If you execute it all in a single transaction it shouldn't be that slow however. And you could batch the inserts of course. – Wolph Aug 22 '17 at 19:06
  • where i need to put the code?, i get working out execution context error? – Dev 200 Apr 16 '18 at 13:56
  • @VictorA.the context error is probably because you don't have a valid sqlalchemy session, beyond that you can put the code anywhere – Wolph Apr 18 '18 at 10:35
  • What happens if something creates the object in the meantime? Can't believe the most upvoted answer(s) contain a race condition... – Karl Lorey Dec 22 '20 at 18:34
  • @lorey The question was about an SQLAlchemy equivalent of the Django get_or_create function and at the time of writing the Django version had no protection against race conditions. But I agree, this answer needs to be updated :) – Wolph Dec 22 '20 at 23:17
  • @lorey It looks like the Django version still doesn't handle race conditions unless it results in an `IntegrityError` (which would only occur if your violating a unique or other constraint). So... I'll update the answer to match but it will still be vulnerable to race conditions. – Wolph Dec 22 '20 at 23:22
  • 1
    Shouldn't Python 3.9 **params be **kwargs and Model be model? – PhoenixRebirthed Mar 01 '21 at 05:09
60

I've been playing with this problem and have ended up with a fairly robust solution:

def get_one_or_create(session,
                      model,
                      create_method='',
                      create_method_kwargs=None,
                      **kwargs):
    try:
        return session.query(model).filter_by(**kwargs).one(), False
    except NoResultFound:
        kwargs.update(create_method_kwargs or {})
        created = getattr(model, create_method, model)(**kwargs)
        try:
            session.add(created)
            session.flush()
            return created, True
        except IntegrityError:
            session.rollback()
            return session.query(model).filter_by(**kwargs).one(), False

I just wrote a fairly expansive blog post on all the details, but a few quite ideas of why I used this.

  1. It unpacks to a tuple that tells you if the object existed or not. This can often be useful in your workflow.

  2. The function gives the ability to work with @classmethod decorated creator functions (and attributes specific to them).

  3. The solution protects against Race Conditions when you have more than one process connected to the datastore.

EDIT: I've changed session.commit() to session.flush() as explained in this blog post. Note that these decisions are specific to the datastore used (Postgres in this case).

EDIT 2: I’ve updated using a {} as a default value in the function as this is typical Python gotcha. Thanks for the comment, Nigel! If your curious about this gotcha, check out this StackOverflow question and this blog post.

Espoir Murhabazi
  • 5,973
  • 5
  • 42
  • 73
erik
  • 1,073
  • 11
  • 13
  • 2
    Compared to what spencer [says](http://stackoverflow.com/questions/2546207/does-sqlalchemy-have-an-equivalent-of-djangos-get-or-create/21146492#comment11457084_6078058), this solution is the good one since it prevents Race conditions (by committing/flushing the session, beware) and mimics perfectly what Django does. – kiddouk Mar 14 '14 at 23:38
  • @kiddouk No, it does not mimic "perfectly". Django's `get_or_create` is *not* thread-safe. It's not atomic. Also, Django's `get_or_create` returns a True flag if the instance was created or a False flag otherwise. – Kar Mar 03 '15 at 08:19
  • 1
    @Kate if you look at Django's `get_or_create` it does almost the exact same thing. This solution also returns the `True/False` flag to signal if the object was created or fetched, and is also not atomic. However, thread-safety and atomic updates are a concern for the database, not for Django, Flask or SQLAlchemy, and in both this solution and Django's, are solved by transactions on the database. – erik Mar 04 '15 at 22:15
  • 1
    Suppose a non null field was provided null value for a new record, it will raise IntegrityError. The whole thing gets messed up, now we don't know what actually happened and we get another error, that no record is found. – rajat Oct 08 '15 at 19:31
  • This will work only if the "don't create if" condition can be expressed as a UNIQUE constraint, which is often the case but not always. – Olaf Klischat Oct 14 '16 at 07:05
  • The `IntegrityError` fallback query currently includes the `create_method_kwargs`. If the other client that created the new row did not use **exactly** the same parameters, this query will fail. Shouldn't it use the smaller set of kwargs used in the first query? – kevmitch Oct 20 '16 at 03:58
  • 2
    Shouldn't the `IntegrityError` case return `False` since this client did not create the object? – kevmitch Oct 20 '16 at 04:06
  • Can this support nested child model which are in a relationship as well ? https://stackoverflow.com/questions/58821758/sqlalchemy-get-or-create-behaviour-for-models-with-nested-relationships-and-uniq – Bill Goldberg Nov 13 '19 at 04:26
  • Rather than query when an IntegrityError occurs you could (recurse) or start from the top, until their is no contention. – cdosborn Jun 29 '20 at 21:20
  • how to do same but with upsert method ?@eric – lord stock Jun 27 '21 at 12:27
14

A modified version of erik's excellent answer

def get_one_or_create(session,
                      model,
                      create_method='',
                      create_method_kwargs=None,
                      **kwargs):
    try:
        return session.query(model).filter_by(**kwargs).one(), True
    except NoResultFound:
        kwargs.update(create_method_kwargs or {})
        try:
            with session.begin_nested():
                created = getattr(model, create_method, model)(**kwargs)
                session.add(created)
            return created, False
        except IntegrityError:
            return session.query(model).filter_by(**kwargs).one(), True
  • Use a nested transaction to only roll back the addition of the new item instead of rolling back everything (See this answer to use nested transactions with SQLite)
  • Move create_method. If the created object has relations and it is assigned members through those relations, it is automatically added to the session. E.g. create a book, which has user_id and user as corresponding relationship, then doing book.user=<user object> inside of create_method will add book to the session. This means that create_method must be inside with to benefit from an eventual rollback. Note that begin_nested automatically triggers a flush.

Note that if using MySQL, the transaction isolation level must be set to READ COMMITTED rather than REPEATABLE READ for this to work. Django's get_or_create (and here) uses the same stratagem, see also the Django documentation.

Adversus
  • 2,166
  • 20
  • 23
  • I like that this avoids rolling back unrelated changes, however the `IntegrityError` re-query may still fail with `NoResultFound` with the MySQL default isolation level `REPEATABLE READ` if the session had previously queried the model in the same transaction. The best solution I could come up with is to call `session.commit()` before this query, which is also not ideal since the user may not expect it. The referenced answer does not have this problem since the session.rollback() has the same effect of starting a new transaction. – kevmitch Oct 23 '16 at 22:19
  • Huh, TIL. Would putting the query in a nested transaction work? You're right that `commit` inside of this function is arguably worse than doing a `rollback`, even though for specific use cases it can be acceptable. – Adversus Nov 04 '16 at 09:00
  • Yes, putting the initial query in a nested transaction makes it at least possible for the second query to work. It will still fail if the user explicitly queried the model before in the same transaction though. I have decided that this is acceptable and the user should just be warned not to do this or otherwise catch the exception and decide whether to `commit()` themselves. If my understanding of the code is correct, this is what Django does. – kevmitch Nov 08 '16 at 01:49
  • In the django [documentation](https://docs.djangoproject.com/el/1.10/ref/models/querysets/#get-or-create) they say to use ` READ COMMITTED`, so it does not look like they try to handle this. Looking at the [source](https://github.com/django/django/blob/master/django/db/models/query.py#L491) confirms this. I'm not sure I understand your reply, you mean the user should put his/her query in a nested transaction? It's not clear to me how a `SAVEPOINT` influences reads with `REPEATABLE READ`. If no effect then the situation seems unsalvagable, if effect then the very last query could be nested? – Adversus Nov 08 '16 at 09:09
  • That's interesting about `READ COMMITED`, maybe I should rethink my decision not to touch database defaults. I have tested that restoring a `SAVEPOINT` from before a query was made makes it as if that query never happend in `REPEATABLE READ`. Therefore, I found it necessary to enclose the query in the try clause in a nested transaction so that the query in the `IntegrityError` except clause can work at all. – kevmitch Nov 08 '16 at 18:50
8

This SQLALchemy recipe does the job nice and elegant.

The first thing to do is to define a function that is given a Session to work with, and associates a dictionary with the Session() which keeps track of current unique keys.

def _unique(session, cls, hashfunc, queryfunc, constructor, arg, kw):
    cache = getattr(session, '_unique_cache', None)
    if cache is None:
        session._unique_cache = cache = {}

    key = (cls, hashfunc(*arg, **kw))
    if key in cache:
        return cache[key]
    else:
        with session.no_autoflush:
            q = session.query(cls)
            q = queryfunc(q, *arg, **kw)
            obj = q.first()
            if not obj:
                obj = constructor(*arg, **kw)
                session.add(obj)
        cache[key] = obj
        return obj

An example of utilizing this function would be in a mixin:

class UniqueMixin(object):
    @classmethod
    def unique_hash(cls, *arg, **kw):
        raise NotImplementedError()

    @classmethod
    def unique_filter(cls, query, *arg, **kw):
        raise NotImplementedError()

    @classmethod
    def as_unique(cls, session, *arg, **kw):
        return _unique(
                    session,
                    cls,
                    cls.unique_hash,
                    cls.unique_filter,
                    cls,
                    arg, kw
            )

And finally creating the unique get_or_create model:

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

engine = create_engine('sqlite://', echo=True)

Session = sessionmaker(bind=engine)

class Widget(UniqueMixin, Base):
    __tablename__ = 'widget'

    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True, nullable=False)

    @classmethod
    def unique_hash(cls, name):
        return name

    @classmethod
    def unique_filter(cls, query, name):
        return query.filter(Widget.name == name)

Base.metadata.create_all(engine)

session = Session()

w1, w2, w3 = Widget.as_unique(session, name='w1'), \
                Widget.as_unique(session, name='w2'), \
                Widget.as_unique(session, name='w3')
w1b = Widget.as_unique(session, name='w1')

assert w1 is w1b
assert w2 is not w3
assert w2 is not w1

session.commit()

The recipe goes deeper into the idea and provides different approaches but I've used this one with great success.

jhnwsk
  • 971
  • 12
  • 15
  • 3
    I like this recipe if only a single SQLAlchemy Session object can modify the database. I might be wrong, but if other sessions (SQLAlchemy or not) modify the database concurrently I don't see how this protects against objects that might have been created by other sessions while the transaction is ongoing. In those cases, I think that solutions that rely on flushing after session.add() and exception handling like https://stackoverflow.com/a/21146492/3690333 are more reliable. – TrilceAC Dec 11 '19 at 13:47
4

The closest semantically is probably:

def get_or_create(model, **kwargs):
    """SqlAlchemy implementation of Django's get_or_create.
    """
    session = Session()
    instance = session.query(model).filter_by(**kwargs).first()
    if instance:
        return instance, False
    else:
        instance = model(**kwargs)
        session.add(instance)
        session.commit()
        return instance, True

not sure how kosher it is to rely on a globally defined Session in sqlalchemy, but the Django version doesn't take a connection so...

The tuple returned contains the instance and a boolean indicating if the instance was created (i.e. it's False if we read the instance from the db).

Django's get_or_create is often used to make sure that global data is available, so I'm committing at the earliest point possible.

thebjorn
  • 26,297
  • 11
  • 96
  • 138
  • this should work as long as Session is created and tracked by `scoped_session`, which should implement thread-safe session management (did this exist in 2014?). – cowbert Aug 28 '18 at 05:56
3

I slightly simplified @Kevin. solution to avoid wrapping the whole function in an if/else statement. This way there's only one return, which I find cleaner:

def get_or_create(session, model, **kwargs):
    instance = session.query(model).filter_by(**kwargs).first()

    if not instance:
        instance = model(**kwargs)
        session.add(instance)

    return instance
jmberros
  • 51
  • 1
  • 5
3

There is a Python package that has @erik's solution as well as a version of update_or_create(). https://github.com/enricobarzetti/sqlalchemy_get_or_create

hekevintran
  • 22,822
  • 32
  • 111
  • 180
2

Depending on the isolation level you adopted, none of the above solutions would work. The best solution I have found is a RAW SQL in the following form:

INSERT INTO table(f1, f2, unique_f3) 
SELECT 'v1', 'v2', 'v3' 
WHERE NOT EXISTS (SELECT 1 FROM table WHERE f3 = 'v3')

This is transactionally safe whatever the isolation level and the degree of parallelism are.

Beware: in order to make it efficient, it would be wise to have an INDEX for the unique column.

fcracker79
  • 1,118
  • 13
  • 26
2

One problem I regularly encounter is when a field has a max length (say, STRING(40)) and you'd like to perform a get or create with a string of large length, the above solutions will fail.

Building off of the above solutions, here's my approach:

from sqlalchemy import Column, String

def get_or_create(self, add=True, flush=True, commit=False, **kwargs):
    """

    Get the an entity based on the kwargs or create an entity with those kwargs.

    Params:
        add: (default True) should the instance be added to the session?
        flush: (default True) flush the instance to the session?
        commit: (default False) commit the session?
        kwargs: key, value pairs of parameters to lookup/create.

    Ex: SocialPlatform.get_or_create(**{'name':'facebook'})
        returns --> existing record or, will create a new record

    ---------

    NOTE: I like to add this as a classmethod in the base class of my tables, so that
    all data models inherit the base class --> functionality is transmitted across
    all orm defined models.

    """


    # Truncate values if necessary
    for key, value in kwargs.items():

        # Only use strings
        if not isinstance(value, str):
            continue

        # Only use if it's a column
        my_col = getattr(self.__table__.columns, key)

        if not isinstance(my_col, Column):
            continue

        # Skip non strings again here
        if not isinstance(my_col.type, String):
            continue

        # Get the max length
        max_len = my_col.type.length

        if value and max_len and len(value) > max_len:

            # Update the value
            value = value[:max_len]
            kwargs[key] = value

    # -------------------------------------------------

    # Make the query...
    instance = session.query(self).filter_by(**kwargs).first()

    if instance:
        return instance

    else:
        # Max length isn't accounted for here.
        # The assumption is that auto-truncation will happen on the child-model
        # Or directtly in the db
        instance = self(**kwargs)

    # You'll usually want to add to the session
    if add:
        session.add(instance)

    # Navigate these with caution
    if add and commit:
        try:
            session.commit()
        except IntegrityError:
            session.rollback()

    elif add and flush:
        session.flush()


    return instance
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Yaakov Bressler
  • 9,056
  • 2
  • 45
  • 69
0

If somebody is looking for an actual version of this method:

SQLAlchemy 2.0.9:

from sqlalchemy import select
from sqlalchemy.orm.exc import NoResultFound
from sqlalchemy.orm.session import Session


def get_or_create(session: Session, model, **kwargs):
    try:
        instance = session.execute(select(model).
                                   filter_by(**kwargs)
                                   ).scalar_one()
    except NoResultFound:
        instance = model(**kwargs)
        session.add(instance)
        session.commit()
    return instance
Yurii
  • 36
  • 4