0

my user is interacting with a gtk2 GUI (Presenter/View/Model), and I want to let them import several records in the database, keep the interface responsive, let the user cancel the operation at any time, and when the longish operation is complete, let the user decide whether or not to commit the whole import.

I'm doing this by running the import action in a separate thread, the GUI stays responsive, in the separate thread I create database objects, add them to the session (owned by the presenter), the importing thread adds log records to a gtk.ListStore, linked to a gtk.TreeView, and this is fine as far as the GUI is concerned. When the thread finalises, the 'ok' button is enabled.

Since this is part of a fairly large program I've adopted, there's settings I can't change without reviewing the whole lot. One of them is autoflush, which is disabled in all sessions.

now I have this smallish example with a surprising behaviour:

from sqlalchemy import Column, Unicode, Integer

from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()

class Location(Base):
    __tablename__ = 'location'
    code = Column(Unicode(64), index=True, primary_key=True)
    def __init__(self, code=None):
        self.code = code
    def __repr__(self):
        return self.code

from sqlalchemy import create_engine
engine = create_engine('sqlite:///joindemo.db')

Base.metadata.create_all(engine)

from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine, autoflush=False)
session = Session()

session.query(Location).delete()
session.commit()
us = session.query(Location).all()
print us

def run_me_in_other_thread_1():
    session.autoflush = True
    u1 = Location(code=u'mario')
    u2 = Location(code=u'silla')
    session.add_all([u1, u2])
    session.autoflush = False

def run_me_in_other_thread_2():
    u1 = Location(code=u'mario')
    u2 = Location(code=u'silla')
    session.add_all([u1, u2])
    session.flush()

import threading
t = threading.Thread(target=run_me_in_other_thread_1)
t.start()
t.join()

session.commit()
us = session.query(Location).all()
print us

If you use as thread target run_me_in_other_thread_1 (temporarily enable autoflush in the session), all goes as I also expected:

[]
[mario, silla]

(Unfortunately, enabling autoflush, even just for one import, is something I cannot do within ghini/bauble and I don't have the energies to look into that. Needless to say, if you want to have a look, fork the project and remove that nasty autoflush=False, and manage to solve those 10 consequent erroring tests, be my guest! I'll be glad to add new contributors to the authors list.)

If you leave autoflush disabled and do a session.flush before returning from the target function (see run_me_in_other_thread_2), the flush will cause a ProgrammingError:

ProgrammingError: (sqlite3.ProgrammingError) SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140403696158464 and this is thread id 140403623274240

why this difference?

mariotomo
  • 9,438
  • 8
  • 47
  • 66
  • 1
    [This Q/A](https://stackoverflow.com/questions/34009296/using-sqlalchemy-session-from-flask-raises-sqlite-objects-created-in-a-thread-c) answers the root cause of your problems in this case. In short, sharing a `Session` between 2 threads is a recipe for disaster. A `Session` [is not thread-safe](http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#is-the-session-thread-safe). A `Session` is also lazy and requests a connection only when needed. The 1st function emits no queries, the 2nd does due to the manual flush, so the 1st thread creates no SQLite resources, 2nd does. – Ilja Everilä Jan 15 '18 at 18:38
  • And in short, autoflush and flush aren't different. The other just happens automatically in certain situations, namely right before queries. Staging new objects in the `Session` doesn't emit an INSERT right there and then, but when you perform other queries and have autoflush enabled they are sent to the DB. I think [this](https://stackoverflow.com/questions/4201455/sqlalchemy-whats-the-difference-between-flush-and-commit) is a good read for you. – Ilja Everilä Jan 15 '18 at 18:44
  • "sessions not thread safe", I didn't mention I was aware I was following a recipe for disaster. I don't understand how the manual flush emits queries while the autoflush does not. But how do I let things happen in the background while the user looks at the growing log, and let the user decide while looking at the log "no, I don't need this" and abort the actions (happening in the other thread). Or also just decide, at the end, "fine, let's commit". – mariotomo Jan 15 '18 at 20:46
  • A manual flush emits DML, because calling [`session.flush()`](http://docs.sqlalchemy.org/en/latest/orm/session_api.html#sqlalchemy.orm.session.Session.flush) means "flush right here, right now, any pending changes held in the session to the DB". Autoflush operates transparently and flushes pending changes from the session *before a query is sent to the DB*, so that the session's and DB's state is in sync. `run_me_in_other_thread_1` does not perform any queries, and so there's no need to [flush](http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#flushing). – Ilja Everilä Jan 15 '18 at 21:35
  • I don't follow you: `session.add_all([u1, u2])`, that's two insert queries, at least from my perspective. ... ah, it's not an EXPLICIT query. is that it? – mariotomo Jan 15 '18 at 22:03
  • That's not a query at all. You stage 2 new objects in the session. Read about [state management](http://docs.sqlalchemy.org/en/latest/orm/session_state_management.html) and [adding new or existing items](http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#adding-new-or-existing-items). From the latter: "`add()` is used to place instances in the session. For *transient* (i.e. brand new) instances, this will have the effect of an INSERT taking place for those instances **upon the next flush**." – Ilja Everilä Jan 15 '18 at 22:07
  • ok, I think I get it, they're something that will become a query. and I'm not forcing it because I'm not manually flushing. So when the flush happens because of the commit, it happens in the same thread as where I created the session, and that's fine. Not safe, but fine in this case. But the objects were created and added to the session in the other thread... (?) – mariotomo Jan 15 '18 at 22:33
  • @mariotomo please read the SQLAlchemy ORM tutorial or some basic thing about this. They're no queries. They're objects that are persisted in the database. – Antti Haapala -- Слава Україні Jan 16 '18 at 06:05
  • I think we have a terminology misunderstanding, I call an insert/delete/select/etc action on the database a query. but when I do so, you're thinking of a sqlalchemy query object. when you say "it emits DML", I read "it generates a query", and I was not aware I was causing confusion. – mariotomo Jan 16 '18 at 11:36
  • Yes, those are queries. `session.add_all([u1, u2])` is not a query. The `Session` will at some point (implicitly or explicitly) flush those pending objects to the DB, issuing some set of queries. For example if you have DB generated defaults on columns in SQLite other than an integer primary key, SQLAlchemy has to first INSERT, then SELECT based on the primary key to read the values of the generated columns. – Ilja Everilä Jan 16 '18 at 12:54
  • In relation to the previous comment: http://docs.sqlalchemy.org/en/latest/core/defaults.html#triggered-columns. – Ilja Everilä Jan 16 '18 at 13:02

1 Answers1

0

I think I found a solution to "how do I let thread 1 tell thread 2 what to do with its database objects", and it has little to do with database objects.

I'm acquiring a threading.Lock in the GUI, and when the user hits either the 'ok' or 'cancel' button, I'm storing the choice in a module-global variable, and releasing the lock.

The database interaction in the other thread has its own session, and when finished, it also acquires the lock, meaning it goes waiting for the user to take a decision.

And this is basically.

Just for completeness, I'm correcting the above code.

global_ok = False
l = threading.Lock()

def run_me_in_other_thread():
    session = Session()
    u1 = Location(code=u'mario')
    u2 = Location(code=u'silla')
    session.add_all([u1, u2])
    l.acquire()
    if global_ok:
        session.commit()
    else:
        session.rollback()
    l.release()

t = threading.Thread(target=run_me_in_other_thread)

l.acquire()
t.start()

time.sleep(3)
global_ok = True  # or False.
l.release()
t.join()

us = session.query(Location).all()
print us
mariotomo
  • 9,438
  • 8
  • 47
  • 66