
I have a SQLAlchemy model that represents a file and thus contains the path to an actual file. Since deleting the database row and deleting the file should go together (so no orphaned files are left and no rows point to deleted files), I added a delete() method to my model class:

def delete(self):
    if os.path.exists(self.path):
        os.remove(self.path)
    db.session.delete(self)

This works fine but has one huge disadvantage: the file is deleted immediately, before the transaction containing the database deletion is committed.

One option would be committing in the delete() method - but I don't want to do this since I might not be finished with the current transaction. So I'm looking for a way to delay the deletion of the physical file until the transaction deleting the row is actually committed.

SQLAlchemy has an after_delete event, but according to the docs it is triggered when the SQL is emitted (i.e. on flush), which is too early. It also has an after_commit event, but at that point everything deleted in the transaction has probably already been expunged from the session.

ThiefMaster

4 Answers


When using SQLAlchemy in a Flask app with Flask-SQLAlchemy, the extension provides a models_committed signal which receives a list of (model, operation) tuples. Using this signal, doing what I'm looking for is extremely easy:

@models_committed.connect_via(app)
def on_models_committed(sender, changes):
    for obj, change in changes:
        if change == 'delete' and hasattr(obj, '__commit_delete__'):
            obj.__commit_delete__()

With this generic handler, every model that needs on-delete-commit code now simply needs a __commit_delete__(self) method and can do whatever it needs to do in that method.
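For concreteness, such a hook might look like this minimal sketch - the `File` class and its `path` attribute are illustrative stand-ins for the real `db.Model` subclass:

```python
import os

class File:
    """Illustrative stand-in for the real db.Model subclass."""

    def __init__(self, path):
        self.path = path

    def __commit_delete__(self):
        # Invoked by the models_committed handler above, i.e. only
        # after the DELETE has actually been committed.
        if os.path.exists(self.path):
            os.remove(self.path)
```

The existence check makes the hook idempotent, so a file that was already removed out-of-band does not raise an error.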


It can also be done without Flask-SQLAlchemy; however, in that case it needs some more code:

  • A deletion needs to be recorded when it's performed. This is done using the after_delete event.
  • Any recorded deletions need to be handled when a COMMIT is successful. This is done using the after_commit event.
  • In case the transaction fails or is manually rolled back the recorded changes also need to be cleared. This is done using the after_rollback() event.
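The three bullet points above can be sketched with plain SQLAlchemy events; the `StoredFile` model and the `pending_file_deletes` key in `session.info` are names I made up for illustration:

```python
import os

from sqlalchemy import Column, Integer, String, event
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class StoredFile(Base):  # illustrative model name
    __tablename__ = 'stored_file'
    id = Column(Integer, primary_key=True)
    path = Column(String, nullable=False)

# 1. Record the deletion when the DELETE is emitted (i.e. on flush).
@event.listens_for(StoredFile, 'after_delete')
def _record_pending_delete(mapper, connection, target):
    session = Session.object_session(target)
    session.info.setdefault('pending_file_deletes', []).append(target.path)

# 2. Remove the files only once the COMMIT has actually succeeded.
@event.listens_for(Session, 'after_commit')
def _delete_committed_files(session):
    for path in session.info.pop('pending_file_deletes', []):
        if os.path.exists(path):
            os.remove(path)

# 3. On ROLLBACK, discard the recorded deletions - the rows still exist.
@event.listens_for(Session, 'after_rollback')
def _discard_pending_deletes(session):
    session.info.pop('pending_file_deletes', None)
```

Using `session.info` keeps the pending list per-session, so concurrent sessions don't see each other's recorded deletions.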
ThiefMaster
  • This is called on *every* commit for *every* model that's in the app. Isn't that overkill? – Devi Mar 26 '15 at 05:06
  • Why? Deletes are not the most common operation and one function call doesn't add much overhead. – ThiefMaster Mar 26 '15 at 13:46
  • But `models_committed` is not limited to `delete` only; it is also called for `insert`s and `update`s, which are the most common. From the [doc](https://pythonhosted.org/Flask-SQLAlchemy/signals.html#models_committed): "This signal is sent when changed models were committed to the database" – Devi Mar 27 '15 at 07:04
  • True, wasn't thinking about that. Anyway, still just minor overhead. – ThiefMaster Mar 27 '15 at 09:27
  • Recently Flask-SQLAlchemy added `SQLALCHEMY_TRACK_MODIFICATIONS` to enable/disable the signalling subsystem. It would be beautiful to be able to enable it on a per-model basis, but that would probably re-trigger the performance issue when deciding whether to emit the signal :D – Paolo Casciello Nov 22 '15 at 18:03
  • It looks like [signalling support may be deprecated now](https://flask-sqlalchemy.palletsprojects.com/en/2.x/signals/). – Maximilian Burszley Sep 02 '20 at 15:37

This follows along with the other event-based answers, but I thought I'd post this code, since I wrote it to solve pretty much your exact problem.

The code (below) registers a SessionExtension class that accumulates all new, changed, and deleted objects as flushes occur, then clears or evaluates the queue when the session is actually committed or rolled back. For the classes which have an external file attached, I then implemented obj.after_db_new(session), obj.after_db_update(session), and/or obj.after_db_delete(session) methods which the SessionExtension invokes as appropriate; you can then populate those methods to take care of creating / saving / deleting the external files.

Note: I'm almost positive this could be rewritten in a cleaner manner using SQLAlchemy's new event system, and it has a few other flaws, but it's in production and working, so I haven't updated it :)

import logging
log = logging.getLogger(__name__)
from sqlalchemy.orm.session import SessionExtension

class TrackerExtension(SessionExtension):

    def __init__(self):
        self.new = set()
        self.deleted = set()
        self.dirty = set()

    def after_flush(self, session, flush_context):
        # NOTE: requires >= SA 0.5
        self.new.update(obj for obj in session.new 
                        if hasattr(obj, "after_db_new"))
        self.deleted.update(obj for obj in session.deleted 
                            if hasattr(obj, "after_db_delete"))
        self.dirty.update(obj for obj in session.dirty 
                          if hasattr(obj, "after_db_update"))

    def after_commit(self, session):
        # NOTE: this is rather hackneyed, in that it hides errors until
        #       the end, just so it can commit as many objects as possible.
        # FIXME: could integrate this w/ twophase to make everything safer in case the methods fail.
        log.debug("after commit: new=%r deleted=%r dirty=%r", 
                  self.new, self.deleted, self.dirty)
        ecount = 0

        if self.new:
            for obj in self.new:
                try:
                    obj.after_db_new(session)
                except Exception:
                    ecount += 1
                    log.critical("error occurred in after_db_new: obj=%r", 
                                 obj, exc_info=True)
            self.new.clear()

        if self.deleted:
            for obj in self.deleted:
                try:
                    obj.after_db_delete(session)
                except Exception:
                    ecount += 1
                    log.critical("error occurred in after_db_delete: obj=%r", 
                                 obj, exc_info=True)
            self.deleted.clear()

        if self.dirty:
            for obj in self.dirty:
                try:
                    obj.after_db_update(session)
                except Exception:
                    ecount += 1
                    log.critical("error occurred in after_db_update: obj=%r", 
                                 obj, exc_info=True)
            self.dirty.clear()

        if ecount:
            raise RuntimeError("%d error(s) occurred during after_commit() ... "
                               "see traceback for more" % ecount)

    def after_rollback(self, session):
        self.new.clear()
        self.deleted.clear()
        self.dirty.clear()

# then add "extension=TrackerExtension()" to the Session constructor 
Eli Collins
  • In fact `SessionExtension` is now deprecated, but as you noted this could be rewritten in the new event framework with almost exactly the same code--you would just not inherit from `SessionExtension`, and instead add a method that registers all your methods as event listeners on a given session or session factory. (You could also use the `event.listens_for` decorator, but that has the disadvantage of tying the implementation to a specific `Session` class - granted, for one's own use it's probably fine.) – Iguananaut Dec 04 '16 at 13:23

This seems to be a bit challenging. I'm curious whether an SQL AFTER DELETE trigger might be the best route for this, granted it won't be DRY and I'm not sure the SQL database you are using supports it. Still, AFAIK SQLAlchemy pushes transactions to the database but doesn't really know when they have been committed, if I'm interpreting this comment correctly:

its the database server itself that maintains all "pending" data in an ongoing transaction. The changes aren't persisted permanently to disk, and revealed publically to other transactions, until the database receives a COMMIT command which is what Session.commit() sends.

taken from *SQLAlchemy: What's the difference between flush() and commit()?*, by the creator of SQLAlchemy ...

Samy Vilar
  • Triggers run in the database - so they are not really a good option. Actually, they are not an option at all, since the PostgreSQL user won't have access to the file that should be deleted. – ThiefMaster Aug 19 '12 at 10:24
  • This is why I asked what your backend database was; there are certain DBMSs that allow system calls - I think MSSQL is one of them. Postgres also has `plperlu` (http://www.postgresql.org/docs/8.0/static/plperl-trusted.html), which can run Perl commands within the actual SQL and in turn might be able to do system calls. Keep in mind this is a bit dangerous; Postgres recommends `plperl`, which is far more restricted, but since you want to delete files, `plperlu` might be one possible route ... – Samy Vilar Aug 20 '12 at 03:21

If your SQLAlchemy backend supports it, enable two-phase commit. You will need to use (or write) a transaction model for the filesystem that:

  • checks permissions, etc. to ensure that the file exists and can be deleted during the first commit phase
  • actually deletes the file during the second commit phase.

That's probably as good as it's going to get. Unix filesystems, as far as I know, do not natively support XA or other two-phase transactional systems, so you will have to live with the small exposure from having a second-phase filesystem delete fail unexpectedly.
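As a rough illustration, a minimal "transaction model" for a single-file delete could stage the file with an atomic rename during the first phase and only unlink it once the database COMMIT has succeeded; the class name and the `.pending-delete` suffix here are invented, not part of any real XA resource API:

```python
import os

class TwoPhaseFileDelete:
    """Sketch of a two-phase 'transaction' for deleting one file.

    prepare() verifies the delete can succeed and moves the file aside
    (an atomic rename on the same filesystem); commit() unlinks it for
    good; rollback() restores it.
    """

    def __init__(self, path):
        self.path = path
        self.staged = path + '.pending-delete'  # assumed to be a free name

    def prepare(self):
        # Phase 1: fails here if the file is missing or not removable,
        # but no data is destroyed yet.
        os.rename(self.path, self.staged)

    def commit(self):
        # Phase 2: the database COMMIT succeeded - drop the file for real.
        os.remove(self.staged)

    def rollback(self):
        # The transaction failed - put the file back.
        os.rename(self.staged, self.path)
```

Since `os.rename` within one filesystem is atomic, a crash between the phases leaves either the original or the staged file intact, which is exactly the "small exposure" mentioned above: a leftover `.pending-delete` file that a cleanup job would have to reap.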

wberry