
The Flask example applications Flasky and Flaskr create, drop, and re-seed their entire database between tests. Even when this doesn't make the test suite slow, I wonder if there is a way to accomplish the same thing without being so "destructive". I'm surprised there isn't a "softer" way to roll back any changes. I've tried a few things that haven't worked.

For context, my tests call endpoints through the Flask test_client with something like self.client.post('/things'), and the endpoints themselves call session.commit().
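For reference, here is a minimal sketch of that setup, in the style of Flasky's test suite; the create_app factory, the 'testing' config name, and the request payload are illustrative, not part of the original:

    import unittest
    from myapp import create_app, db  # hypothetical app factory and db object

    class ThingsTestCase(unittest.TestCase):
        def setUp(self):
            self.app = create_app('testing')
            self.app_context = self.app.app_context()
            self.app_context.push()
            db.create_all()        # build the schema from scratch
            # ...insert any seed rows the tests rely on, then commit...
            self.client = self.app.test_client()

        def tearDown(self):
            db.session.remove()
            db.drop_all()          # the "destructive" part in question
            self.app_context.pop()

        def test_create_thing(self):
            response = self.client.post('/things', json={'name': 'widget'})
            self.assertEqual(response.status_code, 201)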

I've tried making my own "commit" function that only flushes during tests, but then if I make two sequential requests, such as self.client.post('/things') followed by self.client.get('/things'), the newly created item is not present in the result set: the second request gets a new request context with a new DB session (and transaction), which is not aware of changes that were merely flushed, not committed. This seems like an unavoidable problem with this approach.
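A sketch of that attempt, using unittest.mock to redirect commit to flush for the duration of a test; the patch target assumes the Flask-SQLAlchemy db.session object and the test case from the sketch above:

    from unittest import mock

    from myapp import db  # hypothetical import, as above

    class FlushOnlyTestCase(ThingsTestCase):
        def test_sequential_requests(self):
            # Redirect commit to flush so nothing is permanently written.
            with mock.patch.object(db.session, 'commit',
                                   side_effect=db.session.flush):
                self.client.post('/things', json={'name': 'widget'})
                # This fails: the GET runs in a new request context with a
                # new session and transaction, which cannot see rows that
                # were flushed but never committed by the POST.
                response = self.client.get('/things')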

I've tried using subtransactions with db.session.begin(subtransactions=True), but then I run into an even worse problem. Because I have autoflush=False, nothing actually gets committed OR flushed until the outer transaction is committed. So again, any requests that rely on data modified by earlier requests in the same test will fail. Even with autoflush=True, the earlier problem would occur for sequential requests.
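A sketch of that attempt, assuming the pre-2.0 SQLAlchemy API in which Session.begin() accepted a subtransactions flag:

    from myapp import db  # hypothetical import, as above

    class SubtransactionTestCase(ThingsTestCase):
        def setUp(self):
            super().setUp()
            # Open an outer transaction that each endpoint's commit() will
            # join as a subtransaction rather than actually committing.
            db.session.begin(subtransactions=True)

        def tearDown(self):
            # Roll back everything the test did in the outer transaction.
            db.session.rollback()
            super().tearDown()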

I've tried nested transactions, with the same result as subtransactions; apparently they don't do what I was hoping. I saw that nested transactions issue a SAVEPOINT command to the DB. I hoped that would allow commits to happen and be visible to other sessions, and then let me roll back to that savepoint at an arbitrary time, but that's not what they do. They operate within a transaction, and so have the same issues as the previous approach.
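The SAVEPOINT variant looks almost the same, with begin_nested() in place of the subtransaction; as noted, the savepoint lives inside the enclosing transaction, so it has the same visibility problem:

    from myapp import db  # hypothetical import, as above

    class SavepointTestCase(ThingsTestCase):
        def setUp(self):
            super().setUp()
            # Emits a SAVEPOINT statement; it does NOT make anything
            # visible to other sessions, because it is scoped to this
            # transaction.
            self.savepoint = db.session.begin_nested()

        def tearDown(self):
            self.savepoint.rollback()  # ROLLBACK TO SAVEPOINT
            super().tearDown()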

Update: Apparently there is a way of using nested transactions on a Connection rather than a Session, which might work but requires some restructuring of the application to use a Connection created by the test code. I haven't tried this yet; I'll get around to it eventually, but meanwhile I hope there's another way. Some say this approach may not work with MySQL due to a distinction between "real nested transactions" and savepoints, though the Postgres documentation also says to use SAVEPOINT rather than attempting to nest transactions, so perhaps the same technique applies to both. I can't say for sure without trying it on each database.
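For reference, this is roughly the recipe from the SQLAlchemy documentation on joining a session to an external transaction. I haven't verified it against this app, and the session wiring is illustrative, since the application would need restructuring to use the test-owned session:

    from sqlalchemy.orm import scoped_session, sessionmaker

    from myapp import db  # hypothetical import, as above

    class ConnectionTestCase(ThingsTestCase):
        def setUp(self):
            super().setUp()
            # The test owns the connection and an outer transaction on it.
            self.connection = db.engine.connect()
            self.transaction = self.connection.begin()
            # The app would have to be restructured to use a session bound
            # to this connection instead of creating its own.
            self.session = scoped_session(sessionmaker(bind=self.connection))

        def tearDown(self):
            self.session.remove()
            # Rolling back the outer transaction discards everything the
            # test "committed" through the bound session.
            self.transaction.rollback()
            self.connection.close()
            super().tearDown()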

Another option that avoids a DB drop_all, create_all, and re-seeding with data is to manually undo the changes a test introduces. But when testing an endpoint, many rows could be inserted into many tables, and reliably undoing this by hand would be both exhausting and bug-prone.

After trying all those things, I start to see the wisdom in dropping and creating between tests. However, is there something I've tried above that SHOULD work, but I'm simply doing something incorrectly? Or is there yet another method that someone is aware of that I haven't tried yet?

Update: Another method I just found on StackOverflow is to truncate all the tables instead of dropping and creating them. This is apparently about twice as fast, but it still seems heavy-handed and isn't as convenient as a rollback (which would not delete any sample data placed in the DB prior to the test case).
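A sketch of that truncation approach; it uses plain DELETE statements rather than a database-specific TRUNCATE, and relies on SQLAlchemy's dependency-ordered table list to avoid foreign-key violations:

    from myapp import db  # hypothetical import, as above

    def clear_all_tables():
        # sorted_tables is ordered parents-first by foreign-key
        # dependency, so deleting in reverse order satisfies constraints.
        for table in reversed(db.metadata.sorted_tables):
            db.session.execute(table.delete())
        db.session.commit()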


1 Answer


For unit tests I think the standard approach of regenerating the entire database is what makes the most sense, as you've seen in my examples and many others. But I agree, for large applications this can take a lot of time during your test run.

Thanks to SQLAlchemy you can get away with writing a lot of generic database code that runs on your production database, which might be MySQL, Postgres, etc., while also running on sqlite for tests. It is not possible for every application out there to use 100% generic SQLAlchemy, since sqlite has some important differences from the others, but in many cases this works well.

So whenever possible, I set up a sqlite database for my tests. Even for large databases, using an in-memory sqlite database should be pretty fast. Another very fast alternative is to generate your tables once, make a backup of your sqlite file with all the empty tables, then before each test restore the file instead of doing a create_all().
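A sketch of that file-backup idea for sqlite; the file paths and the 'testing' config name are illustrative:

    import shutil

    from myapp import create_app, db  # hypothetical imports, as above

    TEMPLATE_DB = 'test_template.sqlite'  # pristine copy, empty tables
    LIVE_DB = 'test.sqlite'               # file the test config points at

    def make_template():
        # Run once: build the schema, then save a pristine copy of it.
        app = create_app('testing')
        with app.app_context():
            db.create_all()
        shutil.copyfile(LIVE_DB, TEMPLATE_DB)

    def restore_db():
        # Run before each test: cheaper than drop_all() + create_all().
        shutil.copyfile(TEMPLATE_DB, LIVE_DB)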

I have not explored the idea of doing an initial backup of the database with empty tables and then using file-based restores between tests for MySQL or Postgres, but in theory that should work as well, so I guess that is one solution you haven't mentioned in your list. You will need to stop and restart the db service in between your tests, though.
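An untested sketch of what that might look like for Postgres; the service name, data directory, and the viability of a file-level copy are all assumptions, and the server must be stopped while the files are swapped:

    import shutil
    import subprocess

    DATA_DIR = '/var/lib/postgresql/data'      # assumed data directory
    BACKUP_DIR = '/var/lib/postgresql/backup'  # pristine copy, empty tables

    def restore_database():
        # Stop the server, swap the data directory for the pristine copy,
        # and start it again. Slow, but avoids per-table work entirely.
        subprocess.run(['service', 'postgresql', 'stop'], check=True)
        shutil.rmtree(DATA_DIR)
        shutil.copytree(BACKUP_DIR, DATA_DIR)
        subprocess.run(['service', 'postgresql', 'start'], check=True)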

Miguel Grinberg