65

I'm programming a web application using sqlalchemy. Everything was smooth during the first phase of development when the site was not in production. I could easily change the database schema by simply deleting the old sqlite database and creating a new one from scratch.

Now the site is in production and I need to preserve the data, but I still want to keep my original development speed by easily converting the database to the new schema.

So let's say that I have model.py at revision 50 and model.py at revision 75, describing the schema of the database. Between those two schemas most changes are trivial; for example, a new column is declared with a default value and I just want to add this default value to old records.

Occasionally a few changes may not be trivial and will require some pre-computation.
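For the trivial case, the whole migration is an `ALTER TABLE` plus a backfill `UPDATE`. A minimal sketch with the standard library's `sqlite3` module (the table and column names are made up for illustration):

```python
import sqlite3

# In-memory database standing in for the production SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("alice",), ("bob",)])

# The "trivial" migration: add the new column, then backfill old records
# with the default value declared in the new model.
conn.execute("ALTER TABLE users ADD COLUMN is_active INTEGER")
conn.execute("UPDATE users SET is_active = 1 WHERE is_active IS NULL")

rows = conn.execute("SELECT name, is_active FROM users ORDER BY id").fetchall()
print(rows)  # [('alice', 1), ('bob', 1)]
```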

How do (or would) you handle fast-changing web applications with, say, one or two new versions of the production code per day?

By the way, the site is written in Pylons if this makes any difference.

ascobol
  • 7,554
  • 7
  • 49
  • 70
  • 1
    "So is it worth using migrate?" should be a separate question. You have an answer on how to migrate. Asking about the use cases for sqlalchemy-migrate and your specific use case is far more specific than this general question. – S.Lott Nov 17 '10 at 14:48
  • 1
    OK so I need to make another question about migrate in order to know which answer to accept. – ascobol Nov 17 '10 at 21:58
  • 1
    @ascobol: "another question about migrate in order to know which answer to accept". False. You have answers here. "Is tool [X] worth it?" is unrelated to "how do I migrate?". You have answers to "how?". Asking about the value of one particular tool is unrelated to "how?" – S.Lott Nov 17 '10 at 23:28
  • @ascobol: perhaps your unrelated question is "why doesn't migrate work?" Not "is migrate worth it?" This -- too -- is unrelated to "how?" – S.Lott Nov 17 '10 at 23:29
  • @S.Lott: Sorry but I disagree. I need to know if a tool is suitable before validating an answer. If the question was "how to manage a remote server over the internet?" would you up-vote an answer like "telnet is your friend"? (And I'm not saying that migrate is the telnet of SQL migration tools.) Cheers – ascobol Nov 18 '10 at 11:37
  • 2
    @ascobol: "I need to know if a tool is suitable". You were told that it was suitable. End of this question. Start of another question airing out specific problems you're having with that tool. Unrelated. – S.Lott Nov 18 '10 at 11:54

4 Answers

46

Alembic is a new database migrations tool, written by the author of SQLAlchemy. I've found it much easier to use than sqlalchemy-migrate. It also works seamlessly with Flask-SQLAlchemy.

Auto generate the schema migration script from your SQLAlchemy models:

alembic revision --autogenerate -m "description of changes"

Then apply the new schema changes to your database:

alembic upgrade head

More info here: http://readthedocs.org/docs/alembic/

Alan Hamlett
  • 3,160
  • 1
  • 23
  • 23
  • I agree but it's a bit difficult to get it working with your flask app. I think that new code 'flask-alembic' is what I need to test out. Check this problem: http://stackoverflow.com/questions/14682466/relative-importing-python-module-from-a-subfolder-from-a-different-subfolder#comment20531813_14682466 – Dexter Feb 05 '13 at 16:26
  • 2
    After running `alembic init alembic` and changing my db username & password in alembic.ini, I changed my alembic/env.py to this: https://gist.github.com/alanhamlett/4721073 That was all that was needed to get alembic working with my flask app. Hope it helps. – Alan Hamlett Feb 06 '13 at 07:58
  • my problem is that I can't do line 10 in your gist code. I can't import my models or my "db" variable. It won't let me do it in env.py. – Dexter Feb 07 '13 at 17:06
  • 1
    Had a similar issue getting alembic to work. Line 8 and 9 from Alan's gist are very important and not really mentioned in the documentation. But you need to add the Project to the system path to allow alembic to access your models. – radix07 Jun 11 '15 at 15:46
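To sum up the gist discussed in these comments: the essential lines put the project root on `sys.path` so that `env.py` can import your models and `db` object for `--autogenerate`. A stdlib-only sketch of that path setup (directory and module names are illustrative):

```python
import os
import sys

# In alembic/env.py this would be os.path.dirname(os.path.abspath(__file__));
# a literal "alembic" directory name stands in for it here.
alembic_dir = os.path.abspath("alembic")
project_root = os.path.dirname(alembic_dir)

# The important part: make the project importable from inside env.py.
sys.path.insert(0, project_root)

# After which something like this works (hypothetical names):
#   from myapp import db
#   target_metadata = db.metadata
```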
16

What we do.

  1. Use "major version"."minor version" identification for your applications. The major version is the schema version number. The major number is not some random "enough new functionality" kind of thing; it's a formal declaration of compatibility with the database schema.

    Release 2.3 and 2.4 both use schema version 2.

    Release 3.1 uses the version 3 schema.

  2. Make the schema version very, very visible. For SQLite, this means keep the schema version number in the database file name. For MySQL, use the database name.

  3. Write migration scripts: 2to3.py, 3to4.py. These scripts work in two phases. (1) Query the old data into the new structure, creating simple CSV or JSON files. (2) Load the new structure from the simple CSV or JSON files with no further processing. These extract files -- because they're already in the proper structure -- are fast to load and can easily be used as unit test fixtures. Also, you never have two databases open at the same time, which makes the scripts slightly simpler. Finally, the load files can be used to move the data to another database server.
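The two-phase pattern of step 3 can be sketched like this (table, column, and database names are all hypothetical; a real script would read and write actual files):

```python
import json
import sqlite3

# Phase 1: query the old (version 2) schema into a plain JSON extract.
old = sqlite3.connect(":memory:")          # stands in for myapp_v2.db
old.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
old.execute("INSERT INTO users (full_name) VALUES ('Ada Lovelace')")
rows = [{"id": r[0], "full_name": r[1]}
        for r in old.execute("SELECT id, full_name FROM users")]
extract = json.dumps(rows)                 # a real script writes users.json
old.close()                                # never two databases open at once

# Phase 2: load the new (version 3) schema from the extract, no further
# processing -- the extract is already in the new structure.
new = sqlite3.connect(":memory:")          # stands in for myapp_v3.db
new.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT, "
            "is_active INTEGER DEFAULT 1)")
new.executemany("INSERT INTO users (id, full_name) VALUES (:id, :full_name)",
                json.loads(extract))
migrated = new.execute("SELECT full_name, is_active FROM users").fetchall()
print(migrated)  # [('Ada Lovelace', 1)]
```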

It's very, very hard to "automate" schema migration. It's easy (and common) to have database surgery so profound that an automated script can't easily map data from old schema to new schema.

S.Lott
  • 384,516
  • 81
  • 508
  • 779
  • 1
    How do you define a major schema change? Is adding a table / new-model considered a major schema change even if compatibility didn't change? Do you rival firefox in version numbers? (joke) – Dexter Feb 05 '13 at 16:29
13

Use sqlalchemy-migrate.

It is designed to support an agile approach to database design, and make it easier to keep development and production databases in sync, as schema changes are required. It makes schema versioning easy.

Think of it as a version control for your database schema. You commit each schema change to it, and it will be able to go forwards/backwards on the schema versions. That way you can upgrade a client and it will know exactly which set of changes to apply on that client's database.

It does what S.Lott proposes in his answer, automatically for you. Makes a hard thing easy.
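The mechanism it automates can be reduced to a toy sketch: number every change script and record the current schema version in the database itself (all DDL here is made up; real sqlalchemy-migrate change scripts define `upgrade()`/`downgrade()` in separate numbered files):

```python
import sqlite3

# Version tracking table, the core of the scheme.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE migrate_version (version INTEGER)")
conn.execute("INSERT INTO migrate_version VALUES (0)")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Ordered change scripts, one statement per version (hypothetical DDL).
UPGRADES = {
    1: "ALTER TABLE users ADD COLUMN email TEXT",
    2: "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER)",
}

def upgrade_to(conn, target):
    # Apply every change script between the recorded and target versions.
    (current,) = conn.execute("SELECT version FROM migrate_version").fetchone()
    for v in range(current + 1, target + 1):
        conn.execute(UPGRADES[v])
        conn.execute("UPDATE migrate_version SET version = ?", (v,))

upgrade_to(conn, 2)
(version,) = conn.execute("SELECT version FROM migrate_version").fetchone()
print(version)  # 2
```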

hybor
  • 308
  • 4
  • 12
nosklo
  • 217,122
  • 57
  • 293
  • 297
1

The best way to deal with your problem is to reflect your schema instead of doing it the declarative way. I wrote an article about the reflective approach here: http://petrushev.wordpress.com/2010/06/16/reflective-approach-on-sqlalchemy-usage/ but there are other resources about this also. In this manner, every time you make changes to your schema, all you need to do is restart the app and the reflection will fetch the new metadata for the changes in the tables. This is quite fast and sqlalchemy does it only once per process. Of course, you'll have to manage the relationship changes you make yourself.
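A minimal sketch of the reflective approach (assuming SQLAlchemy 1.4 or later is installed; the table here is hypothetical):

```python
import sqlalchemy as sa

# Create a throwaway schema to reflect against.
engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sa.text(
        "CREATE TABLE person (id INTEGER PRIMARY KEY, name VARCHAR(100))"))

# Instead of declaring columns in model.py, reflect them from the live schema.
metadata = sa.MetaData()
person = sa.Table("person", metadata, autoload_with=engine)

print([c.name for c in person.columns])  # ['id', 'name']
# As discussed in the comments below: column details such as the maximum
# string length are available from the reflected metadata.
print(person.c.name.type.length)         # 100
```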

vonPetrushev
  • 5,457
  • 6
  • 39
  • 51
  • 6
    Just to comment, this is a horrible practice in the real world. It forces me to check the database vs the code for all kinds of settings like lengths allowed on strings, etc. I don't get any autocomplete or source checks from my IDE. I really discourage people from going all reflective in any project. It's just a mess to work with. – Rick Nov 12 '10 at 16:15
  • 2
    "all kinds of settings like lengths allowed on strings" - can you elaborate on this? What kinds of settings? And, just because you don't get any autocomplete in ide doesn't mean that the approach is wrong. After all, we are dealing with python - not java/c++. – vonPetrushev Nov 12 '10 at 16:18
  • Also, i have apps 'in the real world' that work with db reflection. With complex ERs. It's not that messy for me. – vonPetrushev Nov 12 '10 at 16:21
  • 1
    If you have a field name on a model and I'm working on code and see person.name = 'really long string' and I want to check out what is the max length of name, I have to load up the db to go look because it's not going to be in my model as person = Column(Unicode(100)) – Rick Nov 12 '10 at 16:31
  • 1
  • You don't have to 'load up the db' - (I don't know what exactly this means, but you don't have to do it). After the reflection of the schema, all the info you need is stored in the metadata reflected with sqlalchemy. Specifically, to get the length of the column you need this: length = metadata.tables['tablename']._columns._data[u'columnname'].type.length, metadata being previously reflected of course. Note that this does not call introspection on the db again - it is already stored. – vonPetrushev Nov 12 '10 at 16:51