
Suppose my db model contains an object User:

Base = declarative_base() 

class User(Base):                                                               
    __tablename__ = 'users'                                                     

    id = Column(String(32), primary_key=True, default=...) 
    name = Column(Unicode(100))                                             

and my database contains a users table with n rows. At some point I decide to split name into firstname and lastname, and during `alembic upgrade head` I would like my data to be migrated as well.

The auto-generated Alembic migration is as follows:

def upgrade():
    op.add_column('users', sa.Column('lastname', sa.Unicode(length=50), nullable=True))
    op.add_column('users', sa.Column('firstname', sa.Unicode(length=50), nullable=True))

    # Assuming that the two new columns have been committed and exist at
    # this point, I would like to iterate over all rows of the name column,
    # split the string, write it into the new firstname and lastname columns,
    # and once that has completed, continue to delete the name column.

    op.drop_column('users', 'name')                                             

def downgrade():
    op.add_column('users', sa.Column('name', sa.Unicode(length=100), nullable=True))

    # Do the reverse of the above.

    op.drop_column('users', 'firstname')                                        
    op.drop_column('users', 'lastname')

There seem to be multiple, more or less hacky, solutions to this problem. This one and this one both propose using execute() and bulk_insert() to run raw SQL statements during a migration. This (incomplete) solution imports the current db model, but that approach is fragile because the migration breaks whenever the model changes.

How do I migrate and modify the existing content of column data during an Alembic migration? What is the recommended way, and where is it documented?

Jens

2 Answers


The proposed solution in norbertpy’s answer sounds good at first, but I think it has one fundamental flaw: it would introduce multiple transactions—in between the steps, the DB would be in a funky, inconsistent state. It also seems odd to me (see my comment) that a tool would migrate a DB’s schema without the DB’s data; the two are too closely tied together to separate them.

After some poking around and several conversations (see code snippets in this Gist) I’ve settled on the following solution:

def upgrade():

    # Schema migration: add all the new columns.
    op.add_column('users', sa.Column('lastname', sa.Unicode(length=50), nullable=True))
    op.add_column('users', sa.Column('firstname', sa.Unicode(length=50), nullable=True))

    # Data migration: takes a few steps...
    # Declare a lightweight table representation for the migration.
    # Note that it contains both the old and the new columns!
    t_users = sa.Table(
        'users',
        sa.MetaData(),
        sa.Column('id', sa.String(32)),
        sa.Column('name', sa.Unicode(length=100)), # Old column.
        sa.Column('lastname', sa.Unicode(length=50)), # Two new columns.
        sa.Column('firstname', sa.Unicode(length=50)),
        )
    # Use SQLAlchemy's connection and transaction to noodle over the data.
    connection = op.get_bind()
    # Select all existing names that need migrating.
    results = connection.execute(sa.select([
        t_users.c.id,
        t_users.c.name,
        ])).fetchall()
    # Iterate over all selected data tuples.
    for id_, name in results:
        # Split the existing name into first and last.
        firstname, lastname = name.rsplit(' ', 1)
        # Update the new columns.
        connection.execute(t_users.update().where(t_users.c.id == id_).values(
            lastname=lastname,
            firstname=firstname,
            ))

    # Schema migration: drop the old column.
    op.drop_column('users', 'name')                                             

Two comments about this solution:

  1. As noted in the referenced Gist, newer versions of Alembic have a slightly different notation.
  2. Depending on the DB driver, the code may behave differently. MySQL, for example, does not treat the above as a single transaction, because ALTER TABLE causes an implicit commit (see “Statements That Cause an Implicit Commit”). So check how your DB implementation handles it.

The downgrade() function can be implemented similarly.

Addendum. See the Conditional Migration Elements section in the Alembic Cookbook for examples of pairing schema migration with data migration.

Jens
  • This is a great answer - but I had to make one slight change to your code. Columns had to be specified via `t_users.c.id` or `t_users.c.name` in both `connect.execute` calls. Can you confirm this, and potentially edit your answer (or explain what's going on, if you happen to know)? – daveruinseverything May 20 '18 at 03:46
  • @daveruinseverything, confirmed and fixed the example code; thank you! – Jens May 20 '18 at 06:54
  • Unfortunately this answer is actually the one that sounds good at first but has a fundamental flaw. At least in the context of an always-up web application this is not a great way to do it. The reason is that a data migration can be extremely expensive on a big table, so you often want to avoid doing it in the "hot path" of deploying your app. If you already have some load on your DB, this can increase it and really slow down your app so it's nice to be able to run it in the background, perhaps throttle the write speed. – danny Feb 07 '19 at 00:23
  • ^ To add on since I went over the limit - the last reason @norbertby's answer is a better way to do it is that you can stop before you've done anything irreversible. This is a very simple example but in general your data migration may work fine on test data, but then in production due to a wider variety of data it breaks something. If you leave the data in both "name" and in first & last name columns, then you can roll back your app if anything goes wrong. If you've irreversibly migrated it you're in a much worse state. – danny Feb 07 '19 at 00:28
  • @danny, which is why I make a backup of the db _before_ a migration, and why I test a migration on that backup before trying it cold-turkey on a live db. – Jens Feb 07 '19 at 00:46
  • @Jens that’s cute, and I’m sure it works fine for a little side project. Super unrealistic for so many reasons though, and there are a bunch of race conditions you’re not considering. – danny Feb 08 '19 at 02:25

Alembic is a schema migration tool, not a data migration tool, though it can be used for data migrations too. That's why you won't find a lot of docs on that. That said, I would have created three separate revisions:

  1. add firstname and lastname without removing the name
  2. read all your users just as you would in your app, split their name, and then update firstname and lastname, e.g.

    for user in session.query(User).all():
        # rsplit keeps multi-word first names intact ('Anna Maria Smith').
        user.firstname, user.lastname = user.name.rsplit(' ', 1)
    session.commit()
    
  3. remove name

Sam R.
  • I’d argue that one can’t separate schema migration from data migration. Both go together, and one can’t migrate a db’s schema _without_ migrating along its data. So what use would Alembic be if it’s designed to do one and barely does the other? – Jens May 11 '17 at 10:48
  • Wouldn't recommend this approach. If the schema of User changes then your migration will begin to fail. – crawfobw Sep 05 '21 at 23:37
  • it doesn't work, I get the error : `Working outside of request context.` – Mouad May 06 '22 at 13:20