Creating seed data in a flask-migrate or alembic migration

Question

How can I insert some seed data in my first migration? If the migration is not the best place for this, then what is the best practice?

"""empty message

Revision ID: 384cfaaaa0be
Revises: None
Create Date: 2013-10-11 16:36:34.696069

"""

# revision identifiers, used by Alembic.
revision = '384cfaaaa0be'
down_revision = None

from alembic import op
import sqlalchemy as sa


def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.create_table('list_type',
    sa.Column('id', sa.Integer(), nullable=False),
    sa.Column('name', sa.String(length=80), nullable=False),
    sa.PrimaryKeyConstraint('id'),
    sa.UniqueConstraint('name')
    )
    op.create_table('job',
    sa.Column('id', sa.Integer(), nullable=False),
    sa.Column('list_type_id', sa.Integer(), nullable=False),
    sa.Column('record_count', sa.Integer(), nullable=False),
    sa.Column('status', sa.Integer(), nullable=False),
    sa.Column('sf_job_id', sa.Integer(), nullable=False),
    sa.Column('created_at', sa.DateTime(), nullable=False),
    sa.Column('compressed_csv', sa.LargeBinary(), nullable=True),
    sa.ForeignKeyConstraint(['list_type_id'], ['list_type.id'], ),
    sa.PrimaryKeyConstraint('id')
    )
    ### end Alembic commands ###

    # ==> INSERT SEED DATA HERE <==


def downgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('job')
    op.drop_table('list_type')
    ### end Alembic commands ###

Minor update to documentation shows how the table can be created and then immediately bulk inserted from the created table: http://alembic.readthedocs.org/en/latest/ops.html#alembic.operations.Operations.create_table — iJames, Jan 03 '15 at 05:38
In terms of creating seed data, you could check out https://github.com/FactoryBoy/factory_boy and https://github.com/heavenshell/py-sqlalchemy_seed — Ben Creasy, Oct 22 '17 at 23:28

score 88 · Accepted Answer · edited Jun 23 '16 at 04:21

Alembic has, as one of its operation, bulk_insert(). The documentation gives the following example (with some fixes I've included):

from datetime import date
from sqlalchemy.sql import table, column
from sqlalchemy import String, Integer, Date
from alembic import op

# Create an ad-hoc table to use for the insert statement.
accounts_table = table('account',
    column('id', Integer),
    column('name', String),
    column('create_date', Date)
)

op.bulk_insert(accounts_table,
    [
        {'id':1, 'name':'John Smith',
                'create_date':date(2010, 10, 5)},
        {'id':2, 'name':'Ed Williams',
                'create_date':date(2007, 5, 27)},
        {'id':3, 'name':'Wendy Jones',
                'create_date':date(2008, 8, 15)},
    ]
)

Note too that the alembic has an execute() operation, which is just like the normal execute() function in SQLAlchemy: you can run any SQL you wish, as the documentation example shows:

from sqlalchemy.sql import table, column
from sqlalchemy import String
from alembic import op

account = table('account',
    column('name', String)
)
op.execute(
    account.update().\
        where(account.c.name==op.inline_literal('account 1')).\
        values({'name':op.inline_literal('account 2')})
        )

Notice that the table that is being used to create the metadata that is used in the update statement is defined directly in the schema. This might seem like it breaks DRY (isn't the table already defined in your application), but is actually quite necessary. If you were to try to use the table or model definition that is part of your application, you would break this migration when you make changes to your table/model in your application. Your migration scripts should be set in stone: a change to a future version of your models should not change migrations scripts. Using the application models will mean that the definitions will change depending on what version of the models you have checked out (most likely the latest). Therefore, you need the table definition to be self-contained in the migration script.

Another thing to talk about is whether you should put your seed data into a script that runs as its own command (such as using a Flask-Script command, as shown in the other answer). This can be used, but you should be careful about it. If the data you're loading is test data, then that's one thing. But I've understood "seed data" to mean data that is required for the application to work correctly. For example, if you need to set up records for "admin" and "user" in the "roles" table. This data SHOULD be inserted as part of the migrations. Remember that a script will only work with the latest version of your database, whereas a migration will work with the specific version that you are migrating to or from. If you wanted a script to load the roles info, you could need a script for every version of the database with a different schema for the "roles" table.

Also, by relying on a script, you would make it more difficult for you to run the script between migrations (say migration 3->4 requires that the seed data in the initial migration to be in the database). You now need to modify Alembic's default way of running to run these scripts. And that's still not ignoring the problems with the fact that these scripts would have to change over time, and who knows what version of your application you have checked out from source control.

Is there a reverse to `bulk_insert()`? I believe there isn't, which would make it harder to write the `downgrade`. Even if there was a bulk_delete, what do you do if the data was changed by the application and looks completely different than when it was inserted by `bulk_insert`? It would only be safe to downgrade if the table was added in that same migration, as in that case you have to delete the table anyway, but other cases are not easily addressed. Still, I don't feel a need to downvote your answer. — Miguel Grinberg, Oct 12 '13 at 20:15
If the `bulk_insert` is done when a table is created (which is often the case), dropping the table is enough. Otherwise, you can use `execute` to delete. This isn't a problem with using alembic in this way, this is a problem with database migrations. They aren't easy, and no tool is going to make them easy (only make them easier). Also, I removed my downvote from your answer after I added my comment. No hard feelings :) — Mark Hildreth, Oct 12 '13 at 20:24
@MarkHildreth I went with your approach since all I'm storing in the table in this migration are required constants, and this table is read-only. I agree that the ad-hoc table is very un-DRY. Thanks!!! — Mark Richman, Oct 12 '13 at 22:56
@MarkHildreth I'm trying to put this into a manage.py script, but I keep getting this cryptic one: `NameError: Can't invoke function 'create_table', as the proxy object has not yet been established for the Alembic 'Operations' class. Try placing this code inside a callable.` Do you have any idea as to what this means? — lol, Dec 16 '15 at 23:12
@lol I would recommend creating a new question on StackOverflow for this. — Mark Hildreth, Dec 17 '15 at 15:07
@lol you need to run this script via another script 'manage.py' like this: `python manage.py db upgrade` example here: http://blog.miguelgrinberg.com/post/flask-migrate-alembic-database-migration-wrapper-for-flask — Shoham, May 26 '16 at 15:40
to get an existing table object by the table name, please check this: https://stackoverflow.com/a/57609029/10058386 — Ziyad Sfaxi, Jan 08 '21 at 04:45

score 37 · Answer 2 · answered Oct 12 '13 at 18:33

37

Migrations should be limited to schema changes only, and not only that, it is important that when a migration up or down is applied that data that existed in the database from before is preserved as much as possible. Inserting seed data as part of a migration may mess up pre-existing data.

As most things with Flask, you can implement this in many ways. Adding a new command to Flask-Script is a good way to do this, in my opinion. For example:

@manager.command
def seed():
    "Add seed data to the database."
    db.session.add(...)
    db.session.commit()

So then you run:

python manager.py seed

answered Oct 12 '13 at 18:33

Miguel Grinberg

65,299
14
133
152

41

Sorry, a bit trigger happy, but I STRONGLY disagree with this: "Migrations should be limited to schema changes only". The answer is fine if you want to make the seed data a separate command. But, for example, if you wanted to install things like "roles" (admin, user, etc.), then that is perfectly fine to do into the migration. In fact, adding a command rather than putting it in the migration means you now must, as part of your install, do two steps (migration, data load) instead of one. Depending on your environment, go either way. But please don't say that migrations should be "limited". – Mark Hildreth Oct 12 '13 at 19:02
Ok Mark so how would you do it as part of the above migration? – Mark Richman Oct 12 '13 at 19:16
14

@MarkHildreth: my reasoning for keeping data separate from migrations is that schema changes have a pretty well defined history separate from the application, but data does not, since the application has access to it and can change it. I guess for a read-only table this would not be a problem, but this is something I would not recommend as a general practice. – Miguel Grinberg Oct 12 '13 at 19:59
18

@Miguel: I guess it really comes down to what data we're talking about. However, as I said in my answer, my definition of "seed data" is data required to run the application (constants, admin users, etc). Therefore, the necessary data SHOULD have a well-defined history that is in step with the schema, as I explained in my answer. – Mark Hildreth Oct 12 '13 at 20:03
Yeah agreed; I want to add a list of currencies to an application, which is the only way to populate the currencies table. Seems entirely natural to tie this in with the creation of said table. – Rob Grant Feb 06 '16 at 11:18
2

Although you got a bunch of upvotes on your answer, this advice could easily be leading a lot of people down the wrong path. Certain data is appropriately part of a migration (the kind of data Hildreth describes in his answer). Your argument ignores the existence of that kind of data. With respect, please edit your answer to make clear what kind of data you are talking about and removing your black/white assertion that seed data shouldn't be part of migrations. (and my deep thanks for your work on flask-migrate in general) – melchoir55 Mar 02 '17 at 20:41
@MarkHildreth agree. – May 23 '17 at 07:56
7

Migrations should **not** be limited to schema changes only. [Evolutionary Database Design](https://martinfowler.com/articles/evodb.html). – turdus-merula Oct 21 '17 at 16:45
@MarkHildreth, But what about in the middle of migration you wanted to add a data migration for development environment only. So adding this data migration on development and running `flask db upgrade` will also add this dummy data to production which only intended for staging.How do we skip that if we include that on schema migration? – Shift 'n Tab Nov 07 '19 at 11:16
1

@Roel I'm not sure I quite understand your question, but if I do and running a schema migrations for the data gives you a hard time, then don't make it a schema migration. Make it a data migration instead. Additionally, I'll add something I wish I knew six years ago: it doesn't really matter how you do it. In hindsight, I cringe at how emphatic I was about my disagreement with Miguel's answer (six years later, I'd like to apologize). Either way works, and most likely migrations are not what make our projects/products interesting/unique. – Mark Hildreth Nov 07 '19 at 15:07

score 13 · Answer 3 · answered Mar 02 '17 at 22:10

MarkHildreth has supplied an excellent explanation of how alembic can handle this. However, the OP was specifically about how to modify a flask-migration migration script. I'm going to post an answer to that below to save people the time of having to look into alembic at all.

Warning Miguel's answer is accurate with respect to normal database information. That is to say, one should follow his advice and absolutely not use this approach to populate a database with "normal" rows. This approach is specifically for database rows which are required for the application to function, a kind of data which I think of as "seed" data.

OP's script modified to seed data:

"""empty message

Revision ID: 384cfaaaa0be
Revises: None
Create Date: 2013-10-11 16:36:34.696069

"""

# revision identifiers, used by Alembic.
revision = '384cfaaaa0be'
down_revision = None

from alembic import op
import sqlalchemy as sa


def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    list_type_table = op.create_table('list_type',
    sa.Column('id', sa.Integer(), nullable=False),
    sa.Column('name', sa.String(length=80), nullable=False),
    sa.PrimaryKeyConstraint('id'),
    sa.UniqueConstraint('name')
    )
    op.create_table('job',
    sa.Column('id', sa.Integer(), nullable=False),
    sa.Column('list_type_id', sa.Integer(), nullable=False),
    sa.Column('record_count', sa.Integer(), nullable=False),
    sa.Column('status', sa.Integer(), nullable=False),
    sa.Column('sf_job_id', sa.Integer(), nullable=False),
    sa.Column('created_at', sa.DateTime(), nullable=False),
    sa.Column('compressed_csv', sa.LargeBinary(), nullable=True),
    sa.ForeignKeyConstraint(['list_type_id'], ['list_type.id'], ),
    sa.PrimaryKeyConstraint('id')
    )
    ### end Alembic commands ###


    op.bulk_insert(
        list_type_table,
        [
            {'name':'best list'},
            {'name': 'bester list'}
        ]
    )


def downgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('job')
    op.drop_table('list_type')
    ### end Alembic commands ###

Context for those new to flask_migrate

Flask migrate generates migration scripts at migrations/versions. These scripts are run in order on a database in order to bring it up to the latest version. The OP includes an example of one of these auto-generated migration scripts. In order to add seed data, one must manually modify the appropriate auto-generated migration file. The code I have posted above is an example of that.

What changed?

Very little. You will note that in the new file I am storing the table returned from create_table for list_type in a variable called list_type_table. We then operate on that table using op.bulk_insert to create a few example rows.

Braden Holt · Answer 4 · 2018-06-18T06:45:47.603

You can also use Python's faker library which may be a bit quicker as you don't need to come up with any data yourself. One way of configuring it would be to put a method in a class that you wanted to generate data for as shown below.

from extensions import bcrypt, db

class User(db.Model):
    # this config is used by sqlalchemy to store model data in the database
    __tablename__ = 'users'

    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(150))
    email = db.Column(db.String(100), unique=True)
    password = db.Column(db.String(100))

    def __init__(self, name, email, password, fav_movie):
        self.name = name
        self.email = email
        self.password = password

    @classmethod
    def seed(cls, fake):
        user = User(
            name = fake.name(),
            email = fake.email(),
            password = cls.encrypt_password(fake.password()),
        )
        user.save()

    @staticmethod
    def encrypt_password(password):
        return bcrypt.generate_password_hash(password).decode('utf-8')

    def save(self):
        db.session.add(self)
        db.session.commit()

And then implement a method that calls the seed method which could look something like this:

from faker import Faker
from users.models import User

fake = Faker()
    for _ in range(100):
        User.seed(fake)

How would you use faker to do this when you have a foreign key column? — rasen58, May 31 '20 at 03:31
@rasen58 if you look at my init method it's only creating name, email and password prior to db insertion. ids are created during insertion and used by application when retrieving records from db. — Braden Holt, Jun 04 '20 at 17:10

score 3 · Answer 5 · answered Oct 13 '21 at 16:27

If you prefer to have a separate function to seed your data, you could do something like this:

from alembic import op
import sqlalchemy as sa

from models import User

def upgrade():
    op.create_table('users',
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('name', sa.String(length=80), nullable=False),
        sa.PrimaryKeyConstraint('id'),
        sa.UniqueConstraint('name')
    )

    # data seed
    seed()


def seed():
    op.bulk_insert(User.__table__,
        [
            {'name': 'user1'},
            {'name': 'user2'},
            ...
        ]
    )

This way, you don't need to save the return of create_table into a separate variable to then pass it on to bulk_insert.

Creating seed data in a flask-migrate or alembic migration

5 Answers5

Linked