
How do I migrate a codebase from ndb to SQLAlchemy?

Tempted to write a parser/emitter that takes the 40+ ndb.Model classes in my codebase and generates sqlalchemy.schema.Table or Base-inheriting classes. But that doesn't:

  1. solve the data migration problem;
  2. enable a middle road, where I can access data before migrating;
  3. avoid manual work in migrating all the query syntax over.

Happy to do the work, but asking here in case there's a better way (e.g., I saw this 10-year-old question Can SQLAlchemy be used with Google Cloud SQL?; or maybe some way of dumping the schema directly from ndb to SQL, then from SQL directly to SQLAlchemy).
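For what it's worth, the parser/emitter half can be sketched without touching the data at all. This is a minimal, hypothetical sketch assuming a simplified view of an ndb model as a dict of `{field_name: ndb property class name}` (real code would introspect the `ndb.Model` subclasses instead), emitting SQLAlchemy declarative class source:

```python
# Hypothetical sketch: generate SQLAlchemy model source from an ndb-style
# schema. Assumes each model is just a name plus {field: ndb property type};
# real code would introspect the actual ndb.Model classes.

NDB_TO_SA = {
    "StringProperty": "String",
    "TextProperty": "Text",
    "IntegerProperty": "Integer",
    "FloatProperty": "Float",
    "BooleanProperty": "Boolean",
    "DateTimeProperty": "DateTime",
}

def emit_model(name: str, fields: dict) -> str:
    """Emit the source of a SQLAlchemy declarative class for one model."""
    lines = [
        f"class {name}(Base):",
        f"    __tablename__ = '{name.lower()}'",
        "    id = Column(Integer, primary_key=True)",
    ]
    for field, ndb_type in fields.items():
        sa_type = NDB_TO_SA.get(ndb_type, "String")  # fall back to String
        lines.append(f"    {field} = Column({sa_type})")
    return "\n".join(lines)

print(emit_model("Article", {"title": "StringProperty", "views": "IntegerProperty"}))
```

This only addresses schema translation, not points 1-3 above.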

Samuel Marks
  • I worked on such a project some years ago. I think the most promising approach to changing the code was AST-rewriting (we looked at Red Baron at the time; libcst would be better if it's Python 3). In the end we went with a manual approach, though, as the move from NoSQL to relational required a lot of changes to the data model. Migration-wise, I did look at downloading the datastore as protocol buffers, then transforming and loading into MySQL, but eventually ran scripts via the remote shell (not fast, but our data was relatively small). – snakecharmerb Sep 17 '22 at 07:26

1 Answer


To migrate you need to write a migration script that copies the data from the old datastore to the new database: use the ndb library to read each entity and the sqlalchemy library to write the corresponding row. There is no automatic or partial route, and no middle road that lets you query the data in both places before migrating — you will have to do the manual work and write the migration script.

A simplified example of a migration script:

from google.cloud import ndb
from sqlalchemy import create_engine, Column, Integer, String, DateTime
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

# The existing ndb model (already defined in your codebase)
class MyEntity(ndb.Model):
    name = ndb.StringProperty()
    created_at = ndb.DateTimeProperty()
    updated_at = ndb.DateTimeProperty()

# The corresponding SQLAlchemy model for the new database
class MyEntityRow(Base):
    __tablename__ = 'my_entity'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    created_at = Column(DateTime)
    updated_at = Column(DateTime)

# Connect to the new database
engine = create_engine('postgresql://user:password@localhost:5432/mydatabase')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()

# Connect to the old datastore; ndb queries must run inside a client context
ndb_client = ndb.Client()
with ndb_client.context() as context:
    context.set_cache_policy(False)

    # Read every entity from the old datastore and stage a row for each
    for entity in MyEntity.query().fetch():
        session.add(MyEntityRow(
            id=entity.key.id(),
            name=entity.name,
            created_at=entity.created_at,
            updated_at=entity.updated_at,
        ))

try:
    session.commit()
except Exception:
    session.rollback()  # undo the partial batch if the commit fails
    raise
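For a large datastore, fetching everything in one go and committing once at the end may not be practical. ndb's `Query.fetch_page(page_size, start_cursor=...)` returns `(results, cursor, more)`, so you can commit page by page. A minimal sketch of that loop, with the ndb call factored out behind a `fetch_batch` stand-in (the ndb and SQLAlchemy plumbing is assumed to be as above):

```python
# Sketch of batched migration. `fetch_batch(cursor, n)` is a stand-in for a
# function you would implement with ndb's Query.fetch_page; it must return
# (rows, next_cursor, more). `write_row` and `commit` would wrap
# session.add and session.commit.

def migrate_in_batches(fetch_batch, write_row, commit, batch_size=500):
    """Copy entities page by page, committing after each page."""
    cursor, more, total = None, True, 0
    while more:
        rows, cursor, more = fetch_batch(cursor, batch_size)
        for row in rows:
            write_row(row)
        commit()  # one transaction per page instead of one giant one
        total += len(rows)
    return total

# Demo with an in-memory stand-in for fetch_page:
data = list(range(7))

def fake_fetch(cursor, n):
    start = cursor or 0
    page = data[start:start + n]
    end = start + len(page)
    return page, end, end < len(data)

written, commits = [], []
total = migrate_in_batches(fake_fetch, written.append,
                           lambda: commits.append(1), batch_size=3)
print(total, len(commits))  # prints: 7 3
```

Committing per page keeps memory bounded and means a failure only rolls back the current page, at the cost of a non-atomic migration overall.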
Tibic4
  • Thanks, looking at your answer I can infer a way to take your pattern and use the [`ast`](https://docs.python.org/3/library/ast.html) module to parse `ndb` `class`es, then code-generate SQLAlchemy `class`es for each [`ndb.Model`](https://cloud.google.com/appengine/docs/legacy/standard/python/ndb/modelclass). – Samuel Marks Sep 27 '22 at 01:11
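The `ast` extraction step mentioned in that comment could look roughly like this — a hedged sketch that walks a module's AST, finds classes whose base is an `ndb.Model` attribute, and records each `field = ndb.XxxProperty()` assignment (real ndb code with nested models, repeated properties, or keyword options would need more cases):

```python
import ast

# Example input: ndb model source as it might appear in the codebase.
SOURCE = '''
class Article(ndb.Model):
    title = ndb.StringProperty()
    views = ndb.IntegerProperty()
'''

def extract_ndb_fields(source: str) -> dict:
    """Map each ndb.Model subclass name to {field: ndb property type name}."""
    models = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and any(
            isinstance(b, ast.Attribute) and b.attr == "Model"
            for b in node.bases
        ):
            fields = {}
            for stmt in node.body:
                # Match `name = ndb.SomethingProperty(...)` assignments
                if (isinstance(stmt, ast.Assign)
                        and isinstance(stmt.value, ast.Call)
                        and isinstance(stmt.value.func, ast.Attribute)):
                    fields[stmt.targets[0].id] = stmt.value.func.attr
            models[node.name] = fields
    return models

print(extract_ndb_fields(SOURCE))
# {'Article': {'title': 'StringProperty', 'views': 'IntegerProperty'}}
```

The resulting dict is exactly the shape a code generator for SQLAlchemy classes could consume.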