
Let's say I have a system running on Kubernetes using rolling updates, or on AWS ECS using blue/green deployments, or on any other solution that offers zero-downtime deployments. The key point here is that the new and the existing version can coexist and they use the same relational database.

We have a table in the database called users with a field username, which we want to split in the new version into two separate fields: firstname and lastname. To do that, we change the service code accordingly and migrate the existing records. This scenario should also cater for potential rollbacks.

During the deployment we first run the migration, then the rolling update. As far as I understand, this can lead to a situation where the migration is done but instances of the previous version are still serving traffic. That's why we write the migration so that it works with both the current version and the new version: we add the new fields but do not remove the old one yet. The username field will be removed in a later release.
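To make the additive migration concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table layout, the `expand_migration` and `split_username` helpers, and the rule "first word is firstname, the rest is lastname" are all assumptions made for illustration; the real schema and splitting rule may differ.

```python
import sqlite3


def split_username(username):
    """Assumed rule: the first word is firstname, the remainder is lastname."""
    first, _, last = username.strip().partition(" ")
    return first, last


def expand_migration(conn):
    """Add the new columns and backfill them; the username column is kept."""
    conn.execute("ALTER TABLE users ADD COLUMN firstname TEXT")
    conn.execute("ALTER TABLE users ADD COLUMN lastname TEXT")
    rows = conn.execute("SELECT id, username FROM users").fetchall()
    for user_id, username in rows:
        first, last = split_username(username)
        conn.execute(
            "UPDATE users SET firstname = ?, lastname = ? WHERE id = ?",
            (first, last, user_id),
        )
    conn.commit()


# Hypothetical starting state: only the old column exists.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES ('Ada Lovelace')")

expand_migration(conn)
migrated = conn.execute(
    "SELECT username, firstname, lastname FROM users"
).fetchall()
print(migrated)
```

Because username is left untouched, instances of the previous version can keep reading and writing it while the rollout is in progress.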

After the deployment is finished, the old version gets disabled and only the new version is running, saving firstname and lastname separately. But here we end up in a situation where, if a record was stored by the previous version during the transition period, it is stored in the username field only. Such records are not covered by the DB migration, because they were created after it was launched.
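The gap can be reproduced in a few lines. The sketch below again uses an in-memory sqlite3 database with an assumed schema; `find_unmigrated` and `backfill` are hypothetical helpers. The second pass corresponds to one option the question itself raises (running the data migration again after the deployment) and is not meant as the definitive fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema after the migration: old and new columns coexist.
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, username TEXT, firstname TEXT, lastname TEXT)"
)
# Written by the new version.
conn.execute("INSERT INTO users (firstname, lastname) VALUES ('Grace', 'Hopper')")
# Written by the previous version during the rollout, after the migration ran.
conn.execute("INSERT INTO users (username) VALUES ('Alan Turing')")
conn.commit()


def find_unmigrated(conn):
    """Rows that only the previous version could have written."""
    return conn.execute(
        "SELECT id, username FROM users "
        "WHERE username IS NOT NULL AND firstname IS NULL"
    ).fetchall()


def backfill(conn):
    """Idempotent catch-up pass; safe to run repeatedly after the rollout."""
    for user_id, username in find_unmigrated(conn):
        # Assumed splitting rule: first word, then the remainder.
        first, _, last = username.strip().partition(" ")
        conn.execute(
            "UPDATE users SET firstname = ?, lastname = ? WHERE id = ?",
            (first, last, user_id),
        )
    conn.commit()


missed = find_unmigrated(conn)     # the record stored by the previous version
backfill(conn)
remaining = find_unmigrated(conn)  # nothing left once the pass has run
print(missed, remaining)
```

Restricting the pass to rows where firstname is still empty keeps it from overwriting anything the new version has already written.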

How can this situation be overcome? Should the DB migration (only the data migration part) be launched after the deployment, or is there a gap in the process described?

znaczek5

1 Answer


This is actually a good question, and one whose answer needs to be designed into the way you do the upgrade.

Most commonly this is worked around by locking out clients during the upgrade. One way is like this:

  1. Change the password of the DB user that the clients use, or use a maintenance mode if the client supports one (e.g. a website may have a mode that displays "This site is under temporary maintenance, please try again later"). Other sites just let the site return a 5xx error during the upgrade.
  2. Perform the upgrade. Since the clients can no longer access the database, there is no chance of a legitimate client update slipping in while the upgrade runs.
  3. Wait for the rollout to complete
  4. Test the new version (use the new credentials here)
  5. Change the password back to the original password
  6. Check you can still authenticate using the original credentials
  7. Remove the maintenance mode if enabled
  8. Check the clients are able to connect and monitor for any errors
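The effect of the lock-out can be sketched with a toy in-memory service. `Service`, its `maintenance` flag and the 503 status are hypothetical stand-ins for whatever maintenance mechanism the real client offers; the point is only that no client write is accepted while the upgrade runs.

```python
class Service:
    """Hypothetical stand-in for the client-facing service."""

    def __init__(self):
        self.maintenance = False
        self.writes = []

    def handle_write(self, username):
        # While maintenance mode is on, reject the request instead of writing.
        if self.maintenance:
            return 503
        self.writes.append(username)
        return 200


service = Service()
status_before = service.handle_write("Ada Lovelace")

service.maintenance = True        # step 1: lock clients out
# steps 2-6: upgrade, rollout and tests would run here
status_during = service.handle_write("Alan Turing")
service.maintenance = False       # step 7: remove the maintenance mode

status_after = service.handle_write("Grace Hopper")  # step 8: clients reconnect
print(status_before, status_during, status_after, service.writes)
```

The rejected request is the cost of this approach: clients see an error for the duration of the window.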
Blender Fox
  • As I understand it, that approach is not a zero-downtime deployment, because in practice the system gets disabled for users during the transition period. In that scenario I would consider whether a rolling update or blue/green deployment still makes sense. – znaczek5 May 23 '23 at 08:32