
Flyway is a very convenient schema migration/evolution tool in the RDBMS world. I'm looking for something similar for ES.

Even though ES is different from RDBMS and I get that, the whole point of a tool like Flyway is basically doing the same schema changes in multiple environments such as 5 developer environments and staging/production environments. Even if I go with the aliasing approach described in a blog post, I still need to do that create-new-index-then-load-data-into-it-then-update-alias cycle in each environment. What I'm looking for is an automated way of doing that.

I can't just ask each developer to run a particular script after they pull a particular commit. Nor do I want to remember to manually run scripts like that in staging and production environments after deploying the latest codebase. Especially when the person doing a deployment is not the one who wrote migration scripts. All that feels so 20 years ago.

The problem has been solved multiple times in the RDBMS world. There are multiple mature tools out there. Flyway is just one of them and is my favorite. But I can't find anything similar for ES. I googled half the Web for it. Either my googling skills are very poor or a tool like that doesn't exist.

What am I missing? Is there a tool I can't find? Or am I completely misunderstanding something about ES and a tool like that doesn't make sense because of something I don't yet understand?

Elnur Abdurrakhimov
  • Possible duplicate of [Liquibase or Flyaway database migration alternative for Elasticsearch](http://stackoverflow.com/questions/23977688/liquibase-or-flyaway-database-migration-alternative-for-elasticsearch) – JiriS Apr 12 '17 at 15:32

3 Answers


For create-new-index-then-load-data-into-it-then-update-alias, what we do is:

  1. We use templates for the mapping
  2. We use Curator to create/update the index/alias automatically.

Curator still has to be run periodically, so we run it from a cron job.

J Aamish
  • Thanks. Curator looks promising. The only question I have about it is whether actions are idempotent. That is, will Curator only run an action once even if I keep invoking it with the same action file? – Elnur Abdurrakhimov Apr 12 '17 at 19:28
  • @ElnurAbdurrakhimov It depends on what action you are running. We define an [action file](https://www.elastic.co/guide/en/elasticsearch/client/curator/current/actionfile.html) in Curator and always run Curator with that same file. For example, we create an alias that joins the indexes for the last 12 weeks. Every time I run Curator this week, it creates the alias the same way. When I run it next week, it removes the oldest entry from the alias and adds the most recent week. It really depends on what you are trying to achieve here. We found Curator to be very flexible. – J Aamish Apr 13 '17 at 11:18
  • What I'm trying to achieve here is the behavior similar to Flyway. That is, I would keep adding actions to the action file and have each action executed once even if I execute the action file many times. But I guess that's not how Curator works. – Elnur Abdurrakhimov Apr 13 '17 at 18:36
  • Yes, Curator executes the action file every time you run it. Which actions are you thinking of running only once? Could you give some examples? – J Aamish Apr 15 '17 at 18:44
  • Basically, creating, deleting, and aliasing indexes. Also maybe migrating data from an old index to a new one. My idea was that I would just add a new action to an action file every time I want an index created or deleted, and then execute Curator on each deployment. That's the Flyway-like flow I'm trying to replicate, except that SQL migrations perform many more kinds of actions, most of which don't make much sense in Elasticsearch, I guess. – Elnur Abdurrakhimov Apr 16 '17 at 05:06
  • Yes, ES and SQL are quite different, and that is probably why Curator does not support such an option. Creating, deleting, and aliasing would not cause an issue even if done repeatedly. In fact, alias creation has to be repeated periodically if the alias is time-based. If you really want to execute an action only once per deployment, then I suppose you have to create a unique action file for every deployment and replace the old one. – J Aamish Apr 16 '17 at 10:22
  • Okay. As long as actions are idempotent, that should work. I'll give it a try. Thank you very much for following up. – Elnur Abdurrakhimov Apr 16 '17 at 22:10
  • Just want to add that the actions are idempotent as long as the index names are unique, i.e. you don't try to recreate an index with the same name as one you just deleted. In our case, we use index names based on the day, so it works without issues for us. – J Aamish Apr 17 '17 at 12:37
  • Actions are not idempotent in the sense you would expect coming from Flyway. In particular, you will probably receive messages such as: Elasticsearch logs for more information. Exception: TransportError(400, u'index_already_exists_exception', u'index [errors-2017.06.20/5Pd8ig6_Ssa9aeQ-4fp6Vg] already exists – bearrito Jun 19 '17 at 14:30
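
The run-once behavior discussed in the comments above could be sketched roughly as follows. This is not how Curator works and not the API of any existing tool; `run_pending`, the version strings, and the in-memory `applied` set are all hypothetical names chosen for illustration. In a real setup the record of applied versions would have to live somewhere durable, and each action would call Elasticsearch instead of appending to a list.

```python
from typing import Callable, List, Set, Tuple

# A migration is a (version, action) pair. Both are hypothetical here:
# the action would normally create, delete, or alias an index.
Migration = Tuple[str, Callable[[], None]]


def run_pending(migrations: List[Migration], applied: Set[str]) -> List[str]:
    """Run, in list order, every migration whose version is not yet recorded.

    Records each version in `applied` after its action succeeds and
    returns the versions that were run during this call.
    """
    ran: List[str] = []
    for version, action in migrations:
        if version in applied:
            continue  # already executed on an earlier run, so skip it
        action()
        applied.add(version)
        ran.append(version)
    return ran


# Stand-in actions that only record what they would have done.
log: List[str] = []
migrations: List[Migration] = [
    ("001_create_index", lambda: log.append("create my-new-index-000001")),
    ("002_delete_old_index", lambda: log.append("delete my-index-000001")),
]

applied: Set[str] = set()  # assumption: persisted between deployments in practice
first_run = run_pending(migrations, applied)
second_run = run_pending(migrations, applied)
print(first_run, second_run)
```

Calling `run_pending` on every deployment is then safe: the second call finds both versions already recorded and runs nothing, which sidesteps errors like the `index_already_exists_exception` above.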

In 2020, there seems to be an easier approach: the reindex API. You only need to run

```
POST _reindex
{
  "source": {
    "index": "my-index-000001"
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}
```

and the data gets re-indexed.
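
For the full create-new-index-then-load-data-then-update-alias cycle from the question, the reindex call would typically be followed by an alias update. Below is a minimal sketch that only builds the two request bodies; it does not send anything. The helper names and the alias name `my-alias` are assumptions made for illustration, and the second body is meant for `POST _aliases`.

```python
import json
from typing import Any, Dict


def reindex_body(source: str, dest: str) -> Dict[str, Any]:
    """Body for POST _reindex: copy documents from `source` into `dest`."""
    return {"source": {"index": source}, "dest": {"index": dest}}


def alias_swap_body(alias: str, old_index: str, new_index: str) -> Dict[str, Any]:
    """Body for POST _aliases: move `alias` from the old index to the new one.

    Both actions go in a single request so the alias is switched in one step.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Index names are taken from the example above; "my-alias" is hypothetical.
reindex = reindex_body("my-index-000001", "my-new-index-000001")
swap = alias_swap_body("my-alias", "my-index-000001", "my-new-index-000001")

print(json.dumps(reindex, indent=2))
print(json.dumps(swap, indent=2))
```

The printed JSON can be pasted after `POST _reindex` and `POST _aliases` respectively, or sent with whatever HTTP client the deployment script already uses.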

I am new to Elasticsearch so don't hesitate to point out where I can improve :)

ch271828n

You can do this to a certain extent with the elasticsearch-evolution tool, which describes itself as a "flyway for elastic": https://github.com/senacor/elasticsearch-evolution

sigma1510