I do data research on 200 million records using Elasticsearch. From time to time the index needs to be updated with new synonyms and stop words, so the records have to be reindexed. I'm now looking for ways to make the reindexing process as fast as possible. My idea is to build an Elasticsearch plugin that would:
- Watch the filesystem for changes to the synonym/stopwords file
- Diff the new file against the previous version of the synonym/stopwords file
- Find the records that could be affected by the synonym/stopwords file change
- Reindex only the records found in the previous step
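
To make the diff and lookup steps concrete, here is a rough Python sketch of what I have in mind (not the plugin itself). It assumes the synonym file uses one rule per line in the `a, b => c` / `a, b, c` style with `#` comments, and that the text lives in a field called `body`; the function names and the field name are made up for illustration.

```python
import re


def parse_rules(text):
    """Return the set of normalized, non-comment lines of a synonym or stop-word file."""
    rules = set()
    for line in text.splitlines():
        line = line.strip().lower()
        if line and not line.startswith("#"):
            rules.add(line)
    return rules


def rule_terms(rule):
    """Split one rule such as 'a, b => c' into its individual terms."""
    return {t.strip() for t in re.split(r"=>|,", rule) if t.strip()}


def changed_terms(old_text, new_text):
    """Collect the terms of every rule that was added or removed between versions."""
    changed_rules = parse_rules(old_text) ^ parse_rules(new_text)
    terms = set()
    for rule in changed_rules:
        terms |= rule_terms(rule)
    return terms


def build_query(field, terms):
    """Build a bool query body matching documents that contain any changed term.

    'field' is a hypothetical field name; match_phrase is used so that
    multi-word terms are matched as phrases.
    """
    return {
        "query": {
            "bool": {
                "should": [{"match_phrase": {field: t}} for t in sorted(terms)],
                "minimum_should_match": 1,
            }
        }
    }


old = "# synonyms\ncar, automobile\ntv, television\n"
new = "# synonyms\ncar, automobile, auto\ntv, television\n"

terms = changed_terms(old, new)
query = build_query("body", terms)
```

The resulting body could then be used to select the documents to reindex, for example through the update-by-query API, assuming `_source` is enabled. One thing I'm unsure about: if a stop word is removed from the list, its tokens were never indexed in the first place, so a query like this probably cannot find the affected documents unless the raw text is searchable through another field.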
If you have a better approach, please share it.