I'm new to ElasticSearch and am trying to figure out what is the most optimal way to index 1 Terabyte of data in Cassandra.
Two options that I understand right now are:
Move data periodically to ElasticSearch using the Cassandra-River plugin and then run index on the data.
Advantage: Search queries create no impact on Cassandra load
Disadvantage: Have to sync the data periodicallyWithout moving the data run ElasticSearch on Cassandra to index the data (not sure how will this be done).
Advantage: Data always in sync
Disadvantage: Impacts Cassandra performance ?
Any thoughts would be appreciated.