Would it be a good idea to read big data (a query that returns billions
of results) using an indexer (Elasticsearch/Solr) on top of
Cassandra? Or would it be more performant to ask Cassandra directly? I
am only wondering about reading data, not about updating and deleting.
Do you mean reading the data, indexing it, and then reading it again from the index?
Then reading once, i.e. asking Cassandra directly, would definitely be better.
The exception is if you want to use Elasticsearch's linguistic capabilities. If your query doesn't involve natural language, read directly from Cassandra.
Should indexers only be used for searches that return smaller sets of
data?
Yes, search engines are optimized for these types of queries. Search engines solve two main problems:
1. Returning relevant results through various types of filtering and natural-language capabilities, e.g. searching for "USA" and finding "United States of America".
2. Scoring the results so that the most relevant ones (according to some ranking function such as TF-IDF or BM25) come first.
When a search query is executed, only the IDs of the matching documents are returned at first; the documents themselves are then assembled from the stored part of the index, which is the most expensive search engine operation (besides optimizing, perhaps :P ).
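To make the "score first, fetch later" idea concrete, here is a minimal toy sketch in plain Python. It is not how Elasticsearch or Lucene is actually implemented; the `STORE` dictionary, the function names, and the sample documents are all made up for illustration. The BM25 formula uses the commonly quoted defaults `k1=1.2` and `b=0.75`, which I am assuming here rather than taking from any particular engine's configuration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical document store: id -> full document text.
STORE = {
    1: "cassandra is a wide column store",
    2: "elasticsearch is a search engine built on lucene",
    3: "a search engine ranks results by relevance",
}

def build_index(store):
    """Build an inverted index (term -> {doc_id: term frequency}) plus doc lengths."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in store.items():
        tokens = text.split()
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths

def bm25_search(query, index, lengths, k=2, k1=1.2, b=0.75):
    """Score matching documents with BM25 and return only the top-k ids, best first."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in query.split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings.items():
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores, key=scores.get, reverse=True)[:k]

def fetch(doc_ids, store):
    """Second phase: load the full documents for the ids that survived ranking."""
    return [store[doc_id] for doc_id in doc_ids]

INDEX, LENGTHS = build_index(STORE)
top_ids = bm25_search("search engine", INDEX, LENGTHS)
top_docs = fetch(top_ids, STORE)
```

The point of the sketch is that `fetch` only runs for the `k` ids that survive ranking. If a query matches billions of documents and you want all of them back, that second phase dominates and the ranking step buys you nothing.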
I guess in a nutshell my question is: when is it better to query an
indexer over a big data database, more specifically Cassandra, when
the query narrows down the potential results? Does this mean that if the
query returns a wide range of results it would be better to query
Cassandra directly?
In a nutshell, if you can narrow the results from Cassandra in the same way as with an Elasticsearch query, then you don't need Elasticsearch.