This one is a bit longer so I can't put it into comment ... sorry.
I know that this sounds a bit strange, especially for a later stage of
your project, but the thing is with the indexes the situation won't get
any better over time. I would strongly recommend to start making your own
tables instead of just putting index on following stuff. Depending on
how often the data is accessed you can use "inverted indexes".
CREATE INDEX links_by_author_url_idx ON keyspace.links_by_author (url);
CREATE INDEX docs_url_idx ON keyspace.docs (url);
CREATE INDEX om_master_object_id_idx ON keyspace.om (master_object_id);
CREATE INDEX actions_pday_idx ON keyspace.actions (pday);
CREATE INDEX authors_yauid_idx ON keyspace.authors (yauid);
CREATE INDEX authors_login_lr_idx ON keyspace.authors (login_lr);
CREATE INDEX authors_login_idx ON keyspace.authors (login);
CREATE INDEX authors_email_idx ON keyspace.authors (email);
CREATE INDEX authors_name_idx ON keyspace.authors (name);
Basically every index that you have here enables you to "search" over base
entities to find them by some condition. Most of the conditions are
actually pretty narrow which is a good news. But the thing is the indexes
will become massive (already did), especially on docs and authors. But I guess
doc's is more problematic.
You should consider making separate tables for this. Every index that
you create will be there on every node in the cluster and in the end
you will hold far more data than you really need too because under
the hood data is multiplied per node. When you add replication factor to this
system is using a lot of space without you even being aware.
The problem with joining nodes is that when they receive new data all
the data in the cluster needs to be rebuilt ... for every single node
in the cluster and this is costing you a lot of time. So basically you loose
all the benefits of "easy node joining" that cassandra has.
Now you might think that space will become problem when you write the data
into your new schema that is denormalized ....
If space is the problem you can use a technique called inverted indexes
where you just put the id of the information into the search table and
then you make second load in the main table. I used this on some project
where space was the issue but since you have all the main stuff indexed
space will probably not be a problem because you are already using a lot
more than you think. (my bet would be that you will also probably save significantly on space)
Anyway all the indexes should become tables ... if consistency is problem,
use batches (don't use materialized views yet because you might loose data).
My honest tip is that you stay away from indexes. I know
it's hell to refactor this plus it's hard to get the time to refactor :( But
I think it should be manageable.