I'm evaluating Cassandra as a replacement for MySQL on a large dataset (2.5 TB, 5 billion rows) that I can no longer scale on a single server.
I insert or update a few million rows every hour. Currently, I'm inserting and querying rows one by one in Cassandra because I don't know which partition holds the data, and grouping them seems to be slower. But going one by one, I can't match the speed of a single MySQL server, even with 3 Cassandra nodes.
In MySQL, I can batch because I know everything is stored on the same server. Is it possible to determine the partition on the client side from the value of the primary key, so that I can group the queries more effectively with BATCH or SELECT ... IN?
I mean, given a group of PKs like 1, 2, 3, 4, 5, 6 ... and N servers, I'd like to know that, say, rows 1, 3, and 5 are in the same partition, so I can group them in my queries. Is this possible with Cassandra?
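
To make the idea concrete, here is a rough Python sketch of the grouping I have in mind. It is only an illustration: the three-node ring, the MD5-based token, and the names `RING`, `toy_token`, `owner`, and `group_by_owner` are all made up for this example, and none of it is Cassandra's actual partitioner or a driver API. What I'm asking is whether the real equivalent of `owner` can be computed on the client.

```python
import bisect
import hashlib
from collections import defaultdict

# Hypothetical 3-node ring: each node owns the token range ending at its token.
# Assumption for illustration only; a real cluster has its own partitioner and
# token assignment, which this toy model does not reproduce.
MAX_TOKEN = 2**64 - 1
RING = [
    (MAX_TOKEN // 3, "node-a"),
    (2 * MAX_TOKEN // 3, "node-b"),
    (MAX_TOKEN, "node-c"),
]
TOKENS = [token for token, _ in RING]


def toy_token(pk: int) -> int:
    """Map a primary key to a 64-bit token (stand-in for the real partitioner)."""
    digest = hashlib.md5(str(pk).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(pk: int) -> str:
    """Return the node whose token range contains this key's token."""
    idx = bisect.bisect_left(TOKENS, toy_token(pk))
    return RING[idx][1]


def group_by_owner(pks):
    """Group primary keys by owning node so each group could be sent together."""
    groups = defaultdict(list)
    for pk in pks:
        groups[owner(pk)].append(pk)
    return dict(groups)


if __name__ == "__main__":
    for node, keys in sorted(group_by_owner(range(1, 11)).items()):
        print(node, keys)
```

Each resulting group would then become one BATCH or one SELECT ... IN, instead of one query per row.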