Background: we are using Cassandra to store time series data, and we access it through prepared statements.
We are partitioning data in tables by:
- time period (like one week or one month) and
- retention policy (like 1 year, 5 or 10 years)
Because these are different tables, we need to prepare (lazily, on first use) a separate statement for every combination of query, time period and retention policy, so the number of prepared statements will explode. Some math:
timePeriods = 12..52 * yearsOfData
maxNumOfPrepStatements = timePeriods * policies * numOfQueries
ourCase => (20 periods/year * 10 years) * 10 policies * 10 queries = 20,000 prepared statements
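To make the arithmetic above easy to re-run with other figures, here is a minimal sketch in Java. The class and method names are hypothetical (they are not part of any driver), and it assumes exactly one statement per combination of time-period table, retention policy and query.

```java
// Hypothetical helper: estimates the worst-case number of distinct prepared statements.
final class PreparedStatementEstimate {

    // Assumes one statement per (time-period table, retention policy, query) combination.
    static long maxPreparedStatements(int periodsPerYear, int yearsOfData,
                                      int policies, int queries) {
        long timePeriods = (long) periodsPerYear * yearsOfData;
        return timePeriods * policies * queries;
    }

    public static void main(String[] args) {
        // Figures from the question: 20 periods per year, 10 years, 10 policies, 10 queries.
        System.out.println(maxPreparedStatements(20, 10, 10, 10)); // prints 20000
        // Bounds of the 12..52 periods-per-year range, other figures unchanged.
        System.out.println(maxPreparedStatements(12, 10, 10, 10)); // prints 12000
        System.out.println(maxPreparedStatements(52, 10, 10, 10)); // prints 52000
    }
}
```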
On the client side I can cache only the most-used prepared statements (PS), but I could not find a way to remove the unused ones from the server, so I am worried that about 20,000 prepared statements could be a significant cost for every node.
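For reference, the client-side caching I have in mind looks roughly like the sketch below: a bounded LRU map keyed by the CQL string. `LruStatementCache` and the `preparer` function are hypothetical names of my own; the cache is generic so that it compiles without the driver on the classpath. Note that it only bounds client memory: evicting an entry here does nothing to the statement already prepared on the server, which is exactly my concern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical client-side LRU cache of prepared statements, keyed by CQL text.
// S is the statement type (with a real driver this would be its PreparedStatement).
final class LruStatementCache<S> {
    private final Function<String, S> preparer; // assumption: wraps the driver's prepare call
    private final Map<String, S> cache;

    LruStatementCache(int maxEntries, Function<String, S> preparer) {
        this.preparer = preparer;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, S>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
                // Drop the least recently used entry once the bound is exceeded.
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached statement, preparing it on first use (or after eviction).
    synchronized S get(String cql) {
        S statement = cache.get(cql);
        if (statement == null) {
            statement = preparer.apply(cql);
            cache.put(cql, statement);
        }
        return statement;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

With the DataStax Java driver, I would expect to construct it as something like `new LruStatementCache<PreparedStatement>(500, session::prepare)`, where 500 is an arbitrary bound chosen for illustration.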
Problem: will this number of PS cause any problems on the server?
This breaks into smaller questions:
- What is the server-side cost of those prepared statements?
- Will the server keep all the PS, or will it evict the least-used ones?
- Is there a better solution than restarting Cassandra nodes to clean the PS cache?
- Using the Java client, will closing the Session / Cluster object alleviate this on the server side?