What's the best (most reliable) method of estimating the space required in Cassandra? My cluster consists of 2 nodes (RHEL 6.5) running Cassandra 3.11.2. I want to estimate the average size each row in every table will take in my database so that I can plan capacity accordingly. I know about some methods, such as the nodetool status command, running du -sh on the data directory, and nodetool cfstats. However, each of these gives a different value, so I'm not sure which one I should use in my calculations.
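To see where the du -sh and nodetool figures might diverge, one option is to break a node's data directory down per table and report snapshot and incremental-backup files separately from everything else. Below is a minimal Python sketch, assuming the default on-disk layout <data_dir>/<keyspace>/<table>-<table id>/ with snapshots and incremental backups stored in snapshots/ and backups/ subdirectories; the function name table_disk_usage is hypothetical. Because snapshots and backups are hard links to SSTable files, it counts each inode only once, the way du does.

```python
import os
from collections import defaultdict

def table_disk_usage(data_dir):
    """Sum on-disk bytes per (keyspace, table) for one node's data directory.

    Assumes the layout <data_dir>/<keyspace>/<table>-<id>/. Bytes under the
    'snapshots' and 'backups' subdirectories are reported separately; everything
    else (all SSTable components, plus any files not yet cleaned up) counts as 'data'.
    """
    usage = defaultdict(lambda: {"data": 0, "snapshots": 0, "backups": 0})
    seen_inodes = set()  # snapshots/backups are hard links: count each inode once, like du
    for keyspace in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(ks_path):
            continue
        for table_dir in sorted(os.listdir(ks_path)):
            tbl_path = os.path.join(ks_path, table_dir)
            if not os.path.isdir(tbl_path):
                continue
            table = table_dir.rsplit("-", 1)[0]  # drop the "-<table id>" suffix
            entry = usage[(keyspace, table)]
            # os.walk is top-down, so files directly in tbl_path are seen before
            # any snapshot/backup hard links that point at them
            for root, _dirs, files in os.walk(tbl_path):
                top = os.path.relpath(root, tbl_path).split(os.sep)[0]
                bucket = top if top in ("snapshots", "backups") else "data"
                for name in files:
                    st = os.stat(os.path.join(root, name))
                    key = (st.st_dev, st.st_ino)
                    if key in seen_inodes:
                        continue
                    seen_inodes.add(key)
                    entry[bucket] += st.st_size
    return dict(usage)
```

For example, table_disk_usage("/var/lib/cassandra/data") (the usual package-install path; adjust it to match data_file_directories) returns byte totals per (keyspace, table) that can be set side by side with the per-table numbers nodetool cfstats prints.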
Also, I found out that apart from the actual data, Cassandra stores various metadata in system tables such as size_estimates, sstable_activity, etc. Does this metadata also keep growing as the data grows? What is the ratio of the space occupied by such metadata to the space occupied by the actual data in the database? Also, which configuration settings in cassandra.yaml (if any) should I keep in mind that might affect the size of the data?
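Rather than assuming a fixed metadata-to-data ratio, it could be measured on each node. The sketch below (the function name system_data_ratio is hypothetical) sums file sizes per keyspace, skipping snapshots/ and backups/, and compares the keyspaces Cassandra 3.x creates for its own use (system, system_schema, system_auth, system_distributed, system_traces; size_estimates and sstable_activity are tables in the system keyspace) against all other keyspaces. It assumes the default data directory layout and only looks at that directory: the commit log, hints and saved caches live wherever commitlog_directory, hints_directory and saved_caches_directory point in cassandra.yaml, and would have to be measured separately.

```python
import os

# Keyspaces Cassandra 3.x creates for itself (assumed list; verify against your cluster).
SYSTEM_KEYSPACES = {"system", "system_schema", "system_auth",
                    "system_distributed", "system_traces"}

def system_data_ratio(data_dir):
    """Return (system_bytes, user_bytes, system share of the total) for one data directory.

    Files under 'snapshots' and 'backups' subdirectories are ignored so that
    only current files are counted.
    """
    system_bytes = user_bytes = 0
    for keyspace in os.listdir(data_dir):
        ks_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(ks_path):
            continue
        size = 0
        for root, dirs, files in os.walk(ks_path):
            # prune snapshot/backup trees in place so os.walk does not descend into them
            dirs[:] = [d for d in dirs if d not in ("snapshots", "backups")]
            size += sum(os.path.getsize(os.path.join(root, f)) for f in files)
        if keyspace in SYSTEM_KEYSPACES:
            system_bytes += size
        else:
            user_bytes += size
    total = system_bytes + user_bytes
    return system_bytes, user_bytes, (system_bytes / total if total else 0.0)
```

Running it periodically as data is loaded would show whether the system keyspaces grow along with the user data on that particular cluster.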
A similar question has been asked before, but I wasn't satisfied with the answer.