I am comparing the performance of ScyllaDB and Cassandra, specifically looking at the impact of memory. The machines I am using each have 16GB of RAM and 8 cores. Based on the docs (https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsTuneJVM.html), Cassandra will default to a 4GB Xmx and use the remaining 12GB as file system cache. ScyllaDB will instead use all 16GB for itself (http://docs.scylladb.com/faq/#scylla-is-using-all-of-my-memory-why-is-that-what-if-the-server-runs-out-of-memory).

What I'm wondering is whether this is a fair comparison setup (4GB Xmx for Cassandra vs 16GB for Scylla). I realize this is what each project recommends, but would a fairer test be 8GB Xmx for Cassandra and --memory 8G for ScyllaDB? My workload is mostly write intensive, and I don't expect file system caching to always be able to help Cassandra. It's odd to me that ScyllaDB expects almost no file system caching, compared to Cassandra's huge reliance on it.

trincot
  • Use at least 8GB for Cassandra or it won't work well at all. I would never recommend going lower. You will likely see better overall results long term with 12 or 14GB than by hoping for good use of the system cache. 4GB will work horribly, though. The document you linked recommends `max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))`, which is pretty good (min 8GB) – Chris Lohfink Oct 30 '17 at 20:45
  • Thanks. From my testing, moving from 4GB Xmx to 8GB Xmx makes a big difference for Cassandra (I also moved Xmn from 1GB to 2GB). SSTable flushes are ~33MB instead of ~19MB, which reduces compactions and CPU usage. It is probably better to increase system RAM as well, but from what I've seen I tend to agree that on a 16GB machine RAM is better spent on the heap than on the system cache. One clarification, though: for a 16GB machine, doesn't `max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))` = max(1GB, 4GB) = 4GB, not 8GB? That's what it is set to if I don't override the default settings. – Riley Zimmerman Oct 31 '17 at 13:08
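
The arithmetic in the comments above can be checked with a short sketch. The function name `default_max_heap_mb` is made up for illustration; it only evaluates the formula quoted in the comments and is not taken from Cassandra's own startup scripts.

```python
def default_max_heap_mb(ram_mb: int) -> int:
    """Evaluate max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB)) in megabytes.

    Hypothetical helper: it mirrors the formula quoted in the comments,
    not the actual logic Cassandra runs at startup.
    """
    half_capped = min(ram_mb // 2, 1024)      # min(1/2 ram, 1024MB)
    quarter_capped = min(ram_mb // 4, 8192)   # min(1/4 ram, 8GB)
    return max(half_capped, quarter_capped)


# A 16GB machine: max(1024MB, 4096MB) = 4096MB, i.e. a 4GB default heap.
print(default_max_heap_mb(16 * 1024))
```

Under this formula the 8GB cap is only reached on machines with 32GB of RAM or more.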

2 Answers

Cassandra will always use all of the system memory; the heap size (-Xmx) setting just determines how much is used by the heap and how much by other memory consumers (off-heap structures and the page cache). So if you limit Scylla's memory usage, it will be at a disadvantage compared to Cassandra.
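
As a rough illustration of that split, here is a minimal sketch for the 16GB machines in the question. The helper `non_heap_gb` is hypothetical, and the model is deliberately simplified: it treats everything outside the heap as available to off-heap structures and the page cache, ignoring the OS and other JVM overhead.

```python
def non_heap_gb(total_ram_gb: int, xmx_gb: int) -> int:
    """Memory left outside the JVM heap (off-heap structures plus page cache).

    Simplified, hypothetical model: ignores OS and non-heap JVM overhead.
    """
    if xmx_gb > total_ram_gb:
        raise ValueError("heap cannot exceed total RAM")
    return total_ram_gb - xmx_gb


TOTAL_RAM_GB = 16  # machine size from the question

for xmx in (4, 8):
    rest = non_heap_gb(TOTAL_RAM_GB, xmx)
    print(f"-Xmx{xmx}G: {xmx}GB heap, {rest}GB for off-heap structures and page cache")
```

Either way Cassandra ends up using the full 16GB, which is why capping Scylla with --memory 8G would not make the two setups equivalent.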

Avi Kivity

Scylla will use ~1/2 of the memory for MemTables, and the other half for Key/Partition caching. If your workload is mostly writes, more memory will have less of an effect on performance, and performance should be bounded by either I/O or CPU.

I would recommend reading http://www.scylladb.com/2017/10/05/io-access-methods-scylla/ to understand the way Scylla writes data, and http://www.scylladb.com/2016/12/15/sswc-part1/ to understand the way Scylla balances I/O workloads.

gutkinde
  • Thanks! Both links are very helpful. I did not realize it was not using the system file cache on purpose. "In ScyllaDB, we bypass all the Operating System buffering system (known as the page cache) and write directly to the disk using Direct I/O, so we are in the perfect position to know exactly at which rate each buffer is being flushed to the actual storage." – Riley Zimmerman Oct 31 '17 at 13:23