(Single-node cluster) I have a table with two columns, one of type 'text' and the other a 'blob'. I'm using the DataStax C++ driver to perform read/write requests against Cassandra.

The blob stores a C++ structure (size: 7 KB).

Since I was getting lower throughput than I wanted with Cassandra alone, I tried adding Ignite on top of Cassandra, hoping for a significant performance improvement now that the data would be read from RAM instead of hard disks.

However, it turned out that after adding Ignite, the performance dropped even further (by more than 50%!).

Read Throughput when using only Cassandra: 21000 rows/second.
Read Throughput with Cassandra + Ignite: 9000 rows/second.

Since I am storing a C++ structure in Cassandra's blob, the Ignite API performs serialization/deserialization when writing/reading the data. Is this the reason for the drop in performance (given the size of the structure, i.e. 7 KB), or is this drop not expected at all and something is wrong in the configuration?

Cassandra: 3.11.2, RHEL: 6.5

The Ignite configuration is the same as given here.

I got a significant improvement in Ignite+Cassandra throughput when I used serialization in raw mode. The throughput has increased from 9000 rows/second to 23000 rows/second. Still, that is not significantly better than Cassandra alone. I'm hopeful of finding more tweaks that will improve this further.
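For anyone wondering what the "one opaque byte array" idea looks like, here is a minimal sketch using only the standard library. It is not the Ignite or DataStax API, and `Record` is a hypothetical stand-in for the real structure (its fields are assumptions chosen to total roughly 7 KB). It only shows the structure being copied into a blob and back in a single pass, with no per-field work.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the ~7 KB structure; the real layout is not shown here.
struct Record {
    std::int64_t id;
    double values[800];
    char label[600];
};

// Byte-wise copying is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable<Record>::value,
              "Record must be trivially copyable to be packed with memcpy");

// Pack the structure into one opaque byte array (the content of the blob column).
std::vector<std::int8_t> to_blob(const Record& r) {
    std::vector<std::int8_t> blob(sizeof(Record));
    std::memcpy(blob.data(), &r, sizeof(Record));
    return blob;
}

// Rebuild the structure from a blob; returns false if the size does not match.
bool from_blob(const std::vector<std::int8_t>& blob, Record& out) {
    if (blob.size() != sizeof(Record)) return false;
    std::memcpy(&out, blob.data(), sizeof(Record));
    return true;
}
```

Note that a blob packed this way depends on the compiler's layout and the machine's byte order, so the writer and the reader need to be built the same way.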

I've added more details about the configuration and the client code on GitHub.

Vishal Sharma
  • Never test the performance of distributed systems on single node cluster... Talking about caching - you can tune Cassandra to cache keys & data, so you don't need Ignite – Alex Ott Apr 30 '18 at 07:04
  • In my use case, Cassandra's row cache isn't of any use, as I'll be performing 'read+write'. Also, since I'm comparing both running on single node, why should Cassandra+Ignite perform less than standalone Cassandra? – Vishal Sharma Apr 30 '18 at 07:11
  • Well, because these applications will compete for the same resources (CPU, memory). Also, you're asking about a mistake in the configuration, but didn't provide any – Evgenii Zhuravlev Apr 30 '18 at 07:14
  • I understand that a lot of factors are in play here, but for Ignite I've not played with the configuration at all. I'm using pretty much the default configuration. Also, there isn't any lack of resources in either case (AFAIK, of course): the node has around 35 GB of free RAM and 24 cores. I doubt there's any benefit at all to be gained from using Ignite + Cassandra, because so far there's none. – Vishal Sharma Apr 30 '18 at 07:24
  • As far as I can see, you didn't configure memory for Ignite (https://apacheignite.readme.io/docs/memory-configuration). How much data do you have, and how much heap have you configured for Ignite? You definitely need to run something like JFR to understand what happens on your node. – Evgenii Zhuravlev Apr 30 '18 at 08:46
  • What exactly is JFR? – Vishal Sharma Apr 30 '18 at 09:27
  • Java Flight Recorder – Evgenii Zhuravlev May 01 '18 at 06:40
  • @VishalSharma Is there any chance you can share your project on github or somewhere else, so we could help you tune it? – Dmitriy May 01 '18 at 21:27
  • Sure. I'll update soon. – Vishal Sharma May 02 '18 at 02:34
  • @Dmitriy please let me know if you need any more information about what's on GitHub – Vishal Sharma May 02 '18 at 11:28
  • @VishalSharma Got it, let me check the repo and get back to you: https://github.com/vishal14101993/Cassandra-Ignite – Dmitriy May 02 '18 at 15:33

1 Answer

It looks like this benchmark does one get per key for Ignite, and you didn't invoke loadCache before it. In that case, on each get Ignite goes to Cassandra to fetch the value and only afterwards stores it in the cache. So I'd recommend invoking loadCache before benchmarking or, at least, testing gets on the same keys, to give Ignite a chance to store those keys in the cache. If you think all the data is already in the caches, please also share the code where you write data to Ignite.
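To illustrate the difference between a cold and a warmed-up cache, here is a toy model in plain C++. It is not Ignite code: `BackingStore` and `ReadThroughCache` are hypothetical classes, and `load_all` only plays the role that loadCache plays above. The counter shows how often a read actually reaches the underlying store.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Toy stand-in for the persistent store (Cassandra in the question).
// It counts how many single-key reads reach it.
class BackingStore {
public:
    void put(const std::string& key, std::string value) { rows_[key] = std::move(value); }
    std::string load(const std::string& key) {
        ++loads_;
        auto it = rows_.find(key);
        return it == rows_.end() ? std::string() : it->second;
    }
    std::size_t loads() const { return loads_; }
    const std::unordered_map<std::string, std::string>& rows() const { return rows_; }
private:
    std::unordered_map<std::string, std::string> rows_;
    std::size_t loads_ = 0;
};

// Toy read-through cache: a miss falls through to the store, a hit is served from memory.
class ReadThroughCache {
public:
    explicit ReadThroughCache(BackingStore& store) : store_(store) {}
    // Bulk preload of every row; it does not go through the per-key load() path.
    void load_all() {
        for (const auto& kv : store_.rows()) cache_[kv.first] = kv.second;
    }
    std::string get(const std::string& key) {
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;  // hit: no store access
        std::string value = store_.load(key);       // miss: read-through
        cache_[key] = value;
        return value;
    }
private:
    BackingStore& store_;
    std::unordered_map<std::string, std::string> cache_;
};
```

With a cold cache, the first `get` of each key increments `loads()`; after `load_all()`, or on a repeated `get` of the same key, the counter stays put. A benchmark that reads each key exactly once on a cold cache therefore measures the store plus the cache overhead, not the cache alone.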

Also, you invoke "grid.GetCache" in each thread. It won't take much time, but you should definitely avoid such calls inside a benchmark while the timer is already running.
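One way to keep setup out of the measurement is a small harness that times only the read loop. This is a sketch with the standard library; `time_reads` and `BenchResult` are hypothetical names, and the callable passed in stands for whatever single-row read the client performs.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Result of one timed run.
struct BenchResult {
    std::size_t reads;
    double seconds;
    double rows_per_second() const { return seconds > 0.0 ? reads / seconds : 0.0; }
};

// Times only the read loop. `read_one` is any callable taking the key index.
// Any handle it needs should be acquired by the caller before this function
// is entered, so that setup cost is not counted.
template <typename ReadFn>
BenchResult time_reads(std::size_t count, ReadFn read_one) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) read_one(i);
    const auto stop = std::chrono::steady_clock::now();
    return BenchResult{count, std::chrono::duration<double>(stop - start).count()};
}
```

In the client from the question, each thread would obtain its cache handle once, before calling `time_reads`, and the lambda would capture that handle by reference.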

Evgenii Zhuravlev
  • Ahhh... now I got it... absolutely agree. Whenever testing a cache, you need to make sure that you read the same key more than once, so the 2nd and following reads come from memory. – Dmitriy May 04 '18 at 21:22
  • I made sure that all the data was already in the cache when I performed the reads (I write the data first and then read it, multiple times). I also check this during the program by cout< – Vishal Sharma May 07 '18 at 05:51
  • How do you write data to cache? What is the amount of data you store in Ignite? – Evgenii Zhuravlev May 07 '18 at 07:01
  • I did run a program to write data to the cache first. I'll see if I'll be able to upload that code as well. Although I don't see how that matters because like Dmitriy said, if I read the same key more than once, the 2nd and following reads will come from memory and like I'd said, I had run the read program multiple times. Also, like I had said above, I had actually checked if the data was lying in the memory. – Vishal Sharma May 08 '18 at 05:35