7

I see a lot of statements like: "Cassandra very fast on writes", "Cassandra has reads really slower than writes, but much faster than Mysql"

On my windows7 system: I installed Mysql of default configuration. I installed PHP5 of default configuration. I installed Casssandra of default configuration.

Making simple write test on mysql: "INSERT INTO wp_test (id,title) VALUES ('id01','test')" gives me result: 0.0002(s) For 1000 inserts: 0.1106(s)

Making simple same write test on Cassandra: $column_faily->insert('id01',array('title'=>'test')) gives me result of: 0.005(s) For 1000 inserts: 1.047(s)

For reads tests i also got that Cassandra is much slower than mysql.

So the question, does this sounds correct that i have 5ms for one write operation on Cassadra? Or something is wrong and should be at least 0.5ms.

Ivan Gusev
  • 252
  • 5
  • 10
  • 1
    http://stackoverflow.com/questions/4068946/a-bit-of-advice-on-cassandra-vs-mysql and http://stackoverflow.com/questions/2332113/switching-from-mysql-to-cassandra-pros-cons may have some useful advice. – Jeremy Harris Jan 13 '12 at 12:24
  • It wouldn't surprise me if Cassandra was happier on a Linux system. – ceejayoz Jan 13 '12 at 15:34
  • 1
    You can't test with 1 record only. Test with a million. MySQL will be slower in writing (esp. if InnoDB is used) but might be as fast or faster when it comes to reading based on primary key. – N.B. Jan 13 '12 at 15:34
  • @ceejayoz, tested on linux system, samre result – Ivan Gusev Jan 14 '12 at 05:46
  • @N.B., I tested with 1000 records, it is enough for one user's action. – Ivan Gusev Jan 14 '12 at 05:48
  • 1
    cassandra is built to scale. With such a benchmark (one client doing just a 1000 inserts), mysql will win. You'll get advantage from cassandra when your mysql outgrows 1 machine, and you have to spread the data and load over many machines. – nos Jan 15 '12 at 21:41
  • @nos, your answer is the best, thanks – Ivan Gusev Jun 23 '12 at 15:26

4 Answers4

13

When people say "Cassandra is faster than MySQL", they mean when you are dealing with terabytes of data and many simultaneous users. Cassandra (and many distributed NoSQL databases) is optimized for hundreds of simultaneous readers and writers on many nodes, as opposed to MySQL (and other relational DBs) which are optimized to be really fast on a single node, but tend to fall to pieces when you try to scale them across multiple nodes. There is a generalization of this trade-off by the way- the absolute fastest disk I/O is plain old UNIX flat files, and many latency-sensitive financial applications use them for that reason.

If you are building the next Facebook, you want something like Cassandra because a single MySQL box is never going to stand up to the punishment of thousands of simultaneous reads and writes, whereas with Cassandra you can scale out to hundreds of data nodes and handle that load easily. See scaling up vs. scaling out.

Another use case is when you need to apply a lot of batch processing power to terabytes or petabytes of data. Cassandra or HBase are great because they are integrated with MapReduce, allowing you to run your processing on the data nodes. With MySQL, you'd need to extract the data and spray it out across a grid of processing nodes, which would consume a lot of network bandwidth and entail a lot of unneeded complication.

Chris Shain
  • 50,833
  • 6
  • 93
  • 125
  • 2
    i have read a lot about benefits of Cassandra. My question was about 5ms per one write operation for single server. Does this speed sounds correct? For example, if someone says me that his mysql takes 5ms for one write operation (lets think about average server), i would response to him with "usually should be faster" – Ivan Gusev Jan 14 '12 at 09:25
  • MySQL performs well on a single node with terabytes of data/many users, not the best introduction. People say it's slow then have transactions, etc but you don't have to do all the things mysql lets you do. The difference is cassandra doesn't let you do them. Cassandra's limits does let it better scale though. One of cassandra's limitations is that it only does one operation relatively well, insert. It does alright at reading KV or a single range index. After that not so much. Cassandra over time however is adding more features for a bit more uniformity. – jgmjgm Jun 07 '19 at 16:11
  • Cassandra's model leans towards always have fast inserts, generally slow reads with the intention being people will do other things to get around read limitations. – jgmjgm Jun 07 '19 at 16:12
3

Cassandra benefits greatly from parallelisation and batching. Try doing 1 million inserts on each of 100 threads (each with their own connection & in batches of 100) and see which ones is faster.

Finally, Cassandra insert performance should be relatively stable (maintaining high throughput for a very long time). With MySQL, you will find that it tails off rather dramatically once the btrees used for the indexes grow too large memory.

tom.wilkie
  • 2,846
  • 20
  • 16
  • the question was about usual speed of 1 or 1000 inserts per one process (lets think like one user clicks button). Please correct me, if i send all these 1000 inserts in parallel then i get speed faster than 1 second? – Ivan Gusev Jan 14 '12 at 09:34
  • If you test that make sure inserts are comparable. Don't enable features on mysql that aren't on cassandra. Make sure mysql inserts unsafely or journels only. – jgmjgm Jun 07 '19 at 16:13
0

Many user space factors can impact write performance. Such as:

  • Dozens of settings in each of the database server's configuration.
  • The table structure and settings.
  • The connection settings.
  • The query settings.

Are you swallowing warnings or exceptions? The MySQL sample would on face value be expected to produce a duplicate key error. It could be failing while doing nothing at all. What Cassandra might do in the same case isn't something I'm familiar with.

My limited experience of Cassandra tell me one thing about inserts, while performance of everything else degrades as data grows, inserts appear to maintain the same speed. How fast it is compared to MySQL however isn't something I've tested.

It might not be so much that inserts are fast but rather tries to be never slow. If you want a more meaningful test you need to incorporate concurrency and more variations on scenario such as large data sets, various batch sizes, etc. More complex tests might test latency for availability of data post insert and read speed over time.

It would not surprise me if Cassandra's first port of call for inserting data is to put it on a queue or to simply append. This is configurable if you look at consistency level. MySQL similarly allows you to balance performance and reliability/availability though each will have variations on what they allow and don't allow.

Outside of that unless you get into the internals it may be hard to tell why one performs better than the other.

I did some benchmarks of a use case I had for Cassandra a while ago. For the benchmark it would insert tens of thousands of rows first. I had to make the script sleep for a few seconds because otherwise queries run after the fact would not see the data and the results would be inconsistent between implementations I was testing.

If you really want fast inserts, append to a file on ramdisk.

jgmjgm
  • 4,240
  • 1
  • 25
  • 18
0

It's likely that the maturity of the MySQL drivers, especially the improved MySQL drivers in PHP 5.3, is having some impact on the tests. It's also entirely possible that the simplicity of the data in your query is impacting the results - maybe on 100 value inserts, Cassandra becomes faster.

Try the same test from the command line and see what the timestamps are, then try with varying numbers of values. You can't do a single test and base your decision on that.

Jonathan Rich
  • 1,740
  • 10
  • 11
  • Lets think that i try to release typical situation: i have millions of users and every user wants to upload 1000 names(ex, book titles). And from my tests(same from command line) each user will wait result more than 1 sec. If i have less than millions of users then mysql gives me 0.1 sec. Does this sound correct or not? – Ivan Gusev Jan 14 '12 at 09:42
  • Should test also batch inserts. – jgmjgm Jun 07 '19 at 16:14