
Question about Cassandra

Why the hell on earth would anybody write a database ENGINE in Java?
I can understand why you would want to have a Java interface, but the engine...

I was under the impression that there's nothing faster than C/C++, and that a database engine shouldn't be any slower than max speed, and certainly not use garbage collection...

Can anybody explain to me what possible sense that makes, or why Cassandra can be faster than an ordinary SQL database running on C/C++ code?

Edit:
Sorry for the "Why the hell on earth" part, but it really didn't make any sense to me.
I neglected to consider that a database, unlike the average garden-variety user program, needs to be started only once and then runs for a very long time, probably as the only program on the server, which self-evidently makes for an important performance difference.

I was really comparing it to a 'dysfunctional' (to put it mildly) Java tax program I was using at the time of writing (or rather, would have liked to use).

In fact, unlike using Java for tax programs, using Java for writing a dedicated server program makes perfect sense.

Stefan Steiger
  • FYI: C/C++ is not the answer for everything. If you read the wiki article, you would have seen that Facebook, Digg, etc. are using Cassandra, and I think when it comes to scalability Java is just awesome. – user181750 Feb 26 '10 at 13:29
  • IMO there is nothing inherently subjective and argumentative in this question. The wording ("Why the hell") clearly needs improvement, but overall I think this is a valid question. – Erich Kitzmueller Feb 26 '10 at 13:39
  • You might as well ask all of these: http://java-source.net/open-source/database-engines – PeterMmm Feb 26 '10 at 13:52
  • Hadoop is written in Java. Amazon's dynamo backend is written in Java. – matt b Feb 26 '10 at 13:58
  • Of course it's subjective and argumentative, there is no correct answer to this "question" therefore every answer posted will be based in opinions. – matt b Feb 26 '10 at 13:59
  • @matt b it must have been someone's idea to start writing it in Java, and as such their justifications, whether rational or aesthetic, would be the correct answer to the question – Pete Kirkham Feb 26 '10 at 14:17
  • @Stephen C Yes, but that wasn't matt b's justification. That only one group of people have the knowledge to answer the question doesn't make a question argumentative. For example, C# team members on SO do give authoritative answers on questions about the design decisions in C#. – Pete Kirkham Feb 26 '10 at 22:56
  • @Pete Kirkham - I'm not attempting to justify @matt b's line of argument. But his conclusion is correct, IMO. – Stephen C Feb 27 '10 at 01:03
  • "Why the hell on earth" is not per se argumentative. Granted, most times it is, but it does not necessarily imply only an opinion; it can imply curiosity, too (granted, about something you think is most likely wrong). I should have chosen my words, or better, my thoughts, more carefully in the first place. But I think Kico Lobo's answer was very good, and it has changed my opinion. It does make sense after all. I neglected to consider that a database needs to be started only once... stupid... and the buffer overflow reason is quite good, too. – Stefan Steiger Mar 04 '10 at 10:25
  • Here's a good answer to your question: http://programmers.stackexchange.com/questions/110634/why-would-it-ever-be-possible-for-java-to-be-faster-than-c/110651#110651 – Martin Dow Sep 27 '11 at 12:48
  • In almost every context, including discussions of program design decisions, prefixing any question with "why the hell on earth" **is** _per se_ argumentative. I suppose certain theological discussions might be exempt. – Justin Jul 06 '15 at 19:19
  • I read this today (26-September-2015) [New-Age C++ Boosts Open Source NoSQL Cassandra Speed 10x](https://adtmag.com/articles/2015/09/23/scylladb-cassandra.aspx). Summary: A rewrite of Cassandra, called ScyllaDB, using [Seastar](https://github.com/scylladb/seastar) -- a C++ framework for writing complex asynchronous applications with optimal performance on modern hardware, is 10X faster. – RajaRaviVarma Sep 25 '15 at 18:47
  • [ScyllaDB is faster than Cassandra (Benchmark)](http://www.scylladb.com/technology/cassandra-vs-scylla-benchmark-2/), and part of the reason, as explained in the architecture document, is the way that the JVM works with the network stack. Java IS slower than C++ for this particular application. – Skrymsli Mar 28 '16 at 19:55

5 Answers


What do you mean, C++? Hand coded assembly would be faster if you have a few decades to spare.

Martin
  • +1, I was about to write a similar comment – Erich Kitzmueller Feb 26 '10 at 13:32
  • No, he said "C/C++", which is the mythical faster than everything language, but whose value depends on unspecified execution order. – Pete Kirkham Feb 26 '10 at 14:14
  • Hahaha, that answer was to be expected. I found it funny, though. PS: C (the speed of light) is not MYTHICALLY faster than everything. If you benchmark, you see that it ACTUALLY is faster than Java (by anywhere from 5 to 30 times), at the same investment of development time and level of competence. Besides, nowadays, large amounts of hand-coded assembly are in most cases slower than C, because C compiler optimization is quite good and the C stdlib is heavily optimized. It's still faster than C++, though. And you can throw away your assembly when the processor changes. Not so with C. – Stefan Steiger Mar 04 '10 at 10:34
  • I had to log in to like your reply: "Hand coded assembly would be faster if you have a few decades to spare." – Truong Ha Jun 03 '14 at 01:51
  • @Quandary With JIT, Java can get performance which C can't achieve. The JIT uses information that is only available at runtime. – Jimmy T. Aug 14 '14 at 10:50
  • Human readable code? We don't need no stinkin' human readable code! – Kell Sep 02 '15 at 09:27

I can see a few reasons:

  • Security: it's easier to write secure software in Java than in C++ (remember the buffer overflows?)
  • Performance: it's not THAT much worse. It's definitely worse at startup, but once the code is up and running, it's not a big deal. Actually, you have to remember an important point here: Java code is continually optimized by the VM, so in some circumstances it gets faster than C++
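The startup-versus-steady-state point can be seen roughly by timing the same method over repeated rounds. This is a minimal sketch, not a rigorous benchmark (a harness such as JMH handles warm-up, dead-code elimination and similar pitfalls properly); the class and method names are hypothetical, and the numbers depend entirely on the JVM, its flags and the hardware.

```java
public class WarmupDemo {
    // Written to after each round so the JIT cannot treat the work as dead code.
    static volatile long blackhole;

    // A small CPU-bound kernel; a JIT-enabled JVM typically compiles it once it becomes "hot".
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Runs the kernel `rounds` times and records the elapsed nanoseconds of each round.
    static long[] timeRounds(int rounds, int n) {
        long[] elapsed = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            blackhole = sumOfSquares(n);
            elapsed[r] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] times = timeRounds(10, 1_000_000);
        for (int r = 0; r < times.length; r++) {
            System.out.printf("round %d: %.2f ms%n", r, times[r] / 1e6);
        }
    }
}
```

On a typical JIT-enabled JVM the first round or two often run slower, while the method is still interpreted or being compiled, and later rounds settle lower; this sketch should not be read as a measurement of any particular speedup.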
Art Licis
Kico Lobo
  • And, with regards to performance, Java systems can easily be faster than equivalent C++ systems, not because of underlying language or JVM but just because one can spend more time on design and optimizations, rather than having to write, say, custom memory management subsystem. That is: just because C++ systems can be fast does not guarantee they are -- what matters more are developers, how good they are with the tools they use. Besides, for distributed stores, real bottlenecks are with network and I/O; along with coordination, not CPU. – StaxMan Dec 08 '10 at 01:03
  • I recently had SELinux on my laptop. SELinux prevents buffer overflows from executing, so reason 1 is thereby dead. BTW, don't garbage collectors for C++ exist? I think I read about one once, somewhere... – Stefan Steiger Jan 25 '11 at 20:46
  • SELinux doesn't prevent buffer overflows; the buffer overflow would still happen, but it would be unexploitable. However, the buffer overflow would still crash the program. – Jason Axelson Mar 30 '11 at 01:08
  • @Quandary: Unfortunately, buffer overflows are not dead at all. See http://en.wikipedia.org/wiki/Return-oriented_programming – Michael Borgwardt Jul 11 '11 at 16:34
  • @Jason Axelson: Right, but that's the point. It's true that it still crashes the program. But that way, a buffer overflow is only usable for DoS attacks. – Stefan Steiger Jul 28 '11 at 07:23
  • @Michael Borgwardt: Hmmm, interesting. I have to take a look at that. But it certainly got a lot more difficult. – Stefan Steiger Jul 28 '11 at 07:26
  • How does Cassandra handle memory usage well if it is written in Java? Java is known to have a big memory overhead for every object, so if it were written in C++, more objects could be placed in RAM. Maybe they have some advanced techniques to ameliorate this. – David Michael Gang Feb 03 '15 at 13:55
  • SELinux does NOT prevent buffer overflows or other pointer-based errors. It prevents some, and it can contain breaches to the hacked application. The only true way to fix buffer overflows is still JIT-compiling or interpreting code while adding fail-checks. Keep in mind that many C/C++ applications would stop working if they had all the protection mechanisms Java has. That's one of the reasons there are no full C/C++ compilers targeting Java bytecode. – user1657170 Feb 16 '15 at 21:21
  • Memory allocation/deallocation is a bit slow in Java, and there is an overhead associated with objects. Some of this is just a lack of optimization in the virtual machine, and there are third-party products that are faster. Some of it cannot easily be removed without losing functionality that code might expect. However, there are strategies and patterns that make all of this a non-issue. Creating objects as if they were C++ structs is a rookie mistake; just be conservative with your objects and your performance will be fine. – user1657170 Feb 16 '15 at 21:27
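The object-overhead concern in the comments above can be illustrated by contrasting a `List<Long>`, where each element is a reference to a separate boxed `Long` object with its own header, with a primitive `long[]`, which stores the values inline in a single array object. A minimal sketch; the class and method names are hypothetical, and the exact per-object overhead depends on the JVM and its settings.

```java
import java.util.ArrayList;
import java.util.List;

public class PrimitiveVsBoxed {
    // Boxed storage: the list holds references to individually allocated Long objects
    // (Long.valueOf caches only small values; the larger ones here are fresh allocations).
    static List<Long> boxed(int n) {
        List<Long> values = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            values.add((long) i * 1000); // autoboxed via Long.valueOf
        }
        return values;
    }

    // Primitive storage: one array object, values stored inline with no per-element header.
    static long[] primitive(int n) {
        long[] values = new long[n];
        for (int i = 0; i < n; i++) {
            values[i] = (long) i * 1000;
        }
        return values;
    }

    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v; // each element is unboxed here
        }
        return total;
    }

    static long sum(long[] values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Same result either way; the difference is in allocation count and memory layout.
        System.out.println(sum(boxed(n)) == sum(primitive(n)));
    }
}
```

Both versions compute the same sum; the design choice is simply that the primitive array avoids one object allocation per element, which is the kind of "conservative with your objects" pattern the comment refers to.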

Why the hell on earth would anybody write a database ENGINE in JAVA ?

Platform independence is a pretty big factor for servers, because you have a lot more hardware and OS heterogeneity than with desktop PCs. Another is security. Not having to worry about buffer overflows means most of the worst kind of security holes are simply impossible.
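To make the buffer-overflow point concrete: every Java array access is bounds-checked at run time, so an out-of-range write throws `ArrayIndexOutOfBoundsException` instead of silently overwriting adjacent memory, as an unchecked write past the end of a C buffer could. A minimal sketch; the class and method names are hypothetical.

```java
public class BoundsCheckDemo {
    // Copies `input` into a fixed-size buffer, deliberately without checking its length first,
    // mimicking the classic C mistake of copying unchecked input into a fixed-size buffer.
    static String copyIntoBuffer(byte[] input, int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        try {
            for (int i = 0; i < input.length; i++) {
                buffer[i] = input[i]; // the JVM checks i against buffer.length on every store
            }
            return "copied " + input.length + " bytes";
        } catch (ArrayIndexOutOfBoundsException e) {
            // The oversized copy stops at the first bad index; nothing past the array is written.
            return "rejected";
        }
    }

    public static void main(String[] args) {
        System.out.println(copyIntoBuffer(new byte[8], 16));  // fits
        System.out.println(copyIntoBuffer(new byte[32], 16)); // would overflow the buffer in C
    }
}
```

The check turns a potential memory-corruption bug into an exception: the program is still wrong, but the bad write cannot touch memory outside the array.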

I was under the impression that there's nothing faster than C/C++, and that a database engine shouldn't be any slower than max speed, and certainly not use garbage collection...

Your impression is incorrect. C/C++ is not necessarily faster than Java, and modern garbage collectors have a big part in that because they enable object creation to be incredibly fast.

Michael Borgwardt
  • Michael - can you elaborate on your comment re object *creation* being fast because of the garbage collector? – Brian Agnew Feb 26 '10 at 13:35
  • (Note: this is simplified.) Object creation (not destruction) can be more efficient with managed code, as a compacting garbage collector will try to arrange all free memory for the process in a contiguous area. When one needs to allocate a certain amount of memory, we already know if we have enough, and can avoid walking the memory of the process trying to find a free area big enough for what we need. The corollary to this, though, is that after the GC cleans up memory, it needs to compact all GC survivors together in memory. – saret Feb 26 '10 at 14:03
  • @Brian - with a modern GC, freed memory is compacted, making memory allocation trivially simple compared with a typical `malloc`. – Stephen C Feb 26 '10 at 14:49
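The comments above explain why allocation is cheap after compaction: free space is one contiguous region, so allocating amounts to advancing a pointer. The toy model below sketches that idea over an abstract region of bytes; it is not how any particular JVM is implemented, and the class and method names are hypothetical. The "compaction" step here only resets the pointer; a real collector would also have to copy the surviving objects and fix up every reference to them, which is the cost saret's comment mentions.

```java
public class BumpAllocator {
    private final int capacity;
    private int top = 0; // everything below `top` is allocated; everything above it is free

    BumpAllocator(int capacity) {
        this.capacity = capacity;
    }

    // Allocation is one bounds check plus one add: no free-list search as in a typical malloc.
    // Returns the offset of the new block, or -1 if the region is exhausted
    // (a real runtime would trigger a garbage collection at that point).
    int allocate(int size) {
        if (size < 0 || size > capacity - top) {
            return -1;
        }
        int offset = top;
        top += size;
        return offset;
    }

    // Toy stand-in for compaction: survivors are assumed to be packed back to back at the start,
    // so the new top is just their total size. Copying and reference fix-up are omitted.
    int compact(int[] liveSizes) {
        int newTop = 0;
        for (int size : liveSizes) {
            newTop += size;
        }
        if (newTop > top) {
            throw new IllegalArgumentException("survivors cannot exceed what was allocated");
        }
        top = newTop;
        return top;
    }

    int used() {
        return top;
    }

    public static void main(String[] args) {
        BumpAllocator heap = new BumpAllocator(64);
        System.out.println(heap.allocate(16)); // 0
        System.out.println(heap.allocate(16)); // 16
        System.out.println(heap.allocate(40)); // -1: only 32 bytes left
        heap.compact(new int[] {16});          // pretend only one 16-byte block survived
        System.out.println(heap.allocate(40)); // 16: the freed space is contiguous again
    }
}
```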

Don't forget that Java VMs make use of a just-in-time (JIT) engine that performs on-the-fly optimisations to make Java comparable to C++ in terms of speed. Java is also quite a productive language (despite its naysayers) and portable; together with the JIT optimisation capability, that means Java isn't an unreasonable choice for something like this.

Brian Agnew
  • Taking the context of the question out, I still wonder: if performance in Java is similar, why is "most" Java-based UI just crap? Is it the developers' fault or the Java UI libs? Disclaimer: I am a Java naysayer because most of the things I worked with written in Java had poor performance, but I am happy to learn otherwise. – Radu Maris Nov 29 '13 at 10:11
  • Radu: UIs written in C/C++ crash all the ***ing time. Ones written in Java don't. Ever. Java UIs look great. Just use a modern LookAndFeel like FlatLAF. They have superb performance... two orders of magnitude faster (at least) than any web front end, no different than C/C++ really. Look at IntelliJ as a great example. – barneypitt Sep 24 '22 at 22:31

The performance penalty for modern Java runtimes is not that big, and programming in Java is less error-prone than in C.

Otto Allmendinger
  • "Programming in Java is less error-prone than in C". That's a heck of a statement, care to back it up with some evidence? – Dominic Rodger Feb 26 '10 at 13:31
  • Come on, Dominic. Yes, we all know it's more than possible to write (mostly) error-free code in C. But you can't deny Java gives you less rope to hang yourself with. – Matthew Flaschen Feb 26 '10 at 13:34
  • +1 for Dominic. I've seen so many issues with carelessly written Java code that none of that Java magic (GC, etc.) can help. Java apps don't leak memory like C? Haha, yeah, you wish! – rytis Feb 26 '10 at 13:35
  • @Matthew - sure. *I* can't think of any project *I'd* rather do in C than Java, I just object to the blanket statement. I can think of people around here who could write something in C that'd be a heck of a lot more solid than something I could write in Java. All I'm saying is that there's good C, and there's good Java - one isn't necessarily superior to the other. – Dominic Rodger Feb 26 '10 at 13:40
  • @pulegium: I'd take memory leaks over buffer overflow any time – Otto Allmendinger Feb 26 '10 at 13:41
  • There might even be people who write better assembly, but that doesn't mean you can't compare the difficulty of writing safe code between languages – Otto Allmendinger Feb 26 '10 at 14:05
  • I don't think one's superior to the other either. They each have different strengths and emphases (and of course different experts). But I do believe Java is less error-*prone*. – Matthew Flaschen Feb 26 '10 at 15:19
  • +1 for Dominic: I have seen sooooooo many bugs in Java (and PHP and Python and TCL and any other language you want) code that I won't believe any promises stating "this model is less error-prone than that one". You have got to be careful with such statements! – Ta Sas Jul 01 '10 at 17:50
  • The errors in these languages are a subset of the errors possible in C – Otto Allmendinger Jul 03 '10 at 18:08
  • Perhaps a bit off-topic, but nowadays SELinux effectively prohibits buffer overflows from executing. So no more security issue there. – Stefan Steiger Jan 25 '11 at 20:40
  • @rytis The memory leak problems in Java and C are different problems. In Java, you have a memory leak because you didn't release an object reference. In C, it can be that you forgot to release an object reference OR your pointer points to the wrong address. That doubles the risk, compared to Java – janetsmith Nov 06 '12 at 19:51
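The Java kind of leak described in the last comment can be shown in a few lines: the garbage collector only reclaims objects that are no longer reachable, so a long-lived collection that keeps references (for example, a static cache that is never trimmed) still "leaks" memory. A minimal sketch; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ReferenceLeakDemo {
    // A static list lives as long as its class is loaded; anything added to it stays reachable.
    private static final List<byte[]> CACHE = new ArrayList<>();

    // Leaky version: every request's buffer is remembered forever, so the GC can never free it.
    static void handleRequestLeaky(int payloadSize) {
        byte[] buffer = new byte[payloadSize];
        CACHE.add(buffer);
    }

    // Non-leaky version: the buffer is only referenced locally and becomes garbage after return.
    static int handleRequest(int payloadSize) {
        byte[] buffer = new byte[payloadSize];
        return buffer.length;
    }

    static int cachedCount() {
        return CACHE.size();
    }

    // Dropping the references is what makes the cached buffers eligible for collection.
    static void clearCache() {
        CACHE.clear();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handleRequestLeaky(1024);
        }
        System.out.println(cachedCount() + " buffers still reachable");
        clearCache();
        System.out.println(cachedCount() + " buffers still reachable");
    }
}
```

Unlike the C failure modes mentioned in the comment, nothing here corrupts memory; the buffers are simply kept alive by a reference the program forgot to drop.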