Java collections faster than c++ containers?

Question

I was reading the comments on this answer and I saw this quote.

Object instantiation and object-oriented features are blazing fast to use (faster than C++ in many cases) because they're designed in from the beginning. and Collections are fast. Standard Java beats standard C/C++ in this area, even for most optimized C code.

One user (with really high rep I might add) boldly defended this claim, stating that

heap allocation in java is better than C++'s
and added this statement defending the collections in java

And Java collections are fast compared to C++ collections due largely to the different memory subsystem.

So my question is can any of this really be true, and if so why is java's heap allocation so much faster.

Did the user provide any evidence to this claim? I can imagine it would be true in some cases, false in others. — juanchopanza, Aug 16 '13 at 07:36
It is pretty hard to beat native code in speed, for some things this might be true — bash.d, Aug 16 '13 at 07:37
@juanchopanza it didn't seem like it to me, you can read the comments yourself, the only reason I am even remotely inclined to believe is that the JVM has more incentive to have fast heap allocation — aaronman, Aug 16 '13 at 07:38
@Rapptz well there was defense on both sides, but even if I'm on c++'s side the person arguing for java seemed to have a better put together argument — aaronman, Aug 16 '13 at 07:40
@Robadob the only reason I wouldn't write benchmarks for this is because of how unspecific it is — aaronman, Aug 16 '13 at 07:41
"due to the different memory subsystem" is not "a better put together argument". It doesn't explain anything. He is right about memory allocations, they *are* typically faster in managed languages, because they're effectively implemented as a single stack push operation, moving a single pointer (at the cost of having occasional GC pauses). But the "collections are faster because *handwaving*" one is nonsense — jalf, Aug 16 '13 at 07:41
@aaronman But to benchmark insert/random-access/deletion operations on a number of different types of collections covers most of the bases and is fairly simple to do. — Robadob, Aug 16 '13 at 07:43
One aspect that is usually ignored is what happens when the garbage collector starts collecting all the garbage that has been produced by design. This can be a killer in some situations, and may not matter at all in others. But it is not something that can be made to disappear. — juanchopanza, Aug 16 '13 at 07:43
@jalf I wasn't referring to that part, I really have a hard time believing the collections are faster — aaronman, Aug 16 '13 at 07:44
@aaronman well, they aren't. But he hasn't given a *reason* why we should believe his claim either, so why should we think they are? — jalf, Aug 16 '13 at 07:46
Here's an interesting article on the [garbage collection thing](http://sealedabstract.com/rants/why-mobile-web-apps-are-slow/), in domains where it does matter. It isn't specifically about Java, but it is worth reading. — juanchopanza, Aug 16 '13 at 07:48
@jalf believe me I'm the first person to defend C++ over java, but when there is an ***accepted*** answer with 159 UV's on SO I usually assume it is correct, take a look at the answer it has some pretty bold claims as well, like the part about copying arrays — aaronman, Aug 16 '13 at 07:51
@jalf this is getting to more of a meta issue, but if an answer like that contains incorrect info it should be removed, right? — aaronman, Aug 16 '13 at 08:03
@aaronman SO doesn't remove answers for being incorrect; it relies on the community to downvote or comment on incorrect answers. Unfortunately, misinformation, especially on hot topics like Java vs C++, spreads wildly. With a topic like that, people will upvote/accept/defend the answer they *want* to believe. The best you can do is keep your own personal information filter in good working condition. The highest voted answer on this very question, for example, is the one with the least backed claims and most unrealistic view (the true answer, like others have stated, is: it depends). — Jason C, Aug 16 '13 at 08:42
@juanchopanza If you're writing a benchmark, and you favor garbage collection, all you have to do is to ensure that it finishes before having triggered the garbage collector. — James Kanze, Aug 16 '13 at 09:13
@aaronman That doesn't correspond to what I've seen. There seems to be very little correlation between the number of up votes and whether the answer is correct or not. — James Kanze, Aug 16 '13 at 09:14
Er... I (hopefully) take back the last sentence of my previous comment. — Jason C, Aug 16 '13 at 10:51
Sigh. Java is the only language with a marketing department, and they work overtime trashing C++. Fifteen years ago, just about every article about Java started with a cheap shot at C++ (my favorite was the assertion that C++ couldn't have garbage collection because it didn't run in a virtual machine; so many things wrong there that you just shake your head in despair); you had to read the **second** paragraph to find out what the article was about. — Pete Becker, Aug 16 '13 at 12:09
For anyone trying to close this, no one has actually told me why they want to close it, I am perfectly willing to reword it so it fits the specifications better — aaronman, Aug 16 '13 at 18:29
@aaronman The reasons for closure are given in the message below. It is unlikely this question can be reworded to fit within the guidelines outlined in the help center. However, it is a very appropriate topic for http://programmers.stackexchange.com, which exists precisely for this type of question. — Jason C, Aug 17 '13 at 15:11
@JasonC I posted it on programmers, but I still think it is a legit question even here, I just want to say obviously the reason I initially posted it here is because this site is more popular and I am active on it — aaronman, Aug 18 '13 at 05:44

score 45 · Accepted Answer · edited Aug 20 '13 at 14:24

This sort of statement is ridiculous; people making it are either incredibly uninformed, or incredibly dishonest. In particular:

The speed of dynamic memory allocation in the two cases will depend on the pattern of dynamic memory use, as well as the implementation. It is trivial for someone familiar with the algorithms used in both cases to write a benchmark proving which ever one he wanted to be faster. (Thus, for example, programs using large, complex graphs that are build, then torn down and rebuilt, will typically run faster under garbage collection. As will programs that never use enough dynamic memory to trigger the collector. Programs using few, large, long lived allocations will often run faster with manual memory management.)
When comparing the collections, you have to consider what is in the collections. If you're comparing large vectors of double, for example, the difference between Java and C++ will likely be slight, and could go either way. If you're comparing large vectors of Point, where Point is a value class containing two doubles, C++ will probably blow Java out of the water, because it uses pure value semantics (with no additional dynamic allocation), where as Java needs to dynamically allocate each Point (and no dynamic allocation is always faster than even the fastest dynamic allocation). If the Point class in Java is correctly designed to act as a value (and thus immutable, like java.lang.String), then doing a translation on the Point in a vector will require a new allocation for every Point; in C++, you could just assign.
Much depends on the optimizer. In Java, the optimizer works with perfect knowledge of the actual use cases, in this particular run of the program, and perfect knowledge of the actual processor it is running on, in this run. In C++, the optimizer must work with data from a profiling run, which will never correspond exactly to any one run of the program, and the optimizer must (usually) generate code that will run (and run quickly) on a wide variety of processor versions. On the other hand, the C++ optimizer may take significantly more time analysing the different paths (and effective optimization can require a lot of CPU); the Java optimizer has to be fairly quick.
Finally, although not relevant to all applications, C++ can be single threaded. In which case, no locking is needed in the allocator, which is never the case in Java.

With regards to the two numbered points: C++ can use more or less the same algorithms as Java in its heap allocator. I've used C++ programs where the ::operator delete() function was empty, and the memory was garbage collected. (If your application allocates lots of short lived, small objects, such an allocator will probably speed things up.) And as for the second: the really big advantage C++ has is that its memory model doesn't require everything to be dynamically allocated. Even if allocation in Java takes only a tenth of the time it would take in C++ (which could be the case, if you only count the allocation, and not the time needed for the collector sweeps), with large vectors of Point, as above, you're comparing two or three allocations in C++ with millions of allocations in Java.

And finally: "why is Java's heap allocation so much faster?" It isn't, necessarily, if you amortise the time for the collection phases. The time for the allocation itself can be very cheap, because Java (or at least most Java implementations) use a relocating collector, which results in all of the free memory being in a single contiguous block. This is at least partially offset by the time needed in the collector: to get that contiguity, you've got to move data, which means a lot of copying. In most implementations, it also means an additional indirection in the pointers, and a lot of special logic to avoid issues when one thread has the address in a register, or such.

The JVM spec does not require the existence of a JIT compiler; even though it's very common these days. Similarly, C++ does not prohibit the existence of a JIT compiler, although as far as I know none exists. Also, to clarify, the optimizer time comparison you mentioned is one-time at compile-time for C++ (assuming static compilation) and run-time for Java (assuming JIT) -- it's not a direct comparison, and leaves out the initial Java compilation step (which is more analogous to the C++ case) as well. — Jason C, Aug 16 '13 at 10:04
@JasonC The distinction one-time vs recurring cost for the optimization is subsumed in the fact that "the C++ optimizer can take significantly more time". (Although I believe that there have been JVM which cached the results of JIT. In which case, optimization could be a one-time cost there as well.) — James Kanze, Aug 16 '13 at 11:48
I suppose it is significant to consider *whose* time it takes when weighing two languages against each other. One-time initial compilation is developer time, run-time delays are user time; in most cases the goal is to minimize the latter, but in any case the user, at least, is fairly unaware of the developer, and the two are separate beasts. — Jason C, Aug 16 '13 at 11:55
@JasonC That is, again, subsumed in the comment about the time available. It's not just developer vs user: there's the fact that the C++ optimizer runs long before you actually use the program, or need its output, which allows it to take far more time. — James Kanze, Aug 16 '13 at 13:19
I may have misinterpreted. In "the C++ optimizer can take significantly more time", did you mean "can" to emphasize that "it has the luxury of being able to take more time", or that "it sometimes takes more time"? — Jason C, Aug 16 '13 at 13:27
@JasonC The first. The correct word, back when I was learning English, would have been "may", but I've fallen into the widespread habit of using "can" when it should be "may". — James Kanze, Aug 16 '13 at 13:52
Whoops. I read it as the second. One meaningless up-vote, coming right up. — Jason C, Aug 16 '13 at 14:08

Jason C · Answer 2 · 2013-08-16T09:45:53.883

30

Your questions don't have concrete answers. For example, C++ does not define memory management at all. It leaves allocation details up to the library implementation. Therefore, within the bounds of C++, a given platform may have a very slow heap allocation scheme, and Java would certainly be faster if it bypasses that. On another platform, memory allocations may be blazing fast, outperforming Java. As James Kanze pointed out, Java also places very little constraints on memory management (e.g. even the GC algorithm is entirely up to the JVM implementor). Because Java and C++ do not place constraints on memory management, there is no concrete answer to that question. C++ is purposefully open about underlying hardware and kernel functions, and Java is purposefully open about JVM memory management. So the question becomes very fuzzy.

You may find that some operations are faster in Java, and some not. You never know until you try, however:

In practice, the real differences lie in your higher level algorithms and implementations. For all but the most absolutely performance critical applications, the differences in performance of identical data structures in different languages is completely negligible compared to the performance characteristics of the algorithm itself. Concentrate on optimizing your higher level implementations. Only after you have done so, and after you have determined that your performance requirements are not being met, and after you have benchmarked and found (unlikely) that your bottleneck is in container implementations, should you start to think of things like this.

In general, as soon as you find yourself thinking or reading about C++ vs. Java issues, stop and refocus on something productive.

edited Aug 16 '13 at 09:45

answered Aug 16 '13 at 07:43

Jason C

38,729
14
126
182

8

Except I'm not designing a performance critical application, I'm just curious – aaronman Aug 16 '13 at 07:43
Just to help you with giving an answer, I'm not deciding between the two langs (big C++ fan), I just want to know why these statements are or aren't true – aaronman Aug 16 '13 at 07:46
Try it! Performance will vary based on many different things; looking at the memory models alone is not enough to determine if one is "faster" than the other. Your data access patterns can also affect performance, things like caching, spanning page boundaries, alignment issues, and other low level characteristics can make it extremely difficult to compare these kinds of things without context. You can tweak Java GC parameters to tune them to your applications needs, that's another potential outside force in determining performance. – Jason C Aug 16 '13 at 07:46
I didn't ask if java is faster than c++, I asked two very specific questions about their speeds – aaronman Aug 16 '13 at 07:47
I know. :) My answer is that it's more complicated and nondeterministic than your questions imply. There's no straight answer, particularly to your second question regarding memory model differences. – Jason C Aug 16 '13 at 07:56
For example, C++ does not define a memory model at all. It leaves allocation details up to the library implementation. Therefore, within the bounds of C++, a given platform may have a very slow heap allocation scheme, and Java would certainly be faster as it bypasses that. On another platform, memory allocations may be blazing fast, outperforming Java. Because C++ does not place constraints on memory models like Java does, there is no concrete answer to that question. C++ is purposefully open about underlying hardware and kernel functions. – Jason C Aug 16 '13 at 07:59
Fair enough, but my guess is that clang, gcc and other popular compilers do heap and stack allocation similarly – aaronman Aug 16 '13 at 08:01
@aaronman On Windows and Linux malloc()/free() ultimately make calls to the kernel, and the memory allocation is implemented there (and the kernel... who knows, there might even be hardware-based allocation support on some systems). This way every application in the system uses the same physical memory model, otherwise memory management would be difficult, if not impossible. But C++ does not prevent, say, a runtime implementation from using a Java-style preallocated block of memory. It would be weird, but it is within the specifications of C++. – Jason C Aug 16 '13 at 08:10
@aaronman And your guess is true in the majority of cases, by the way. :) – Jason C Aug 16 '13 at 08:11
4

ehh your points are valid I just don't like ur answer, I'm just happy I basically asked the question _is java faster than c++_ and got away with it – aaronman Aug 16 '13 at 08:14
1

@JasonC Both C++ and Java partially specify a number of constraints on the memory model, while leaving a number of details up to the implementation. And while the constraints are very different, I don't think that one is any more constraining than the other. – James Kanze Aug 16 '13 at 09:10
@JamesKanze Thanks. I should have said "memory management", the memory model constraints all involve observable behavior (at least in Java). To be specific here, "memory management" refers to JVM spec constraints (which, as you point out, are very minimal), while "memory model" normally refers to JSR 133 (and I have been inappropriately using the term "memory model"; hopefully it hasn't been too misleading). The point I had hoped to make was not that one is more constraining than the other, but that the lack of constraints makes the original question difficult to answer. – Jason C Aug 16 '13 at 09:39
@aaronman Please keep discussions about closing/reopening the question to the question comments in the future. Thanks! :) – Jason C Aug 17 '13 at 07:39

kriomant · Answer 3 · 2013-08-16T07:46:02.240

16

Java heap is faster because (simplified) all you need to do to allocate is to increase heap top pointer (just like on stack). It is possible because heap is periodically compacted. So your price for speed is:

Periodic GC pauses for heap compacting
Increased memory usage

There is no free cheese... So while collection operations may be fast, it is amortized by overall slowing down during GC work.

edited Aug 16 '13 at 07:46

answered Aug 16 '13 at 07:40

kriomant

2,216
1
16
21

2

But to answer the other half, are java collections actually faster because of that? – aaronman Aug 16 '13 at 07:41
1

All you need *to do* to allocate. – Martijn Courteaux Aug 16 '13 at 07:42
It depends. No one can answer but tests. – kriomant Aug 16 '13 at 07:46
1

Prolly gonna accept ur answer but I'll wait till tmrw, maybe I'll realize that I was insane for even asking this question – aaronman Aug 16 '13 at 08:22
1

@aaronman There could also cache behaviour considerations which could matter in some domains. Naively, one would expect an array of java objects to have worse behaviour due to non-locality, in a similar way to how a `vector` could have worse behaviour than a `vector`. – juanchopanza Aug 16 '13 at 08:54
Note that C++ certainly allows the same memory allocation scheme, the problem is that without compaction the heap becomes really fragmented. – MSalters Aug 16 '13 at 08:57
@juanchopanza agreed, but I would guess java handles it a little differently or the amount of indirection would destroy the cache performance, I mean the only way that you don't have a reference is with primitives and you can only go so far with those since you can't even use them in java collections – aaronman Aug 16 '13 at 08:58
@aaronman The extra indirection should have a cost, yes. On the other hand, immediately after the garbage collector has run, locality should be better. (In the end, you can't make any generalizations. You just have to write the application, and measure under real-life conditions.) – James Kanze Aug 16 '13 at 11:50
1

Jason C's answer is much more complete. This answer makes the assumptions that 1) Java has a defined memory model, and 2) C++ has a defined memory model. Neither assumption is correct (the memory model for both is left to the implementation - and will differ with different JVMs or libraries, respectively). Additionally, it isn't a fair comparison because you'd be comparing the allocation+usage model (Java) to allocation+usage+deallocation model (C++). For a true comparison, you would have to include the GC's time to clean up the memory. – Zac Howland Aug 16 '13 at 14:13

score 12 · Answer 4 · answered Aug 16 '13 at 07:45

12

While I am a fan of Java, it is worth noting that C++ supports allocation of objects on the stack which is faster than heap allocation.

If you use C++ efficiently with all it various ways of doing the same thing, it will be faster than Java (even if it takes you longer to find that optimal combination)

If you program in C++ as you would in Java, e.g. everything on the heap, all methods virtual, have lots of runtime checks which don't do anything and can be optimised away dynamically, it will be slower. Java has optimised these things further as these a) are the only thing Java does, b) can be optimised dynamically more efficiently, c) Java has less features and side effects so it is easier for optimiser for get decent speeds.

answered Aug 16 '13 at 07:45

Peter Lawrey

525,659
79
751
1,130

2

if you only use primitives everything in java will be on the stack – aaronman Aug 16 '13 at 07:51
3

Adding to this: All STL containers let you supply a custom allocator, so if you wanted, you could supply an allocator based on e.g. alloca() to keep everything on the stack, or you could preallocate a block on the heap and supply a custom allocator with your own memory model. – Jason C Aug 16 '13 at 07:53
1

@aaronman True, primitive and references are on the stack. In theory Java can place objects on the stack as well, but in practice this rarely works. In future escape analysis may solve this properly. – Peter Lawrey Aug 16 '13 at 07:54
@aaronman That's true but it also sidesteps your original question. :) – Jason C Aug 16 '13 at 07:54
@JasonC C++ also supports concurrent collectors. It's the built in collector which tends to suffer in multi-threaded environments. Java's allocation is multi-threaded by default, though you can force it to be single threaded. Stack allocation is always multi-threaded. – Peter Lawrey Aug 16 '13 at 07:56
2

@PeterLawrey Excellent point. Also, it's worth noting that C++ doesn't actually define a memory model, the details are left to the implementors; so conceivably one platform could have terrible heap allocation performance, another could be blazing fast, another could be optimized for concurrent access, etc. This renders questions about C++ vs. Java memory model performance very difficult to answer -- a more concrete question is "C++ on platform X with runtime implementation Y vs. Java". – Jason C Aug 16 '13 at 08:05
@aaronman There are libraries for C++ to do this so I wouldn't suggest you do this yourself. Also Java allocation is faster because it never has to reuse/find free any memory or de-fragment it in the current thread. This is left as a task for other threads. In the case of Azul's Zing, this is done entirely concurrently. – Peter Lawrey Aug 16 '13 at 08:06
3

Trust me, wasn't planning to – aaronman Aug 16 '13 at 08:07
@Peter: The "never has to reuse/find memory" bit isn't entirely accurate. When the GC compacts the Java heap, it does exactly that (and de-frags your memory). The only thing it buys you is the knowledge that you won't have to scan memory to find a block of a suitable size during allocation (but you pay for that when the GC decides it wants to clean up). Half-dozen in one hand, 6 in the other ... – Zac Howland Aug 16 '13 at 14:20
@ZacHowland You have a good point. It is only object which are retained which need to be scanned this way. If most objects die, you skip most of this work. – Peter Lawrey Aug 16 '13 at 14:27

score 1 · Answer 5 · answered Aug 19 '13 at 17:47

and Collections are fast. Standard Java beats standard C/C++ in this area, even for most optimized C code.

This may be true for particular collections, but most certainly isn't true for all collections in all usage patterns.

For instance, a java.util.HashMap will outperform a std:map, because the latter is required to be sorted. That is, the fastest Map in the Java Standard Library is faster that the fastest Map in the C++ one (at least prior to C++11, which added the std:unordered_map)

On the other side, a std:Vector<int> is far more efficient that an java.util.ArrayList<Integer> (due to type erasure, you can't use a java.util.ArrayList<int>, and therefore end up with about 4 times the memory consumption, and possibly poorer cache locality, and correspondingly slower iteration).

In short, like most sweeping generalizations, this one doesn't always apply. However, neither would the opposite assertion (that Java is always slower than C++). It really depends on the details, such as how you use the collection, or even which versions of the languages you compare).

Good answer but its silly to compare std::map and hashmap, because they are different data structures, map is generally represented by a red black tree — aaronman, Aug 19 '13 at 19:13
The quote in the questions compares "Java Collections" with "C++ collections". I read this to mean the collection implementations from the respective standard library. Not my fault that the C++ standard library uses inferior data structures ;-) — meriton, Aug 19 '13 at 19:23
As you already know c++11 has unordered_map, and it is an unfair comparison because log n is slower than constant time in any language — aaronman, Aug 19 '13 at 19:42

Java collections faster than c++ containers?

5 Answers5

Linked