Performance issue: Java vs C++

Question

I have always heard that C++ was way more efficient than Java (and that is why most games are developed in C++).

I wrote a small algorithm to solve the "Eight queens puzzle" in both Java and C++, using the exact same algorithm, and then started to raise the number or squares. When reaching checkboards of 20*20 or even 22*22, it appears Java is much more effective (3 seconds vs 66 seconds for C++).

I have no idea why, but I am pretty beginning with C++, so it is possible I made some huge performance mistakes, so I will gladly accept any information that would help me understand what is happening.

Below is the code I use in Java:

import java.awt.Point;
import java.util.ArrayList;
import java.util.List;

public class HuitDames {

    /**
     * La liste des coordnnées des dames.
     */
    private static List<Point> positions = new ArrayList<>();

    /**
     * Largeur de la grille.
     */
    private static final int LARGEUR_GRILLE = 22;


    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        int i = 1;
        placerDame(i);
        for (Point point : positions) {
            System.out.println("(" + point.x + "; " + point.y + ")");
        }
    }

    /**
     * Place une dame et return true si la position est bonne.
     * @param i le numéro de la dame.
     * @return si la position est bonne.
     */
    private static boolean placerDame(int i) {

        boolean bonnePosition = false;
        for (int j = 1; j <= LARGEUR_GRILLE && bonnePosition == false; j++) {
            Point emplacement = new Point(i, j);
            positions.add(emplacement);
            if (verifierPrise(emplacement) && (i == LARGEUR_GRILLE || placerDame(i + 1))) {
                bonnePosition = true;
            }
            else {
                positions.remove(i - 1);
            }
        }

        return bonnePosition;
    }

    /**
     * Vérifie que la nouvelle position n'est pas en prise avec une position déjà présente.
     * @param position la position de la nouvelle dame.
     * @return Si la position convient par rapport aux positions des autres dames.
     */
    private static boolean verifierPrise(Point position) {
        boolean nonPrise = true;
        for (Point point : positions) {
            if (!point.equals(position)) {
                // Cas où sur la même colonne.
                if (position.y == point.y) {
                    nonPrise = false;
                }
                // Cas où sur même diagonale.
                if (Math.abs(position.y - point.y) == Math.abs(position.x - point.x)) {
                    nonPrise = false;
                }
            }
        }

        return nonPrise;
    }
}

And below is the code in C++:

#include <iostream>
#include <list>
#include <math.h>
#include <stdlib.h>

using namespace std;


// Class to represent points.
class Point {

    private:
        double xval, yval;

    public:
        // Constructor uses default arguments to allow calling with zero, one,
        // or two values.
        Point(double x = 0.0, double y = 0.0) {
                xval = x;
                yval = y;
        }

        // Extractors.
        double x() { return xval; }
        double y() { return yval; }
};

#define LARGEUR_GRILLE 22
list<Point> positions;


bool verifierNonPrise(Point emplacement) {
    bool nonPrise = true;
    for (list<Point>::iterator it = positions.begin(); it!= positions.end(); it++) {
        if (it->x() != emplacement.x()) {
            if (it->y() == emplacement.y()) {
                nonPrise = false;
            }
            if (abs(it->y() - emplacement.y()) == abs(it->x() - emplacement.x())) {
                nonPrise = false;
            }
        }
    }

    return nonPrise;
}

bool placerDame(int i) {
    bool bonnePosition = false;
    for (int j = 1; j <= LARGEUR_GRILLE && !bonnePosition; j++) {
        Point emplacement(i,j);
        positions.push_back(emplacement);
        if (verifierNonPrise(emplacement) && (i == LARGEUR_GRILLE || placerDame(i + 1))) {
            bonnePosition = true;
        }
        else {
            positions.pop_back();
        }
    }

    return bonnePosition;
}

int main()
{
    int i = 1;
    placerDame(i);
    for (list<Point>::iterator it = positions.begin(); it!= positions.end(); it++) {
        cout << "(" << it->x() << "; " << it->y() << ")" << endl;
    }
    return 0;
}

Did you compile your C++-Code with optimization enabled? For gcc and clang add `-O3` to the command line. — Baum mit Augen, Jul 23 '14 at 15:24
use a vector and not a list...Iterating through a list in any language is horrible. — Benjamin Trent, Jul 23 '14 at 15:24
First, use `std::vector` instead of `std::list`. Then, try passing by reference rather than value. For example, `bool verifierNonPrise(const Point& emplacement)`. Next, don't create named temporaries, but rather do `positions.emplace_back(i, j);`. It is likely there are many other things you can improve. — juanchopanza, Jul 23 '14 at 15:25
"I have always heard that C++ was way more efficient than Java"-because generally it is. — Benjamin Trent, Jul 23 '14 at 15:26
It is not at all unusual for Java to beat C++. A JITC-optimized Java program can be more tightly optimized than is possible using any readily available C++ compiler/linker. Eg, a JITC can easily inline operations on a Point, while C++ can usually only do so if the functions are declared inlineable. — Hot Licks, Jul 23 '14 at 15:27
As the language gets closer to machine language (the more lower level a language is) you have more knobs to control stuff and this can either make or break your code depending on how you use these knobs. If you look up all the comparison cases here is SO you'll find many examples of how tweaking a C++ code can make a code that was originally slow very fast. Most of the time you can't do that in higher level languages. This doesn't mean all C++ codes are going to be faster than JAVA, PYTHON ... it just means you can make your code faster. — triple_r, Jul 23 '14 at 15:29
Just a minor point, which will probably not affect performance very much. If all your point class does is hold `x` and `y`, and there are no invariants to maintain, you might as well remove the member functions and make the data members public. — juanchopanza, Jul 23 '14 at 15:30
@HotLicks Note that declaring the functions inlinable really has no bearance on whether they are inlined. In this example, `Point`'s methods are implicitly inline. But they aren't even necessary. The data should be public. — juanchopanza, Jul 23 '14 at 15:33
@juanchopanza - In Java any (non-native) function can be inlined if the JITC decides it's worthwhile. In C++, if a function is in another compilation unit, it can only be inlined if the linker is very, very smart or if the compiler effectively recompiles everything. — Hot Licks, Jul 23 '14 at 15:36
@juanchopanza - And a corollary to that is that in JITC-optimized Java getter and setter functions are "free" -- zero cost vs accessing the variables directly. — Hot Licks, Jul 23 '14 at 15:38
@triple_r Those knobs are placed in the JVM for Java, not the compiler. Hotspot options can make a _very_ big difference. — Thorbjørn Ravn Andersen, Jul 23 '14 at 15:40
@HotLicks Sure, but OP's code, everything is inline. The performance hit is most likely the use of `std::list`. As for the getters, these would be optimized out too if inlined, but using them could result in extra copies of the data members being made, because semantically they return values. Anyway, it is hard to see how Java could out-perform C++ in such a simple, numeric problem, if coded correctly in both languages. I wouldn't expect C++ to massively out-perform Java either. — juanchopanza, Jul 23 '14 at 15:43
@ThorbjørnRavnAndersen, I didn't mean other languages don't have those controls, I meant as you get closer to the machine language you have more of those controls. But you are absolutely right. — triple_r, Jul 23 '14 at 15:47
@HotLicks And such smart linkers exist. Any good compiler today will have options for inter-module optimization. The big differences come from other things: for some types of applications, for example, garbage collection will be a big win; for others, having to allocate everything dynamically will kill you. (I would expect his to be one of the latter; he has no dynamic allocations what so ever outside of his list. But the fact that every push_back and pop_back must allocate/free, and the lack of locality are killers.) — James Kanze, Jul 23 '14 at 15:49
@juanchopanza If coded correctly _and_ compiled with optimization. With Java, you don't worry about the optimziation; the JIT decides when and where. With C++, you have to tell the compiler to optimize. (And while I wouldn't expect a massive difference, the fact that he has to dynamically allocate each and every `Point` in Java could make a significant difference. I don't know whether the latest JIT would be capable of eliminating that.) — James Kanze, Jul 23 '14 at 15:54
@JamesKanze Plus the JIT actually has to run. And the garbage collector too. BTW I forgot that the user defined types had to be allocated dynamically in java. I was thinking of array-based problems. I would also expect that to have an effect. — juanchopanza, Jul 23 '14 at 15:58
@juanchopanza Running the JIT can be an issue, particularly for a small program like his. (Try running his example a million times, and you can ignore the JIT.) — James Kanze, Jul 23 '14 at 16:26
@juanchopanza And depending on the code, the garbage collector can be faster than manual allocation and the necessary frees. In his code, it's almost certain that the total cost of an allocation in Java will be cheaper than in C++, because he never uses enough memory to trigger the collector. And even if you repeat a million times, this would be true, because you're in the optimal case for GC. But even the fastest allocation is slower than no dynamic allocation. — James Kanze, Jul 23 '14 at 16:26
Whenever I see `push_back` I cringe. There are so many things that can make any program slow, other than the choice of language. Some people who are "green around the gills" think that optimization is all about compiler optimization, [*when in fact that should be the last thing you consider.*](http://stackoverflow.com/a/927773/23771) — Mike Dunlavey, Jul 23 '14 at 21:48
Something to keep note of, is that your choice of a compiler has a real effect on performance. In some benchmarks, C code compiled with gcc will not run any faster than Java compiled with a good compiler. See this http://www.stefankrause.net/wp/?p=6 and this http://www.stefankrause.net/wp/?p=6 for more info — Neowizard, Jul 24 '14 at 09:48
@JamesKanze: C++ code can use a custom allocator (aggregate allocations and do only a single free later.) That's even faster. Or leave the free out an rely on OS cleanup on exit. — Deduplicator, Jul 24 '14 at 12:27
@Deduplicator Maybe. That's more or less how garbage collection works. As for simply leaving the free out; that's exactly what happens when you don't use enough memory to trigger garbage collection (with the added plus that allocation algorithms that count on garbage collection are faster in the allocation). Of course, in C++, you can use a garbage collector too, for the cases where it is the better solution. — James Kanze, Jul 24 '14 at 13:52
@JamesKanze: Actually, with that part I meant: Use an allocator explicitly not supporting deallocation *at all*: Faster and more efficient than normal manual deallocation as well as any garbage collector, even if it never triggers. (That's an optimization for a really small niche with respectable dividends...) — Deduplicator, Jul 24 '14 at 13:58
I'll say you something - if you're not planing to make advenced methods (for example used in artificial intelligence), which should be ultra optimized, and any millisecond is important... or advanced computations in graphic or etc.. It's not needed, to care about "C++ is faster than Java" - yes it is, but today we have powerfull enough computer, to not take care about such performance in every normal programm (ofcourse I'm not saying about tradicional optimalization like makind not needed loops :P ).. Java is powerfull, because you can use java almost everywhere with hudge libraries support ;) — Krystian, Jul 24 '14 at 14:11
@Deduplicator All of the allocators I've seen that don't support deallocation at all work very much like the allocators of a copying garbage collector; the allocation time is very, very rapid. Where garbage collection costs more in such cases is when it is triggered, and if you allocate little enough so that you don't need to deallocate, then garbage collection shouldn't ever trigger, and you end up with executing essentially the same code. — James Kanze, Jul 24 '14 at 14:33
@Krystian It depends. Java is _very_ poor for numerical applications, where you have a lot of very small types with value semantics. The JIT cannot always find all of the cases where it could box, so you end up with additional allocations, etc. which cost significant real time. Of course, the real reason Java never gets used in such cases is that the language isn't expressive enough to allow writing readable code for them. In the end, Java is not designed for large scale programming, and it is very difficult to write readable, reliable code in Java. — James Kanze, Jul 24 '14 at 14:36
@JamesKanze: But the GC needs knowledge of the type and size of each allocation. The C++ solution does not need it at all, thus reducing time for allocating and size of allocations. — Deduplicator, Jul 24 '14 at 14:38
@Deduplicator surely it does need to know that, or at least the size, as when you get scattered blocks of freed memory amongst allocated memory, at some point (when there's no contiguous block big enough for its next allocation) it has to pause (like a gc, although for a different reason) and compact the memory? — Rob Grant, Jul 24 '14 at 15:24
this shows, one needs a license to use C++ (like a license to use a gun) — Display Name, Jul 24 '14 at 17:04
@RobertGrant: Sry, but you are missing the point: If there's no deallocation, there's no fragmentation, no need for keeping track of types, sizes and the like. — Deduplicator, Jul 24 '14 at 19:19
@SargeBorsch: There's no programming licence yet. Anyway, who would be competent, willing and trusted to assess competency of PHP, Java, Perl, C#, C++, [insert programming language of choice] programmers? — Deduplicator, Jul 24 '14 at 19:22
@Deduplicator what I mean is that to free that memory at some point (unless you're using an OS that doesn't manage memory) I would imagine _something_ is automatically keeping track of the size of the allocation and how it's interspersed amongst allocations for other applications (any of which may do deallocation and cause fragmentation etc). — Rob Grant, Jul 25 '14 at 07:25
@RobertGrant: But if you don't wnat to free the memory ever, leaving it to the OS to clean up at process exit (which it has to do anyway), there's no need to keep track of any of the information neccessary to free allocated blocks, thus you just don't. — Deduplicator, Jul 25 '14 at 13:07
@Deduplicator yeah, makes sense. I was thinking it wouldn't have to if you freed it, but if it does it either way then why not. — Rob Grant, Jul 25 '14 at 13:10
I would really suggest you not to use language other than English for variables and function names. Sometimes your code is being read by people who do not understand French (like, I guess, here at stack overflow) which makes it harder for them to read your code. — varepsilon, Jul 30 '14 at 05:53

score 93 · Accepted Answer · answered Jul 23 '14 at 15:24

93

std::list in C++ is a linked list, whereas java.util.ArrayList is an array. Try replacing std::list by std::vector. Also, be sure to compile with optimization turned on.

answered Jul 23 '14 at 15:24

Brian Bi

111,498
10
176
312

I am now getting 6 seconds for C++ after changing to vector and by passing with reference. It is still twice as much as Java but I guess it also depends on the compiler. – realUser404 Jul 23 '14 at 16:50
4

@realUser404 For me, passing by reference is slower in this case (I guess because the `Point` object isn't that big). Also, what compiler are you using? For me your original C++ code is already faster than your original Java code when compiled with optimization (`g++-4.9 -O2`). – Brian Bi Jul 23 '14 at 17:01
10

Replacing double by int in the Point structure may let the compiler generate even better code. – Jakub Jul 23 '14 at 17:42
22

@Jakub Good point. Java code with `int` vs. C++ code with `double` isn't a fair comparison. – nobody Jul 23 '14 at 18:43
I also get a significant boost by replacing the loop in `verifierNonPrise` with `std::find_if` – Martin York Jul 23 '14 at 20:59
4

@Jakub My thoughts exactly. Wrong data structure plus wrong data type. Further, in the interests of helping our friend Peter, I think each language has it's own STL quirks. Becoming fluent requires knowledge of the libraries as well as the idioms. – Sinthia V Jul 24 '14 at 00:15
3

You might want to detail just how the `std::list` structure is less efficient for this program than `std::vector`. Is it just a matter that the basic operations (for instance increment of an iterator) on a (doubly linked) list require more cycles than for a vector, or are there things used that are fundamentally slower for lists (like element access by index)? In other words, do the two implementations have the same asymptotic complexity with just a constant factor between them, or not? – Marc van Leeuwen Jul 24 '14 at 13:16
@MarcvanLeeuwen In this case, just the contiguous nature of the `vector` may have an impact (more cache-friendly). – ssube Jul 24 '14 at 14:24
@ssube: Sure, but that just enters into the constant factor stuff. I'm not saying using lists is a great idea, but it might just account for a constant factor in speed. – Marc van Leeuwen Jul 24 '14 at 17:43
Actually, the faster is to use Point* rather than any STL container (it's easy to do here because we know the size and it's fixed). See my code proposed in my answer below. – jpo38 Aug 01 '14 at 18:36
1

@jpo38 There is absolutely no reason at all why a vector of Point objects should be any slower than a "naked" array of Point objects. – antred Sep 23 '14 at 19:14

Martin York · Answer 2 · 2014-07-30T06:30:38.650

89

Updates:

Changes to C++

As written:
Compilation Failure
Replace math.h => cmath
27610 milliseconds
Add -O3 flag
4416 milliseconds
Replace std::list with std::vector
2294 milliseconds
Replace Point with std::pair
2384 milliseconds
Made verifierNonPrise const correct
2351 milliseconds
Replaced loop in verifierNonPrise with std::find_if
929 milliseconds
Replacing double with int (to make it equiv to Java)
549 milliseconds

Changes to Java

As written
3459 milliseconds
Changes verifierNonPrise early return
368 milliseconds

Java Vs C++

> javac HuitDames.java
> time java HuitDames
real    0m0.368s
user    0m0.436s
sys     0m0.042s    
> g++ -O3 -std=c++11 HuitDames.cpp
> time ./a.out
real    0m0.541s
user    0m0.539s
sys     0m0.002s

Final Code:

#include <iostream>
#include <vector>
#include <cmath>
#include <stdlib.h>
#include <chrono>
#include <algorithm>

using namespace std;

typedef std::pair<int, int>   Point;

#define LARGEUR_GRILLE 22
vector<Point> positions;


bool verifierNonPrise(Point const& emplacement) {
    return std::find_if(positions.begin(), positions.end(), [&emplacement](Point const& val){
        if (val.first != emplacement.first) {
            if ((val.second == emplacement.second) || (abs(val.second - emplacement.second) == abs(val.first - emplacement.first))) {
                return true;
            }
        }
        return false;
    }) == positions.end();
}

bool placerDame(int i) {
    bool bonnePosition = false;

    for (int j = 1; j <= LARGEUR_GRILLE && !bonnePosition; j++) {
        Point emplacement(i,j);
        positions.push_back(emplacement);
        if (verifierNonPrise(emplacement) && (i == LARGEUR_GRILLE || placerDame(i + 1))) {
            bonnePosition = true;
        }
        else {
            positions.pop_back();
        }
    }

    return bonnePosition;
}


int main()
{
    using std::chrono::system_clock;

    system_clock::time_point begin_time = system_clock::now();

    int i = 1;
    placerDame(i);
    for (vector<Point>::iterator it = positions.begin(); it!= positions.end(); it++) {
        cout << "(" << it->first << "; " << it->second << ")" << endl;
    }

    system_clock::time_point end_time = system_clock::now();

    long long elapsed_milliseconds = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - begin_time).count();
    cout << "Duration (milliseconds): "
         << elapsed_milliseconds
         << std::endl;    
}

edited Jul 30 '14 at 06:30

answered Jul 23 '14 at 20:40

Martin York

257,169
86
333
562

`#include ` and `#include ` are missing (didn't compile on my compiler without them). Also, -O3 and -Ofast change nothing for timings. Any thoughts why? – Florian Richoux Jul 24 '14 at 01:30
Are the times for `Replace std::list with std::vector`, `Replace Point with std::pair` and `Made verifierNonPrise const correct` really different or are the differences just measurement imprecisions? – nwp Jul 24 '14 at 09:19
@FlorianRichoux: -Ofast enables optimizations that allow non-standard behaviour, but still behave reasonable in most cases (e.g. optimize a*a*a*a to (a*a)*(a*a). There is simply no occasion here where this could lead to measurable improvement. – PlasmaHH Jul 24 '14 at 13:47
The result is just as I expected: Java and C++ have comparable performance, if startup time is not as important. Note that you are comparing to the unoptimized Java version where obvious bugs have not been fixed, such as the flawed loop in `verifierNonPrise`. – Axel Jul 24 '14 at 14:35
Consider explicitly calling out where you change the data structures and data types to match the Java version, thus actually making the cases comparable at all. – Deduplicator Jul 24 '14 at 14:41
@nwp. Those are real measurements (in milliseconds) over one run. So there will be some variability in the numbers. – Martin York Jul 24 '14 at 16:20
@Deduplicator: I though I did. Replace `std::list` with `std::vector`. Replaced `Point` with `std::pair` then replaced all `double` with `int`. – Martin York Jul 24 '14 at 16:31
@nwp: I would consider the only real major enhancements to be: 1) Turn on optimization. 2) Use std::vector over std::list 3) Use int rather than double 4) Use standard algorithms rather than hand written loops. – Martin York Jul 24 '14 at 16:33
@Axel: Added a fix to the Java versions so `verifierNonPrise()` returns asap. Now times are comparable. – Martin York Jul 24 '14 at 18:50
What if you remove `` and use `printf` instead? That should considerably speed things up. – Deduplicator Jul 24 '14 at 19:25
2

@Deduplicator: The difference in speed between the two is (mostly) a myth. Plenty of articles on SO about that. When you see a difference it is because you forgot to call [sync_with_stdio](http://www.cplusplus.com/reference/ios/ios_base/sync_with_stdio/) – Martin York Jul 24 '14 at 19:40
1

@Deduplicator: The amount of output is so small I don't expect any measurable difference in speed. Just tried it (no difference). – Martin York Jul 24 '14 at 20:23
Some minor improvement can be obtained by only doing the `push_back` when no overlap is found for the new one, by simplifying the logic (removing needless named variables, removing conditional by noting that the rows are evaluated in increasing order) and removing the `abs`, and by doing the `std::find_if` over one fewer element: http://coliru.stacked-crooked.com/a/2915144fc0c2a75e On Coliru, this reduces runtime from 1511ms to 1130ms (-25%). Of course, YMMV and Java might also benefit from these changes. – TemplateRex Aug 05 '14 at 14:18
`cout << "(" << it->first << "; " << it->second << ")" << endl;` - shouldn't be flushing the stream after every line - `endl` should be replaced with `'\n'`. Note that the Java code's `System.out.println("(" + point.x + "; " + point.y + ")");` does not flush as there's no newline. – Tony Delroy Nov 27 '14 at 06:01
@LokiAstari: hadn't a good compiler handy, so I only ran it on ideone... came down from about 1800ms to 1700ms but not sure how reproducible, the hardware, optimisation settings, and that was with `inline` prepended to `verifierNonPrise` too, though it was hopefully being inlined anyway, and a cleanup to the lambda to return "X & Y" instead of the `if (X) if (Y) return true; return false;` dance... also probably irrelevant. – Tony Delroy Nov 27 '14 at 07:30

NetVipeC · Answer 3 · 2014-07-23T22:22:44.557

Test this version, updated using C++11 features. Tested in GCC 4.9.0 with -std=c++11. Tested on Celeron 1.6 GHz, 512 MB RAM.

Times in my PC:
Original: Duration (milliseconds): 12658
First Version: Duration (milliseconds): 3616
Optimized Version: Duration (milliseconds): 1745

Changes are:

Using vector instead of list Benchmark, and Words from Stroustrup.
Using const whatever we can, the compiler is able to optimize much more if it known that the value don't change.
Using std::pair instead of Point.
Using new for-loop with constant iterators.

Source:

#include <iostream>
#include <vector>
#include <chrono>
#include <iomanip>

using namespace std;

typedef std::pair<int, int> Point;

#define LARGEUR_GRILLE 22
vector<Point> positions;

bool verifierNonPrise(const Point& emplacement) {
    bool nonPrise = true;
    for (const auto& p : positions) {
        if (p.first != emplacement.first) {
            if (p.second == emplacement.second) {
                nonPrise = false;
            }
            if (abs(p.second - emplacement.second) ==
                abs(p.first - emplacement.first)) {
                nonPrise = false;
            }
        }
    }

    return nonPrise;
}

bool placerDame(int i) {
    bool bonnePosition = false;
    for (int j = 1; j <= LARGEUR_GRILLE && !bonnePosition; j++) {
        Point emplacement(i, j);
        positions.emplace_back(emplacement);
        if (verifierNonPrise(emplacement) &&
            (i == LARGEUR_GRILLE || placerDame(i + 1))) {
            bonnePosition = true;
        } else {
            positions.pop_back();
        }
    }

    return bonnePosition;
}

int main(int argc, char* argv[]) {
    std::chrono::system_clock::time_point begin_time =
        std::chrono::system_clock::now();

    positions.reserve(LARGEUR_GRILLE);

    placerDame(1);
    for (const auto& p : positions) {
        cout << "(" << p.first << "; " << p.second << ")" << endl;
    }

    std::chrono::system_clock::time_point end_time =
        std::chrono::system_clock::now();
    long long elapsed_milliseconds =
        std::chrono::duration_cast<std::chrono::milliseconds>(
            end_time - begin_time).count();
    std::cout << "Duration (milliseconds): " << elapsed_milliseconds
              << std::endl;

    return 0;
}

Some more deep changes.

Changes include:

Returning as early as possible. As soon as the queen can not be placed.
Returning to a simpler Point class.
Using find_if algorithm for searching queen placement.

Source (some recommendation updated):

#include <algorithm>
#include <iostream>
#include <vector>
#include <chrono>
#include <iomanip>

using namespace std;

struct Point {
    int x, y;
};

#define LARGEUR_GRILLE 22
vector<Point> positions;

bool verifierNonPrise(const Point& emplacement) {
    return find_if(positions.cbegin(), positions.cend(), [&emplacement](const Point& p) {
               return (p.x != emplacement.x &&
                       (p.y == emplacement.y ||
                        abs(p.y - emplacement.y) == abs(p.x - emplacement.x)));
           }) == positions.cend();
}

bool placerDame(int i) {
    for (int j = 1; j <= LARGEUR_GRILLE; j++) {
        Point emplacement{i, j};
        positions.push_back(emplacement);
        if (verifierNonPrise(emplacement) &&
            (i == LARGEUR_GRILLE || placerDame(i + 1))) {
            return true;
        } else {
            positions.pop_back();
        }
    }
    return false;
}

int main(int argc, char* argv[]) {
    std::chrono::system_clock::time_point begin_time =
        std::chrono::system_clock::now();

    positions.reserve(LARGEUR_GRILLE);

    placerDame(1);
    for (const auto& p : positions) {
        cout << "(" << p.x << "; " << p.y << ")" << endl;
    }

    std::chrono::system_clock::time_point end_time =
        std::chrono::system_clock::now();
    long long elapsed_milliseconds =
        std::chrono::duration_cast<std::chrono::milliseconds>(
            end_time - begin_time).count();
    std::cout << "Duration (milliseconds): " << elapsed_milliseconds
              << std::endl;

    return 0;
}

Not sure why `std::pair`? From a design point of view, it is definitely something you would want to avoid. I also doubt that using `emplace_back` makes a significant difference; it's more fine tuning. (Using `std::vector` and pass by reference are doubtlessly the big wins.) — James Kanze, Jul 23 '14 at 15:50
On my machine Original 4296 ms. NetVipeC version 682. Probably some more improvements to be made but already a vast improvement — Benjamin Trent, Jul 23 '14 at 15:52
@JamesKanze yes in this case `emplace_back` don't gain nothing, i changed because think that `emplacement` variable in method `placerDame` was used only for inserting in the `vector` and undo when see the calling to `verifierNonPrise`, but forgot `emplace_back`. The reason for `std::pair` was that there is no reason to create a class for something that STD cover, is only more code to maintain, more code to test, even if is so trivial, and `std::make_pair` (in some compilers do only one allocation for the 2 types), in this case is not much, but creating 1 Million pairs, would be a difference. — NetVipeC, Jul 23 '14 at 16:00
@NetVipeC No compiler should do any allocation for anything outside of `std::vector` here. It would be much cleaner, and no slower, to use a class. Or since you're using C++11, a struct with list initialization. And of course, `std::pair` is obfuscation here, since the members are _not_ first and second, but `x` and `y` (which has a clear semantic meaning); `std::pair` is convenient for quick experiments, but its use is an anti-pattern in serious code. — James Kanze, Jul 23 '14 at 16:31
I might add that your uses of `const` don't help the compiler to optimize, since the compiler cannot assume that the object isn't modified through another path. (On the other hand, they do help readability.) — James Kanze, Jul 23 '14 at 16:32
After changing to `std::list -> std::vector` and `Point -> std::pair` the best improvement I got was to use `std::find_if` instead of an explicit loop. — Martin York, Jul 23 '14 at 20:24
The fact is that it is not points that should be stored in the vector. Currently, it is always true that `positions[i].x == i`. This means that storing the x coordinate is superfluous. You may leave only the y (and use `std::vector`). — ach, Jul 24 '14 at 12:56
You are right, probably there will be no much performance gain, at least in this specific case, but it is better not to storage redundant info. — NetVipeC, Jul 24 '14 at 13:01

score 19 · Answer 4 · edited Jul 23 '14 at 21:30

19

Comparing a managed, dynamically compiled language like Java to a statically compiled language like C++ is very difficult.

You will always be comparing apples to oranges in a sense, because they are conceptually very different. It starts with the use of the standard libraries (ArrayList vs std::list/vector) that will have potentially wildly different performance characteristics, even your code looks similar in both languages.

Then there is the inherent problem with microbenchmarks in Java (short test in Java are always slower because the JIT will observe program flow before it decides what and how it is to be compiled). Same goes for compiler options for C++, even the structure of the source code (independently compiled and linked classes versus single file source) can make a significant difference (because it changes the amount of "insight" the C++ compiler has into the other classes).

Next is the general difference in memory management, garbage collection vs manual memory management (smart pointers etc. are still considered manual memory management).

Not to mention the general language differences like you need to explicitly declare a method virtual in C++, while in Java every member method is virtual by default (working out if it's really virtual at runtime is left to the VM).

With all those differences there will always be cases where one langauge will have a massive advantage over the other. A simple test with very limited scope (like your test here) says very little about each language as a whole.

Another point people often tend to ignore is: How productive can you be with a language - speed isn't everything (look a how sucessful script langages are in some domains, despite being hardly competive when looking only at excution speed). Lack of performance can be crippling, but so can be low productivity.

edited Jul 23 '14 at 21:30

Peter Mortensen

30,738
21
105
131

answered Jul 23 '14 at 16:02

Durandal

19,919
4
36
70

2

I think the point of the question is that OP would have (rightly) expected C++ to be faster in this particular case, and was surprised to find that it was way slower. There's no apples or oranges here. – juanchopanza Jul 23 '14 at 16:21
7

@juanchopanza Then the title of the quest should have been "How to improve this C++ code" and it would probably be better simply placed on codereview. While other answers cover possible optimizations well, they all neglect the inherently incomparable aspect - and some of the optimizations suggested make the results even more incomparable. The one right thing was to change the C++ version to vector because its more similar to ArrayList. But everthing else is just increasing the difference between the two implementations where it *is* comparing apples to oranges. – Durandal Jul 23 '14 at 16:37
@Durandal - changing point coordinates from double to int makes it more simmilar to the Java example. And there can by a large preformace difference. – Jakub Jul 23 '14 at 17:49
@Jakub While I wholeheartedly agree to your same datatypes make it similar, but my general "apples to oranges" was meant towards optimization attempts like "Returning as early as possible. As soon as the queen can not be placed." where the logic is even fundamentally changed. I did not go trough and check the OP's original code if it was similar in the first place, so I never even noticed that this blunder exists in the original C++ version. I simply *trusted* when he said *exact same algorithm* that there were no such fundamental (and unmotivated) differences. – Durandal Jul 23 '14 at 18:05
@Durandal My point was, your "The one right thing was to change the C++ version to vector because its more similar to ArrayList. But everthing else is just increasing the difference between the two implementations where it is comparing apples to oranges." was not completly true. The java optimizer may generate better/faster code due to the integregral data type. One of the answere/proposed solutions includes this change. I did not intend to polemize with your remarks on optimization and benchmarking in general. – Jakub Jul 23 '14 at 18:38
-1. This is rubbish. Everything you said implies C++ should be faster here. "Comparing apples and oranges" is just weasling. Clocks can compare anything that takes time. The comparison is legitimate. But you seem to acknowledge that you can compare them by comparing them. You just don't discuss their impacts on the program speed. In short this answer would make a good *question*, but answers nothing. – djechlin Jul 24 '14 at 17:56

score 4 · Answer 5 · answered Dec 11 '14 at 00:26

I may be beating a dead horse here, but simply doing a line by line translation of the Java to C++, not even using const reference parameters or any such thing, you can see the C++ is almost twice as fast as Java. All the "syntactic optimization" emplacing etc. has little effect if any...

rep ~/Documents $ g++ -O3 Queen.cpp
rep ~/Documents $ javac Queen.java 
rep ~/Documents $ time java Queen 
(1; 1)
(2; 3)
(3; 5)
(4; 2)
(5; 4)
(6; 10)
(7; 14)
(8; 17)
(9; 20)
(10; 13)
(11; 19)
(12; 22)
(13; 18)
(14; 8)
(15; 21)
(16; 12)
(17; 9)
(18; 6)
(19; 16)
(20; 7)
(21; 11)
(22; 15)

real    0m4.806s
user    0m4.857s
sys     0m0.067s
rep ~/Documents $ time ./a.out
(1; 1)
(2; 3)
(3; 5)
(4; 2)
(5; 4)
(6; 10)
(7; 14)
(8; 17)
(9; 20)
(10; 13)
(11; 19)
(12; 22)
(13; 18)
(14; 8)
(15; 21)
(16; 12)
(17; 9)
(18; 6)
(19; 16)
(20; 7)
(21; 11)
(22; 15)

real    0m2.131s
user    0m2.113s
sys     0m0.000s
rep ~/Documents $

Queen.java (translated to english)

import java.awt.Point;
import java.util.ArrayList;
import java.util.List;

public class Queen {

    private static List<Point> positions = new ArrayList<>();
    private static final int GRID_SIZE = 22;

    public static void main(String[] args) 
    {
        int i = 1;
        placeQueen(i);
        for (Point point : positions) 
        {
            System.out.println("(" + point.x + "; " + point.y + ")");
        }
    }

    private static boolean placeQueen(int i) 
    {
        boolean bIsGoodPos = false;
        for (int j = 1; j <= GRID_SIZE && bIsGoodPos == false; j++) 
        {
            Point emplacement = new Point(i, j);
            positions.add(emplacement);
            if (verifyPos(emplacement) && (i == GRID_SIZE || placeQueen(i + 1))) 
            {
                bIsGoodPos = true;
            }
            else 
            {
                positions.remove(i - 1);
            }
        }

        return bIsGoodPos;
    }

    private static boolean verifyPos(Point position) 
    {
        boolean bIsSafe = true;
        for (Point point : positions) 
        {
            if (!point.equals(position)) 
            {
                if (position.y == point.y) 
                {
                    bIsSafe = false;
                }
                if (Math.abs(position.y - point.y) == Math.abs(position.x - point.x)) 
                {
                    bIsSafe = false;
                }
            }
        }

        return bIsSafe;
    }
}

Queen.cpp

#include <cmath>
#include <vector>
#include <iostream>

using namespace std;

struct Point
{
    int x, y;
    Point(int ii, int jj):x(ii), y(jj){}
};

vector<Point> positions;
int GRID_SIZE = 22;


bool verifyPos(Point position) 
{
    bool bIsSafe = true;
    for(int i = 0; i < positions.size(); ++i) 
    {
        Point point = positions[i];
        if(point.x != position.x || point.y != position.y) 
        {
            if(position.y == point.y) 
            {
                bIsSafe = false;
            }
            if(abs(position.y - point.y) == abs(position.x - point.x)) 
            {
                bIsSafe = false;
            }
        }
    }

    return bIsSafe;
}

bool placeQueen(int i) 
{
    bool bIsGoodPos = false;
    for(int j = 1; j <= GRID_SIZE && bIsGoodPos == false; j++) 
    {
        Point p(i, j);
        positions.push_back(p);
        if(verifyPos(p) && (i == GRID_SIZE || placeQueen(i + 1))) 
        {
            bIsGoodPos = true;
        }
        else 
        {
            positions.pop_back();
        }
    }
    return bIsGoodPos;
}


int main(void) 
{
    int i = 1;
    placeQueen(i);
    for(int i = 0; i < positions.size(); ++i) 
    {
        Point p = positions[i];
        cout << "(" << p.x << "; " << p.y << ")" << endl;
    }

    return 0;
}

snstrand · Answer 6 · 2014-11-27T05:33:52.197

C++ can do it in 21 ms (on a old core i7-860) if you use bit maps. For the timing run I commented out the showSoln() call since a graphic display of the chess board takes twice as long as finding the solution.

#include <iostream>
#include <iomanip>
#include <fstream>
#include <omp.h>                        //omp_get_wtime() is my favorite time function
using namespace std;

static const unsigned n(22);            //size of board
static_assert(n<32,"use long unsigned for bit masks if n > 32");
static const unsigned mask((1U<<n)-1);  //n wide bitmask with every bit set

void showSoln(unsigned* selCol, unsigned numSoln) {     //show a solution
    cout << "\nsolution " << numSoln << '\n';
    for (unsigned row=0; row<n; ++row) {
        for (unsigned col=0; col<n; ++col)
            cout << (col==selCol[row]? " Q": " .");
        cout << '\n';
    }
}

void main() {
    //for each row bitmasks that show what columns are attacked, 1 bit means attacked
    unsigned ulAttack[n];           //cols attacked from upper left, shift right for next row
    unsigned upAttack[n];           //cols attacked from straight up, same for next row
    unsigned urAttack[n];           //cols attacked from upper right, shift left for next row
    unsigned allAttack[n];          //OR of all attacks on given row
    allAttack[0]= ulAttack[0]= upAttack[0]= urAttack[0]= 0; //no attacks on row 0
    unsigned row= 0;                //the row where now placing a queen 
    unsigned selCol[n];             //for each row the selected column
    unsigned numSoln= 0;            //count of soutions found
    double wtime= omp_get_wtime();
    for (;;) {                                          //loop until find 1st (or all) solutions
        if (allAttack[row]!=mask) {                     //if 'row' has a column not attacked
            unsigned long bit;
            _BitScanForward(&bit,~allAttack[row]);      //find lowest column not attacked
            //note - your compiler may have a different intrinsic for find lowest set bit
            selCol[row]= bit;                           //remember selected column for this row
            unsigned move= 1U<<bit;                     //convert selected column to bitmask
            allAttack[row]|= move;                      //mark column attacked to prevent re-use
            if (row==n-1) {                             //if move in last row have a soln
                ++numSoln;
                showSoln(selCol,numSoln);
                break;                                  //remove this break if want all solutions
            } else {                                    //no solution yet, fill in rows below
                unsigned nrow= row+1;                   //next row
                //from attacks on this row plus 'move' decide attacks on row below
                ulAttack[nrow]= (ulAttack[row] | move) >> 1;
                upAttack[nrow]= (upAttack[row] | move);
                urAttack[nrow]= ((urAttack[row] | move) << 1) & mask;
                allAttack[nrow]= ulAttack[nrow] | upAttack[nrow] | urAttack[nrow];
                row= nrow;                              //go to next row
            }
        } else {                //else move on 'row' is impossible so backtrack
            if (!row)           //if backtrack from row 0 have found all solutions
                break;
            --row;              //try next move in prior row
        }
    }
    wtime= omp_get_wtime() - wtime;
    cout << "numSoln= " << numSoln << '\n';
    cout << "time= " << wtime*1000 << " msec\n";
}

Jakub · Answer 7 · 2014-07-23T22:01:38.393

1

Also, there is no reason to use float/doouble types for the coordinates.

You should gain performance if you do not force calling floating point abs library call in your C++

Java stores the Point coordinates as integer. The get functions return double, however this is probably easier to optimize away in Java, then in c++.

edited Jul 23 '14 at 22:01

answered Jul 23 '14 at 16:58

Jakub

583
3
8

3dmaze · Answer 8 · 2018-04-20T22:29:45.687

It seems that for code that requires not too much memory access and intensive computing the difference between Java an C++ is low. But in the opposite situation the difference looks amazing :

test.java

import java.lang.String;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Scanner;

public class test{
    private static Random gna=new Random();

    public static double new_value(double value){
        if (value<0.5) value=0;
        return value*value;
    }

    public static void main(String[] args) {
        long start_time = System.currentTimeMillis();   
        List<Double> ze_list=new ArrayList();
        for (int i=0;i<1e8;i++){
            double temp=new_value(gna.nextDouble());
            ze_list.add(temp);
        }
        long end_time = System.currentTimeMillis();
        System.out.println("Time (s) :"+ ((end_time-start_time)/1000));
        Scanner input = new Scanner(System.in);
        String inputval = input.next();
    }
}

and compare it to test.cpp:

#include <iostream>
#include <vector>
#include <ctime>
#include <random>
using namespace std;

static default_random_engine dre1(time(0));
static uniform_real_distribution <double> urd1(0, 1);

static double new_value(double value){
    if (value<0.5) value=0;
    return value*value;
}

int main(void){
        time_t tbegin,tend;
        double texec=0;
        tbegin=time(NULL);
        vector<double> ze_list;
        for (int i=0;i<1e8;i++){
            double temp=new_value(urd1(dre1));
            ze_list.push_back(temp);
        }
        tend=time(NULL);
        texec=difftime(tend,tbegin);
        cout << "\nTime (s) " << texec << " s\n";
        int val=0;
        cin >> val;
    return 0;
}

I just tested it on my Mac :

the Java version took 90s and required 3.77 Go
the C++ programm took 2s and required only 770 Mo

Maybe there is a possibility to increase Java performances but I cannot see how.

score -1 · Answer 9 · answered Aug 05 '14 at 17:17

Java passes objects to methods as references and those references are passed by value, but C++ passes objects by value.

You should change C++ code to make it same as Java (Pass pointers in C++ intstead of passing objects):

bool verifierNonPrise(Point* emplacement) // The correct way for passing arguments.