4

Introduction

This is a follow-up to a question I asked previously: "Java seems to be executing bare-bones algorithms faster than C++. Why?" Through that post, I learned a few important things:

  1. I was not using Ctrl + F5 to compile and run C++ code in Visual Studio C++ Express, so the code ran under the debugger, which slowed down execution.
  2. Vectors are as good as (if not better than) pointers at handling arrays of data.
  3. My C++ is terrible. ^_^
  4. A better test of execution time would be iteration, rather than recursion.

I tried to write a simpler program, which does not use pointers (or arrays in the Java equivalent), and which is pretty straightforward in its execution. Even then, the Java execution is faster than the C++ execution. What am I doing wrong?

Code:

Java:

 public class PerformanceTest2
 {
      public static void main(String args[])
      {
           //Number of iterations
           double iterations = 1E8;
           double temp;

           //Create the variables for timing
           double start;
           double end;
           double duration; //end - start

           //Run performance test
           System.out.println("Start");
           start = System.nanoTime();
           for(double i = 0;i < iterations;i += 1)
           {
                //Overhead and display
                temp = Math.log10(i);
                if(Math.round(temp) == temp)
                {
                     System.out.println(temp);
                }
           }
           end = System.nanoTime();
           System.out.println("End");

           //Output performance test results
           duration = (end - start) / 1E9;
           System.out.println("Duration: " + duration);
      }
 }

C++:

#include <iostream>
#include <cmath>
#include <windows.h>
using namespace std;

double round(double value)
{
return floor(0.5 + value);
}
void main()
{
//Number of iterations
double iterations = 1E8;
double temp;

//Create the variables for timing
LARGE_INTEGER start; //Starting time
LARGE_INTEGER end; //Ending time
LARGE_INTEGER freq; //Rate of time update
double duration; //end - start
QueryPerformanceFrequency(&freq); //Determine the frequency of the performance counter (high precision system timer)

//Run performance test
cout << "Start" << endl;
QueryPerformanceCounter(&start);
for(double i = 0;i < iterations;i += 1)
{
    //Overhead and display
    temp = log10(i);
    if(round(temp) == temp)
    {
        cout << temp << endl;
    }
}
QueryPerformanceCounter(&end);
cout << "End" << endl;

//Output performance test results
duration = (double)(end.QuadPart - start.QuadPart) / (double)(freq.QuadPart);
cout << "Duration: " << duration << endl;

//Dramatic pause
system("pause");
}

Observations:

For 1E8 iterations:

C++ Execution = 6.45 s

Java Execution = 4.64 s

Update:

According to Visual Studio, my C++ command-line arguments are:

/Zi /nologo /W3 /WX- /O2 /Ob2 /Oi /Ot /Oy /GL /D "_MBCS" /Gm- /EHsc /GS /Gy /fp:precise /Zc:wchar_t /Zc:forScope /Fp"Release\C++.pch" /Fa"Release\" /Fo"Release\" /Fd"Release\vc100.pdb" /Gd /analyze- /errorReport:queue

Update 2:

I changed the C++ code to use the new round function, and I updated the execution time.

Update 3:

I found the answer to the problem, with thanks to Steve Townsend and Loduwijk. After compiling my code to assembly and examining it, I found that the C++ assembly performed far more memory moves than the Java assembly. This is because my JDK was using an x64 compiler, while my Visual Studio Express C++ could not target the x64 architecture and was thus inherently slower. So I installed the Windows SDK 7.1 and used its compilers to compile my code (in Release, using Ctrl + F5). The timings are now:

C++: ~2.2 s Java: ~4.6 s

Now I can compile all my code in C++, and finally get the speed that I require for my algorithms. :)

SpeedIsGood
  • 9
    Why wouldn't Java be faster than C++? – Steve Kuo Jun 22 '11 at 19:26
  • 1
    The first thing that comes to mind is that the code itself is broken (you should never compare floating point numbers for equality) and inefficient (you should not use floating point numbers to control loops). You are using 64bit integers for the rounding, what platform are you running the code in? – David Rodríguez - dribeas Jun 22 '11 at 19:29
  • @David Rodriguez: I am running the C++ code in Visual Studios Express C++. I tried this with ints and longs, and I got the same ratio of output. – SpeedIsGood Jun 22 '11 at 19:30
  • 8
    Your C++ `round` function doesn't work the same way as [`Math.round(double)`](http://download.oracle.com/javase/1.4.2/docs/api/java/lang/Math.html#round(double)). – Mat Jun 22 '11 at 19:31
  • @Steve Kuo: I don't know, and that is what I am testing. If it is truly the case that Java is faster than C++ in raw execution speed, then that's fine. But I have been told that C++ should be faster, so I am trying to figure out why it is that Java is executing faster. – SpeedIsGood Jun 22 '11 at 19:34
  • 6
    C++ gives you an awful lot of tools to fine-tune performance. But unless you know how to apply them, C++ is not "magically" faster than any other language. – fredoverflow Jun 22 '11 at 19:36
  • @Mat: That's a good point. I will change it and see if it works. Thanks. – SpeedIsGood Jun 22 '11 at 19:37
  • @SpeedIsGood: btw, If I run your code as-is on my machine (sun jdk1.6, GCC 4.6), I'm getting much less difference than you're seeing (6.8s vs 7.5s). – Mat Jun 22 '11 at 19:41
  • 5
    @SpeedIsGood (wow... I guess the name explains): You can write slow code in any and all languages, and you should be able to write fast code in many languages (some languages are actually *slow*, there is nothing to be done there), but the language affects less than the use you make of it. – David Rodríguez - dribeas Jun 22 '11 at 19:42
  • @Mat: Changing the round to return the same value as the Math.round from Java helped; the execution time got reduced by around a second. Thanks. – SpeedIsGood Jun 22 '11 at 19:43
  • 7
    @SteveKuo Because it simply isn’t, assuming well-written high-level Java and C++ code. – Konrad Rudolph Jun 22 '11 at 19:43
  • @David Rodriguez: I am aware of that. This is really just an experiment I ran out of curiosity. I had heard that C++ is supposed to outperform Java in bare-bones computation, so I just wanted to test this out, and see why it was not the case. Thanks. – SpeedIsGood Jun 22 '11 at 19:45
  • 1
    @SpeedIsGood: Analyzing performance is not a simple problem. Before drawing conclusions you have to really understand what and how you are measuring and what those numbers mean. And for a simple answer: *bare bone* computations (whatever that means, let's say *pure math*) will perform roughly the same; the generated bytecode will be translated to roughly the same assembly in both languages. – David Rodríguez - dribeas Jun 22 '11 at 19:49
  • 1
    It is pointless comparing code from different languages unless you understand both languages absolutely. You seem to have only a basic knowledge of each. Writing benchmarks will not improve your skill in either. – Martin York Jun 22 '11 at 20:07
  • 1
    @Martin: To be a little clearer about my general question - I have heard that C++ is supposed to execute faster than Java. Can the above code be written so that it can execute faster than its Java counterpart? If so, then how, and what optimisations do I need to make to my compiler to do so? If not, then that's that. Thanks. – SpeedIsGood Jun 22 '11 at 20:56
  • 4
    I have heard that a Ferrari is faster than a pickup truck. Now I used one to move a ton of cement two blocks, and it wasn't fast at all. What did I do wrong? – Bo Persson Jun 22 '11 at 21:10
  • @SpeedIsGood: Your assumption that language X is faster than language Y is plain wrong. Each language has its own domain where it excels; pick the language that is correct for the domain. C++ is not faster than Java, nor is it slower than Java. It's a silly apples-to-oranges comparison. – Martin York Jun 22 '11 at 21:32
  • log10(0)? Not a good idea. Start your loop at 1. – David Hammen Jun 22 '11 at 21:35
  • @Bo Persson: ROFL, nice one. @Martin: Good! That answers my question - in part, at least. So, suppose I wanted to implement a function that performs a chain of calculus operations on a set of real numbers, and then returns a value as fast as possible... would I use Java or C++? The algorithm in question is really just pure maths. It was suggested to me that C++ would execute faster than Java for this purpose, which is why I tried this experiment. @David Hammen: Ok, will do. Thanks. – SpeedIsGood Jun 22 '11 at 22:25
  • The C++ code with `void main` is not valid. – Cheers and hth. - Alf Jun 22 '11 at 22:33
  • @SpeedlsGood For your calculus operations, the difference between the programs in C++ and Java will likely be negligible. The amount of time you spend on this specific question will probably be far more than the amount of time you might save in program execution, if you even save any reasonable amount of time at all. It makes no sense to spend an hour optimizing something to save 60 microseconds per calculation even if the calculation will run millions of times before you're done using the program; you've still lost more time than you've gained. My time is more valuable than the processor's. – Loduwijk Jun 22 '11 at 22:45
  • While I stand by my comment that you're wasting your time, the answer to your question is technically C++. If you want to waste your time optimizing everything as much as possible, even more than you are probably even capable of, then C++ has to be faster - to start up at least, if not also in actual execution. But at the level you're worrying at, you might as well skip C++ and go to C. Program in C, compile but do not assemble, then optimize the assembly code you are left with, then assemble into machine code. Do that perfectly, and it is guaranteed to be the fastest, and the most wasteful of your time. – Loduwijk Jun 22 '11 at 22:51
  • @Loduwijk: Ok, that makes sense. That said, it so happens that the system I am going to be building needs to crunch a huge amount of data in realtime, so I need to get it to run as fast as possible. You mentioned assembly... do you think it would be a good idea to try and learn assembly to handle the mathematical portion of the system? Thanks. – SpeedIsGood Jun 22 '11 at 23:47
  • @SpeedlsGood: No, I do not think it would be worth your time to learn assembly. Your statement that this needs to calculate a large amount of data in realtime does change things slightly, so I can understand your concern, but still no, I don't think learning assembly just for optimizing this would be a good idea or a good use of your time. As for C++ and Java, make sure all optimizations are on their highest values. For Java, run in server mode (never done it so I don't remember the command-line option, but I'm pretty sure it's there). Still, spend some time but don't waste too much. – Loduwijk Jun 23 '11 at 00:03
  • 1
    Also, given that this is to be in realtime and you are bent on speed, you could try approximating some calculations if at all possible and if that is reasonable in your simulation. Also, google "cache miss" and do some research on that topic. That will help more than assembly; make sure everything is always as close to the processor as possible if you can, though that requires a different mindset when you program that way. – Loduwijk Jun 23 '11 at 00:07
  • And one last suggestion: depending on the nature of your simulation (or whatever it is), perhaps you can calculate some things ahead of time (either saved to disk or calculated first thing on program execution before the simulation starts, which will delay startup) and have an array of calculated data that you can then merely iterate over and use in real-time. If this is possible in your situation, this could prove to be orders of magnitude faster, especially if you can fit all the results in memory. – Loduwijk Jun 23 '11 at 00:10
  • Also, I should point out that my question is really about whether I've left out some form of optimisation or the other in Visual Studios. I mean, this is just a single loop. Would I really have to optimise the assembly language to get a loop to run faster than the exact same loop in Java? – SpeedIsGood Jun 23 '11 at 00:11
  • @Loduwijk: Thanks, that's very helpful! The system I'm building is for crunching a ton of data from the electromagnetic emissions of a complex multipartite organic-composite ionisation reaction. So the amount of calculus that goes into that will be massive, but it's only number crunching. – SpeedIsGood Jun 23 '11 at 00:18
  • @SpeedIsGood: That depends. You said this specific code snippet is just a test. You have alluded to where you are going to take it from there, and I have started commenting on that. Whether assembly would help at that point depends on the code you end up with and what the compiled code looks like. With what you have here, sure, maybe assembly could shave off a few nanoseconds, but that is insignificant, almost to the point of not being worth mentioning, though I do mention it because that's how the conversation is going. Again, assembly is not worth the time to learn for this problem. – Loduwijk Jun 23 '11 at 00:21
  • 1
    I will part with one last comment about this whole thread in general: this seems to be becoming less of a specific Q&A issue and more of a discussion about optimization. While that is a good topic (one which I could probably go on for hours more about), I think we're not supposed to treat StackOverflow that way. It's specifically stated that this is not a discussion site. Still, I hope all this stuff helps. – Loduwijk Jun 23 '11 at 00:24
  • For a "solver" type of app chances are that optimizing the algorithm will get you much more performance than your language choice of Java/C++/Asm. In order for learning assembly to be at all helpful you'll have to get to a point where you know more than the compiler. Unless you have a very tight inner loop that the compiler doesn't optimize well I wouldn't bother trying assembly unless as a last resort. – uesp Jun 23 '11 at 13:33

9 Answers

21

It's a safe assumption that any time you see Java outperforming C++, especially by such a huge margin, you're doing something wrong. Since this is the second question dedicated to such micro-micro-optimizations, I feel I should suggest finding a less futile hobby.

That answers your question: you are using C++ (really, your operating system) wrong. As to the implied question (how?), it's easy: endl flushes the stream, whereas Java keeps buffering it. Replace your cout line with:

cout << temp << "\n";

You do not understand benchmarking enough to compare this kind of stuff (and by this I mean comparing a single math function). I recommend buying a book on testing and benchmarking.

Blindy
  • +1 for the *find a less futile hobby* comment... – David Rodríguez - dribeas Jun 22 '11 at 19:37
  • I made the replacement you suggested, and it caused the C++ execution time to increase... :\ – SpeedIsGood Jun 22 '11 at 19:39
  • 3
    How exactly would printing 8 lines of output take 3 seconds, with or without flushing the buffer? – Joe Jun 22 '11 at 19:39
  • 1
    Also, this may be futile, but it's not a hobby. It's just an experiment. – SpeedIsGood Jun 22 '11 at 19:40
  • 1
    @SpeedIsGood: http://keithlea.com/javabench/data if you want a really good benchmark – Adrian Jun 22 '11 at 19:47
  • 1
    I don't think it is flushing... It is locking of the stream in the multi-threaded environment. Please see my answer below for explanation. As far as I remember, the C++ standard defines that any character is flushed to the stream anyway (I mean ostream here). – ovanes Jun 22 '11 at 19:55
  • 11
    This reminds me of a quote that I've read somewhere on the internet a couple of years ago: "In benchmarks comparing different languages, surprising results often stem from the observer being experienced in one language and having little or no knowledge of the other". It's so true. – Damon Jun 22 '11 at 19:59
  • @vBx: Thanks, this is a very informative link. – SpeedIsGood Jun 22 '11 at 19:59
  • @Damon: I understand, and it is why I would like to learn what it is I need to change. Thanks. – SpeedIsGood Jun 22 '11 at 20:00
  • 1
    How did this get mod up? After much personal insult, the answer given is nonsensical. Java's `println()` also flushes the stream. – irreputable Jun 22 '11 at 22:25
  • @irreputable, nonsensical, yes. http://download.oracle.com/javase/1.3/docs/api/java/io/PrintStream.html `Optionally, a PrintStream can be created so as to flush automatically`. They need downvotes for comments.. – Blindy Jun 22 '11 at 22:30
  • endl is not relevant here, since it causes the stream to be flushed only 7 times. Streaming '\n' brings some runtime benefit, but not that much. – ovanes Jun 23 '11 at 13:06
7

You surely don't want to time the output. Remove the output statements inside each loop and rerun, to get a better comparison of what you are actually interested in. Otherwise you are also benchmarking the output functions and your video driver. The resulting speed could actually depend on whether the console window you run in is obscured or minimized at the time of the test.

Make sure you are not running a Debug build in C++. That will be a lot slower than Release, independent of how you start up the process.
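The first suggestion above, taking the output out of the timed loop, can be sketched in C++ roughly as follows. The function name `count_whole_logs` is hypothetical, and starting the loop at 1 is an assumption made here to sidestep log10(0); the comparison itself is the one used in this thread.

```cpp
#include <cmath>

// Hypothetical helper: counts the values i in [1, limit) whose base-10
// logarithm is a whole number, instead of printing them inside the loop.
// Starting at 1 is an assumption made here to avoid log10(0).
int count_whole_logs(unsigned int limit)
{
    int matches = 0;
    for (unsigned int i = 1; i < limit; ++i)
    {
        const double temp = std::log10(static_cast<double>(i));
        if (temp == std::floor(temp + 0.5))
        {
            ++matches; // remember the hit; report it after timing stops
        }
    }
    return matches;
}
```

Printing the returned count once the timer has stopped keeps the result in use, so the optimizer has less room to discard the loop.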

EDIT: I've reproduced this test scenario locally and cannot get the same results. With your code modified (below) to remove the output, Java takes 5.40754388 seconds.

public class PerformanceTest2 {
    public static void main(String args[]) {
        // Number of iterations
        double iterations = 1E8;
        double temp;

        // Create the variables for timing
        double start;
        double end;
        double duration; // end - start
        int matches = 0;

        // Run performance test
        System.out.println("Start");
        start = System.nanoTime();
        for (double i = 0; i < iterations; i += 1) {
            // Count matches instead of printing them
            temp = Math.log10(i);
            if (Math.round(temp) == temp) {
                ++matches;
            }
        }
        end = System.nanoTime();
        System.out.println("End");

        // Output performance test results
        duration = (end - start) / 1E9;
        System.out.println("Duration: " + duration);
        System.out.println("Matches: " + matches);
    }
}

C++ code below takes 5062 ms. This is with JDK 6u21 on Windows, and VC++ 10 Express.

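#include <cmath>
#include <iostream>
#include <windows.h>

int main()
{
    const unsigned int count = 100000000; // 1E8 iterations
    int matches = 0;

    DWORD start = ::GetTickCount();
    for (unsigned int i = 0; i < count; ++i)
    {
        double temp = std::log10(double(i));
        if (temp == std::floor(temp + 0.5))
        {
            ++matches; // count the match instead of printing it
        }
    }
    DWORD end = ::GetTickCount();

    std::cout << end - start << "ms for " << count << " log10s" << std::endl;
    std::cout << "matches: " << matches << std::endl;
    return 0;
}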

EDIT 2: If I reinstate your logic from Java a little more precisely I get almost identical times for C++ and Java which is what I'd expect given the dependency on log10 implementation.

5157ms for 100000000 log10s

5187ms for 100000000 log10s (double loop counter)

5312ms for 100000000 log10s (double loop counter, round as fn)

Steve Townsend
  • The times are almost exactly the same when I remove the output commands. – SpeedIsGood Jun 22 '11 at 19:28
  • I am running it as release in VSE C++, using ctrl + f5 to build without debugging. – SpeedIsGood Jun 22 '11 at 19:31
  • The obvious things seem to be getting ruled out. I am suspicious of the `double` loop control variable, try with `int` and cast to a `double` for input to `log10`. Other than that, perhaps Java's `log10` really is way better than the MSVC++ version. – Steve Townsend Jun 22 '11 at 19:48
  • 3
    @SpeedIsGood: `ctrl + f5 to build without debugging` Ctrl-F5 is **NOT** going to build release mode; You need to change to the 'Release' build configuration (don't know where VCExpress hides it) – sehe Jun 22 '11 at 19:52
  • 1
    @sehe - please see the compiler options in OP's edit, which clearly indicate Release. – Steve Townsend Jun 22 '11 at 19:55
  • @sehe: I did that as well. The problem was that earlier, I only changed it to release mode and not used ctrl + F5. But now I have done both, and it is executing much faster than it was previously (it was previously taking about 15 seconds for the c++ code to run). – SpeedIsGood Jun 22 '11 at 19:56
  • @Steve Townsend: I ran your code and got the same execution time for c++ as I had previously. – SpeedIsGood Jun 22 '11 at 20:43
  • What is your platform? What is your JVM? I am on 32-bit Windows XP, dual core Intel, Sun JDK 6u21. – Steve Townsend Jun 22 '11 at 20:46
  • @Steve Townsend: I am using JDK 6u26 on Windows 7 x64, with an Intel Core 2 Duo T9400 2.53 GHz processor. – SpeedIsGood Jun 22 '11 at 20:50
  • If you really want to be exhaustive on this I would build your app as native x64 - http://msdn.microsoft.com/en-us/library/x4d2c09s(v=VS.100).aspx – Steve Townsend Jun 22 '11 at 20:58
  • you have `double i` in java and `int i` in c++ – irreputable Jun 22 '11 at 21:47
  • Re "I get almost identical times for C++ and Java which is what I'd expect given the dependency on log10 implementation." Bingo. log10() is an expensive function in any language. Java's log10 is almost certainly not implemented in Java. Sans those function calls, there is very little code here to compare. – David Hammen Jun 22 '11 at 23:36
  • @Steve Townsend: Apparently Visual Studios Express [does not have x64 support](http://msdn.microsoft.com/en-us/library/hs24szh9.aspx)... :\ – SpeedIsGood Jun 23 '11 at 00:29
  • @irreputable - see the 3 times shown last for C++, where I reinstated `double` and `round` as a function – Steve Townsend Jun 23 '11 at 11:01
  • @SpeedIsGood - then I cannot make further suggestions, sorry. – Steve Townsend Jun 23 '11 at 14:10
4

Like @Mat commented, your C++ round isn't the same as Java's Math.round. Oracle's Java documentation says that Math.round is the same thing as (long)Math.floor(a + 0.5d).

Note that not casting to long will be faster in C++ (and possibly in Java as well).
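As an illustration of the formula quoted above, here is a small C++ sketch. Both function names are hypothetical, and the sketch ignores NaN and out-of-range inputs, which Java handles specially. Note that this rounding sends ties towards positive infinity, whereas the standard `std::round` and `std::llround` send ties away from zero, so the two disagree on negative halves such as -2.5.

```cpp
#include <cmath>

// Hypothetical helper mirroring the documented Java formula
// (long)Math.floor(a + 0.5d): ties round towards positive infinity.
long long java_style_round(double a)
{
    return static_cast<long long>(std::floor(a + 0.5));
}

// Same rounding without the integer cast, for use when a double suffices.
double round_half_up(double a)
{
    return std::floor(a + 0.5);
}
```

For the benchmark in the question, the second variant is the one that avoids the conversion to an integer type inside the loop.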

Baffe Boyois
  • I changed that, and the c++ code is executing about a second faster now. However, can I get it to run even faster than that? Thanks. – SpeedIsGood Jun 22 '11 at 19:47
2

It's because of the printing of the values. Nothing to do with the actual loop.

tp1
2

Just to sum up what others have stated here: C++ iostream functionality is implemented differently than in Java. In C++, output to IOStreams creates an object of an inner type called sentry before outputting each character. For example, ostream::sentry uses the RAII idiom to ensure that the stream is in a consistent state. In a multi-threaded environment (which is in many cases the default), sentry is also used to lock a mutex object, and to unlock it after each character is printed, to avoid race conditions. Mutex lock/unlock operations are very expensive, and this is the reason why you face that kind of slowdown.

Java goes another direction and locks/unlocks the mutex only once for the whole output string. That's why if you would output to cout from multiple threads you would see really messed up output, but all characters would be there.

You can make C++ IOStreams performant if you work directly with stream buffers and only flush the output occasionally. To test this behaviour, just switch off thread support for your test and your C++ executable should run much faster.
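One way to reduce the number of operations on the shared stream, sketched under the assumption that the library takes a lock per insertion as described here, is to format everything into a private buffer and pass it to the destination in one call. The name `write_buffered` is hypothetical, and the sketch assumes a C++11 compiler.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats all values into a private buffer first,
// then hands the result to the destination stream in a single call.
void write_buffered(std::ostream& out, const std::vector<double>& values)
{
    std::ostringstream buffer; // local, so no other thread contends for it
    for (double v : values)
    {
        buffer << v << '\n';
    }
    const std::string text = buffer.str();
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
}
```

The destination stream is touched once per batch rather than once per value, which is the property that matters if per-insertion locking is the cost being avoided.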

I played a bit with the stream and the code. Here are my conclusions: First of all, there is no single-threaded library available starting with VC++ 2008. Please see the link below, where MS states that single-threaded runtime libs are no longer supported: http://msdn.microsoft.com/en-us/library/abx4dbyh.aspx

Note LIBCP.LIB and LIBCPD.LIB (via the old /ML and /MLd options) have been removed. Use LIBCPMT.LIB and LIBCPMTD.LIB instead via the /MT and /MTd options.

MS IOStreams implementation in fact does lock for each output (not per character). Therefore writing:

cout << "test" << '\n';

produces two locks: one for "test" and the second for '\n'. This becomes obvious if you debug into the operator << implementation:

_Myt& __CLR_OR_THIS_CALL operator<<(double _Val)
    {// insert a double
    ios_base::iostate _State = ios_base::goodbit;
    const sentry _Ok(*this);
    ...
    }

Here the operator call constructs the sentry instance, which is derived from basic_ostream::_Sentry_base. The _Sentry_base ctor locks the buffer:

template<class _Elem,   class _Traits>
class basic_ostream
  {
  class _Sentry_base
  {
    ///...

  __CLR_OR_THIS_CALL _Sentry_base(_Myt& _Ostr)
        : _Myostr(_Ostr)
        {   // lock the stream buffer, if there
        if (_Myostr.rdbuf() != 0)
          _Myostr.rdbuf()->_Lock();
        }

    ///...
  };
};

Which results in call to:

template<class _Elem, class _Traits>
void basic_streambuf::_Lock()
    {   // set the thread lock
    _Mylock._Lock();
    }

Results in:

void __thiscall _Mutex::_Lock()
    {   // lock mutex
    _Mtxlock((_Rmtx*)_Mtx);
    }

Results in:

void  __CLRCALL_PURE_OR_CDECL _Mtxlock(_Rmtx *_Mtx)
    {   /* lock mutex */
  // some additional stuff which is not called...
    EnterCriticalSection(_Mtx);
    }

Executing your code with the std::endl manipulator gives the following timings on my machine:

Multithreaded DLL/Release build:

Start
-1.#INF
0
1
2
3
4
5
6
7
End
Duration: 4.43151
Press any key to continue . . .

With '\n' instead of std::endl:

Multithreaded DLL/Release with '\n' instead of endl

Start
-1.#INF
0
1
2
3
4
5
6
7
End
Duration: 4.13076
Press any key to continue . . .

Replacing cout << temp << '\n'; with direct stream buffer serialization to avoid locks:

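// Writes val and a newline straight to cout's stream buffer,
// bypassing operator<< and the sentry it constructs.
// Assumes <iostream>, <locale> and using namespace std, as in the question.
inline bool output_double(double const& val)
{
  typedef num_put<char> facet;
  facet const& nput_facet = use_facet<facet>(cout.getloc());

  if(!nput_facet.put(facet::iter_type(cout.rdbuf()), cout, cout.fill(), val).failed())
    return cout.rdbuf()->sputc('\n') == '\n'; // true when the newline was written
  return false;
}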

Improves the timing a bit again:

Multithreaded DLL/Release without locks by directly writing to streambuf

Start
-1.#INF
0
1
2
3
4
5
6
7
End
Duration: 4.00943
Press any key to continue . . .

Finally, changing the type of the iteration variable from double to size_t and converting it to a new double value on each iteration improves the runtime as well:

size_t iterations = 100000000; //=1E8
...
//Run performance test
size_t i;
cout << "Start" << endl;
QueryPerformanceCounter(&start);
for(i=0; i<iterations; ++i)
{
    //Overhead and display
    temp = log10(double(i));
    if(round(temp) == temp)
      output_double(temp);
}
QueryPerformanceCounter(&end);
cout << "End" << endl;
...

Output:

Start
-1.#INF
0
1
2
3
4
5
6
7
End
Duration: 3.69653
Press any key to continue . . .

Now try my suggestion with the suggestions made by Steve Townsend. How are the timings now?

ovanes
  • How do I switch off thread support? Thanks. – SpeedIsGood Jun 22 '11 at 19:57
  • 1
    You can use the /ML command-line option or go to the Project Properties/C/C++/Code Generation/Runtime Library and choose the single-threaded Runtime Library variant. In my Visual C++ 2011 Express I no longer see that option. Setting the compiler in Properties/C/C++/Command Line worked well. But you must ensure that all your projects link with the same version of the runtime lib. – ovanes Jun 22 '11 at 20:10
  • I tried to do that, but the compiler is saying: `cl : Command line warning D9002: ignoring unknown option '/ML'`. – SpeedIsGood Jun 22 '11 at 20:28
  • Do you use a command-line option or the VC Express dialog? I was able to set this option without any warnings. This is the link to the MSDN page: http://msdn.microsoft.com/en-us/library/2kzt1wy3(v=vs.71).aspx – ovanes Jun 22 '11 at 22:42
  • I am using the command line option. The VCEC++ Runtime Library dialog gives me four options: `/MT`, `/MTd`, `/MD`, and `/MDd`, so I have to set it via the command line option. – SpeedIsGood Jun 22 '11 at 23:55
  • 1
    It seems like MS no longer supports single-threaded runtime libs. Because the link I sent you also applies to VC++ 7.1 and I can't find the same options for VC++ 2011. It is pretty late now. I will be able to test your code with previous compiler versions tomorrow and report my findings here. I am pretty sure it depends on the stream locking issues. – ovanes Jun 23 '11 at 00:12
  • @Ovanes: Your analysis is really good! Thanks! Unfortunately, the `/MT` flag is not causing any change in speed. And for some ungodly reason, using `"\n"` instead of `endl` is causing a decrease in speed... :\ – SpeedIsGood Jun 23 '11 at 22:13
  • As I wrote, endl or '\n' are unlikely to cause the major slowdown, since you do many more calculations than outputs to iostreams. They still bring some falsified results into your measurements, but are not the show stopper. I found this paper for another log-implementation proposal. Maybe it is worth trying that approach and comparing the timing then: http://www.icsi.berkeley.edu/pubs/techreports/TR-07-002.pdf – ovanes Jun 24 '11 at 16:07
2

Perhaps you should use the fast floating-point mode of MSVC.

The fp:fast Mode for Floating-Point Semantics

When the fp:fast mode is enabled, the compiler relaxes the rules that fp:precise uses when optimizing floating-point operations. This mode allows the compiler to further optimize floating-point code for speed at the expense of floating-point accuracy and correctness. Programs that do not rely on highly accurate floating-point computations may experience a significant speed improvement by enabling the fp:fast mode.

The fp:fast floating-point mode is enabled using a command-line compiler switch as follows:

  • cl -fp:fast source.cpp or
  • cl /fp:fast source.cpp

On my Linux box (64 bit) the timings are about equal:

oracle openjdk 6

sehe@natty:/tmp$ time java PerformanceTest2 

real    0m5.246s
user    0m5.250s
sys 0m0.000s

gcc 4.6

sehe@natty:/tmp$ time ./t

real    0m5.656s
user    0m5.650s
sys 0m0.000s

Full disclosure: I threw all the optimization flags in the book at it; see the Makefile below.


Makefile:

all: PerformanceTest2 t

PerformanceTest2: PerformanceTest2.java
    javac $<

t: t.cpp
    g++ -g -O2 -ffast-math -march=native $< -o $@

t.cpp:

#include <stdio.h>
#include <cmath>

inline double round(double value)
{
    return floor(0.5 + value);
}
int main()
{
    //Number of iterations
    double iterations = 1E8;
    double temp;

    //Run performance test
    for(double i = 0; i < iterations; i += 1)
    {
        //Overhead and display
        temp = log10(i);
        if(round(temp) == temp)
        {
            printf("%F\n", temp);
        }
    }
    return 0;
}

PerformanceTest2.java:

public class PerformanceTest2
{
    public static void main(String args[])
    {
        //Number of iterations
        double iterations = 1E8;
        double temp;

        //Run performance test
        for(double i = 0; i < iterations; i += 1)
        {
            //Overhead and display
            temp = Math.log10(i);
            if(Math.round(temp) == temp)
            {
                System.out.println(temp);
            }
        }
    }
}
sehe
  • Isn't `-O3` the best optimization for GCC? – Xeo Jun 22 '11 at 21:28
  • @Xeo: as always, I profiled; I added `-O4 -funsafe-math-optimizations -funsafe-loop-optimizations -freciprocal-math` and removed them since there wasn't any gain; Both fastmath and native arch together seem to provide speed up (from roughly 7.2s to 5.7s) – sehe Jun 22 '11 at 21:31
  • I hate that I can't quickly do a gcj compiled test of the java version. I'd love to look at the actual code generated, but somehow my SSD is freezing halfway the installation. Shit happens :) – sehe Jun 22 '11 at 21:32
  • Your `round` is wrong; it fails for negative numbers. – Billy ONeal Jun 22 '11 at 21:43
  • @Billy: It is copied from the question. It is _not_ my round. (Also that version was the result of prior comments with another answer, so I'm assuming that it was chosen for a valid reason) – sehe Jun 22 '11 at 21:45
  • @self: Re: GCJ - I fixed my system by (_live_!) migrating my root fs to another SSD (long live [lvm2!](http://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux))); alas: it is slower when compiled to native code (7.9s) – sehe Jun 22 '11 at 22:12
  • 1
    @sehe: The `/fp:fast` option reduced the time of the algorithm considerably. However, what is the downside to `/fp:fast`? This particular code does not use double precision, but suppose I wanted to use very precise values, would `/fp:fast` cause problems with the code? Thanks! – SpeedIsGood Jun 23 '11 at 00:02
  • @SpeedIsGood: I don't know but (a) the linked article should be the definitive reference, being from the MS compiler team (b) and the flag comes up regularly on StackOverflow (search for strict floating point) – sehe Jun 23 '11 at 06:26
1

Might want to take a look here

There can be a whole host of factors that could explain why your Java code is running faster than the C++ code. One of those factors could simply be that for this test case, the Java code is faster. I wouldn't even consider using that as a blanket statement for one language being faster than the other though.

If I were to make one change in the way you're doing things, I'd port the code over to Linux and measure the runtime with the `time` command. Congrats, you just eliminated the whole windows.h file.

LainIwakura
  • Thanks for the link. How do you do high-precision timing with C++ in Linux? I will try what you said. Thanks. – SpeedIsGood Jun 22 '11 at 19:48
  • As far as I know, the `time` command should be precise enough for simple tests such as these. I use it to examine complexities with some algorithms I write in C++. – LainIwakura Jun 22 '11 at 19:55
  • when I use the time command (using `#include `), it seems to return values in seconds. Can I get it more precise than that? Thanks. – SpeedIsGood Jun 22 '11 at 20:05
  • 2
    I meant the `time` command from the shell. `time *name of program*`. For me it gives time in seconds with milliseconds. Beyond this, I don't know how to get more precise. – LainIwakura Jun 22 '11 at 20:07
  • @SpeedIsGood: It's what I used [here](http://stackoverflow.com/questions/6445321/why-does-java-seem-to-be-executing-faster-than-c-part-2/6446620#6446620) – sehe Jun 22 '11 at 21:34
1

Your C++ program is slow because you don't know your tool (Visual Studio) well enough. Look at the row of icons below the menu. You will find the word "Debug" in the project configuration box. Switch it to "Release". Make sure you rebuild the project completely via the menu Build|Clean project and Build|Build All, or with Ctrl+Alt+F7. (The names in your menu may be slightly different, since my installation is in German.) It's not about starting with F5 or Ctrl+F5.

In "Release mode" your C++ program is about twice as fast as your Java program.

The perception that C++ programs are slower than Java or C# programs often comes from building them in Debug mode (the default). Even Cay Horstmann, a respected C++ and Java book author, fell into this trap in "Core Java 2", Addison Wesley (2002).

The lesson is: know your tools, especially when you try to judge them.

René Richter
  • 1
    I already have it in release mode. I have cleaned and rebuilt it several times over. It still runs at the same execution speed. – SpeedIsGood Jun 22 '11 at 20:44
0

The JVM can do runtime optimizations. For this simple example, I guess the only relevant optimization is method inlining of Math.round(). That saves some method invocation overhead, and further optimization becomes possible once inlining has flattened the code.

Watch this presentation to fully appreciate how powerful JVM inlining can be:

http://www.infoq.com/presentations/Towards-a-Universal-VM

This is nice. It means we can structure our logic with methods, and they don't cost anything at runtime. When people argued about GOTO vs. procedures in the 70s, they probably didn't see this coming.

irreputable