28

I'm currently deciding on a platform to build a scientific computational product on, and am deciding on either C#, Java, or plain C with Intel compiler on Core2 Quad CPU's. It's mostly integer arithmetic.

My benchmarks so far show Java and C are about on par with each other, and .NET/C# trails by about 5%; however, a number of my coworkers are claiming that .NET with the right optimizations will beat both of these, given enough time for the JIT to do its work.

I always assumed that the JIT would have done its job within a few minutes of the app starting (probably a few seconds in my case, as it's mostly tight loops), so I'm not sure whether to believe them.
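One way to test that assumption rather than argue about it: time the same tight integer loop over repeated passes and watch when the numbers settle. A minimal Java sketch (the kernel below is an invented stand-in for the real workload, not code from the question):

```java
public class JitWarmup {
    // A tight integer kernel, similar in spirit to the asker's workload.
    static long kernel(int n) {
        long acc = 0;
        for (int i = 1; i <= n; i++) {
            acc += (i * 31L) ^ (i >>> 3);
        }
        return acc;
    }

    public static void main(String[] args) {
        // If the JIT has finished its work, per-pass times should flatten
        // out after the first few iterations.
        for (int pass = 0; pass < 10; pass++) {
            long t0 = System.nanoTime();
            long result = kernel(10_000_000);
            long t1 = System.nanoTime();
            System.out.printf("pass %d: %d ms (result %d)%n",
                    pass, (t1 - t0) / 1_000_000, result);
        }
    }
}
```

If the per-pass time stops improving after the first handful of passes, the "given enough time" argument is unlikely to hold for this workload.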

Can anyone shed any light on the situation? Would .NET beat Java? (Or am I best just sticking with C at this point?).

The code is highly multithreaded and data sets are several terabytes in size.

Haskell/Erlang etc are not options in this case as there is a significant quantity of existing legacy C code that will be ported to the new system, and porting C to Java/C# is a lot simpler than to Haskell or Erlang. (Unless of course these provide a significant speedup).

Edit: We are considering moving to C# or Java because they may, in theory, be faster. Every percent we can shave off our processing time saves us tens of thousands of dollars per year. At this point we are just trying to evaluate whether C, Java, or C# would be faster.

abatishchev
Rexsung
  • What's wrong with C++? Porting from C to C++ is probably much easier than going to Java or C#. – GManNickG Apr 08 '09 at 05:47
  • With reference to the specific "enough time for the JIT to do its work": the current MS JIT does *not* re-JIT methods. Some JVMs do. Once you are past the initial JIT overhead (minuscule if your code is pumping terabytes around), current state-of-the-art JVMs might get better, but the CLR won't. – ShuggyCoUk Apr 08 '09 at 12:30
  • Might I reiterate a point in my answer (there are quite a few): you need to give specifics about which JVM/.NET runtimes, architecture, C compiler and optimizations, Debug vs Release builds, etc. you are using, and whether you are doing checked or unchecked arithmetic, etc. – ShuggyCoUk Apr 08 '09 at 12:32
  • Also an indication of what part seems to be taking all the time (with real code) – ShuggyCoUk Apr 08 '09 at 12:33
  • Have you tested with Mono JIT? – Sharique Apr 09 '09 at 11:34
  • Bit of a shame Rexsung seems to have run and gunned since this has provoked at lot of interesting discussion... – ShuggyCoUk Apr 10 '09 at 18:57
  • Isn't it community wiki? Are you guys sleeping? – RubyDubee Mar 21 '10 at 12:24
  • How much would you get simply by upgrading to the newest multicore processors? – Thorbjørn Ravn Andersen Apr 17 '10 at 15:15
  • "Java and C are about on par with each other" Are you sure? – ascanio May 13 '11 at 15:12

26 Answers

67

The key piece of information in the question is this:

Every percent we can shave off our processing time saves us tens of thousands of dollars per year

So you need to consider how much it will cost to shave each percent off. If that optimization effort costs tens of thousands of dollars per year, then it isn't worth doing. You could make a bigger saving by firing a programmer.

With the right skills (which today are rarer and therefore more expensive) you can hand-craft assembler to get the fastest possible code. With slightly less rare (and expensive) skills, you can do almost as well with some really ugly-looking C code. And so on. The more performance you squeeze out of it, the more it will cost you in development effort, and there will be diminishing returns for ever greater effort. If the profit from this stays at "tens of thousands of dollars per year" then there will come a point where it is no longer worth the effort. In fact I would hazard a guess you're already at that point because "tens of thousands of dollars per year" is in the range of one salary, and probably not enough to buy the skills required to hand-optimize a complex program.

I would guess that if you have code already written in C, the effort of rewriting it all as a direct translation in another language will be 90% wasted effort. It will very likely perform slower simply because you won't be taking advantage of the capabilities of the platform, but instead working against them, e.g. trying to use Java as if it was C.

Also within your existing code, there will be parts that make a crucial contribution to the running time (they run frequently), and other parts that are totally irrelevant (they run rarely). So if you have some idea for speeding up the program, there is no economic sense in wasting time applying it to the parts of the program that don't affect the running time.

So use a profiler to find the hot spots, and see where time is being wasted in the existing code.

Update, after noticing the reference to the code being "multithreaded":

In that case, if you focus your effort on removing bottlenecks so that your program can scale well over a large number of cores, then it will automatically get faster every year at a rate that will dwarf any other optimization you can make. This time next year, quad cores will be standard on desktops. The year after that, 8 cores will be getting cheaper (I bought one over a year ago for a few thousand dollars), and I would predict that a 32 core machine will cost less than a developer by that time.

Daniel Earwicker
  • I agree, with fewer words. Furthermore, I think you can save a lot of money by utilizing GPUs if possible, and by getting help from Intel themselves if it's a cool/big project. – Mafti Apr 08 '09 at 06:46
  • It's not clear that there's going to be enough bandwidth on 32-core machines to do data-intensive computing on all the cores. It might also be worth it to look at distributed-memory scaling, like MapReduce or MPI. This could get him scaling up to thousands of cores *now*. See below. – Todd Gamblin Apr 08 '09 at 17:35
  • +1 for the focus on the economics of the issue – M. Anthony Aiello Jul 19 '11 at 19:47
31

I'm sorry, but that is not a simple question. It would depend a lot on what exactly is going on. C# is certainly no slouch, and you'd be hard-pressed to say "Java is faster" or "C# is faster". C is a very different beast... it may have the potential to be faster, if you get it right; but in most cases it'll be about the same, just much harder to write.

It also depends how you do it - locking strategies, how you do the parallelization, the main code body, etc.

Re JIT - you could use NGEN to flatten this, but yes; if you are hitting the same code it should be JITted very early on.

One very useful feature of C#/Java (over C) is that they have the potential to make better use of the local CPU (optimizations etc), without you having to worry about it.

Also - with .NET, consider things like "Parallel Extensions" (to be bundled in 4.0), which gives you a much stronger threading story (compared to .NET without PFX).

Josh Crozier
Marc Gravell
  • +1 for mentioning Parallel Extensions. As has already been pointed out, the number of cores available will increase over time. Parallel Extensions should allow any existing code to make the best use of those cores as they are added without any effort from the developer at all. – Grant Wagner Apr 08 '09 at 17:02
  • One very useful feature of C#/Java (over C) is that they have the potential -- Please explain this, how can C#/Java VM/JT make better (i.e. faster) optimizations than a optimizing C compiler that targets the native CPU? – mctylr Apr 08 '09 at 23:36
  • @mctylr: they do not need to deal with aliasing, they have access to runtime behaviour when optimizing, and they have more freedom to screw around with the internals (like escape analysis) because, unlike C, the internals are hidden away. This is very much still potential, but it's getting there fast. – ShuggyCoUk Apr 09 '09 at 07:57
14

Don't worry about language; parallelize!

If you have a highly multithreaded, data-intensive scientific code, then I don't think worrying about language is the biggest issue for you. I think you should concentrate on making your application parallel, especially making it scale past a single node. This will get you far more performance than just switching languages.

As long as you're confined to a single node, you're going to be starved for compute power and bandwidth for your app. On upcoming many-core machines, it's not clear that you'll have the bandwidth you need to do data-intensive computing on all the cores. You can do computationally intensive work (like a GPU does), but you may not be able to feed all the cores if you need to stream a lot of data to every one of them.

I think you should consider two options:

  1. MapReduce
    Your problem sounds like a good match for something like Hadoop, which is designed for very data-intensive jobs.

    Hadoop has scaled to 10,000 nodes on Linux, and you can shunt your work off either to someone else's (e.g. Amazon's, Microsoft's) or your own compute cloud. It's written in Java, so as far as porting goes, you can either call your existing C code from within Java, or you can port the whole thing to Java.

  2. MPI
    If you don't want to bother porting to MapReduce, or if for some reason your parallel paradigm doesn't fit the MapReduce model, you could consider adapting your app to use MPI. This would also allow you to scale out to (potentially thousands) of cores. MPI is the de-facto standard for computationally intensive, distributed-memory applications, and I believe there are Java bindings, but mostly people use MPI with C, C++, and Fortran. So you could keep your code in C and focus on parallelizing the performance-intensive parts. Take a look at OpenMPI for starters if you are interested.

Todd Gamblin
  • +1, although never underestimate the amount of time it takes to distribute a complex program; it might turn out to be not worth the effort/cash! – Ed James Apr 16 '09 at 18:22
11

I'm honestly surprised at those benchmarks.

In a computationally intensive product I would place a large wager on C to perform faster. You might write code that leaks memory like a sieve, and has interesting threading related defects, but it should be faster.

The only reason I could think that Java or C# would be faster is a short run length on the test: if little or no GC happened, you'll avoid the overhead of actually deallocating memory. If the process is iterative or parallel, try sticking a GC.Collect wherever you think you're done with a bunch of objects (after setting things to null or otherwise removing references).

Also, if you're dealing with terabytes of data, my opinion is you're going to be much better off with deterministic memory allocation that you get with C. If you deallocate roughly close to when you allocate your heap will stay largely unfragmented. With a GC environment you may very well end up with your program using far more memory after a decent run length than you would guess, just because of fragmentation.

To me this sounds like the sort of project where C would be the appropriate language, but would require a bit of extra attention to memory allocation/deallocation. My bet is that C# or Java will fail if run on a full data set.

Darren Clark
  • Your preconceptions are out of date. Modern garbage collectors are very good at optimizing deallocation, especially when it happens close to allocation. They can even outperform C-style malloc/free. – Michael Borgwardt Apr 08 '09 at 07:00
  • Could be. What I do know is that if I allocate a variable-sized buffer (around 8K) in a loop with C# and don't do a GC.Collect, I die. Fast. Even when releasing the buffer each iteration. LOH ftl. – Darren Clark Apr 08 '09 at 07:22
  • 8K isn't enough to get into the LOH as far as I'm aware. Please post a short but complete program to demonstrate the problem. – Jon Skeet Apr 08 '09 at 09:24
  • Hrm... You're both right, and I'm wrong here. I have an old program that had problems, but it was allocating more than 8K, and I was misremembering the LOH object size. It also was on 1.0, and I can't repro the problem on 3.5. So Michael's right as well. I'm still surprised at the benchmarks, though. – Darren Clark Apr 08 '09 at 17:26
  • 1.0 LOH was badly broken in many ways, we had similar problems. 1.1 mostly fixed it, 2.0 all problems went away – ShuggyCoUk Apr 09 '09 at 07:58
  • @JonSkeet isn't a double[] array of length 1k going to live on the LOH already? But in general the (unpublished) limit would be 80k on 32-bit, I guess. Don't know for 64-bit though. – user492238 Feb 03 '12 at 10:08
  • @user492238: why would it? It would only be 8K in size... I wouldn't *expect* that to be on the LOH. – Jon Skeet Feb 03 '12 at 10:14
  • @JonSkeet I don't know exactly. Possibly because 8k is already relatively big for the usual generational (compacting) heap, plus some alignment considerations? One can determine the actual limit with: for (int i = 0; i < 10000; i += 100) Console.Out.WriteLine(i.ToString() + ":" + GC.GetGeneration(new double[i])); There is no fixed spec AFAIK about the limit, but I think double[] and interned strings are the only exceptions to the 80k rule. – user492238 Feb 04 '12 at 14:03
  • @user492238: On my box that's showing gen 0 for that whole loop... it goes to gen 2 at 10700. – Jon Skeet Feb 04 '12 at 14:18
  • @JonSkeet On my box it switches to '2' at length 1000. 32-bit, CLR 4.0 – user492238 Feb 06 '12 at 17:31
  • @user492238: Interesting. Mine was CLR 4 as well, but 64-bit. – Jon Skeet Feb 06 '12 at 17:52
  • @JonSkeet yes, several limits appear to be different regarding the heap on 32 vers. 64 bit. This is at least also true for initial heap segment sizes. – user492238 Feb 06 '12 at 18:05
10

Quite some time ago Raymond Chen and Rico Mariani had a series of blog posts incrementally optimising a file load into a dictionary tool. While .NET was quicker early on (i.e. easy to make quick) the C/Win32 approach eventually was significantly faster -- but at considerable complexity (e.g. using custom allocators).

In the end the answer to which is faster will heavily depend on how much time you are willing to expend on eking every microsecond out of each approach. That effort (assuming you do it properly, guided by real profiler data) will make a far greater difference than choice of language/platform.


The first and last performance blog entries:

(The last link gives an overall summary of the results and some analysis.)

Richard
5

It is going to depend very much on what you are doing specifically. I have Java code that beats C code. I have Java code that is much slower than C++ code (I don't do C#/.NET so cannot speak to those).

So, it depends on what you are doing, I am sure you can find something that is faster in language X than language Y.

Have you tried running the C# code through a profiler to see where it is taking the most time (same with Java and C while you are at it). Perhaps you need to do something different.

The Java HotSpot VM is more mature (roots of it going back to at least 1994) than the .NET one, so it may come down to the code generation abilities of both for that.

Daniel Rikowski
TofuBeer
  • Also, Java has better support on different OSes (e.g. Linux). – trunkc Apr 08 '09 at 06:08
  • If your "different OS" is linux, then not really - mono is very well supported there. But certainly, java has better penetration into the more... "obscure" devices. For server work (typically windows or linux/unix), there isn't much difference between C# and Java (in this respect). – Marc Gravell Apr 08 '09 at 06:52
  • http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all – trunkc Apr 08 '09 at 09:11
5

You say "the code is multithreaded", which implies that the algorithms are parallelisable. You also say "data sets are several terabytes in size".

Optimising is all about finding and eliminating bottlenecks.

The obvious bottleneck is the bandwidth to the data sets. Given the size of the data, I'm guessing that the data is held on a server rather than on a desktop machine. You haven't given any details of the algorithms you're using. Is the time taken by the algorithm greater than the time taken to read/write the data/results? Does the algorithm work on subsets of the total data?

I'm going to assume that the algorithm works on chunks of data rather than the whole dataset.

You have two scenarios to consider:

  1. The algorithm takes more time to process the data than it does to get the data. In this case, you need to optimise the algorithm.

  2. The algorithm takes less time to process the data than it does to get the data. In this case, you need to increase the bandwidth between the algorithm and the data.

In the first case, you need a developer that can write good assembler code to get the most out of the processors you're using, leveraging SIMD, GPUs and multicores if they're available. Whatever you do, don't just crank up the number of threads, because as soon as the number of threads exceeds the number of cores, your code goes slower! This is due to the added overhead of switching thread contexts. Another option is to use a SETI-like distributed processing system (how many PCs in your organisation are used for admin purposes - think of all that spare processing power!). C#/Java, as bh213 mentioned, can be an order of magnitude slower than well-written C/C++ using SIMD, etc. But that is a niche skillset these days.
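The warning about thread count versus core count suggests sizing worker pools from the hardware rather than from a fixed constant. A small Java sketch (the names are invented for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CoreSizedPool {
    // For CPU-bound work, oversubscribing threads just adds context-switch
    // overhead, so cap the pool at the hardware thread count. This is a
    // common rule of thumb, not something the answer prescribes verbatim.
    static ExecutorService newComputePool() {
        int cores = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(cores);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = newComputePool();
        Future<Integer> f = pool.submit(() -> 6 * 7); // a stand-in work item
        System.out.println(f.get());
        pool.shutdown();
    }
}
```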

In the latter case, where you're limited by bandwidth, then you need to improve the network connecting the data to the processor. Here, make sure you're using the latest ethernet equipment - 1Gbps everywhere (PC cards, switches, routers, etc). Don't use wireless as that's slower. If there's lots of other traffic, consider a dedicated network in parallel with the 'office' network. Consider storing the data closer to the clients - for every five or so clients use a dedicated server connected directly to each client which mirrors the data from the server.

If saving a few percent of processing time saves "tens of thousands of dollars", then seriously consider getting a consultant in - two, actually: one software, one network. They should easily pay for themselves in the savings made. I'm sure there are many here that are suitably qualified to help.

But if reducing cost is the ultimate goal, then consider Google's approach - write code that keeps the CPU ticking over below 100%. This saves energy directly and indirectly through reduced cooling, thus costing less. You'll want more bang for your buck so it's C/C++ again - Java/C# have more overhead, overhead = more CPU work = more energy/heat = more cost.

So, in summary, when it comes to saving money there's a lot more to it than what language you're going to choose.

Skizz
4

If there is already a significant quantity of legacy C code that will be added to the system then why move to C# and Java?

In response to your latest edit about wanting to take advantage of any improvements in processing speed: your best bet would be to stick with C, as it runs closer to the hardware than C# and Java, which have the overhead of a runtime environment to deal with. The closer to the hardware you can get, the faster you should be able to run. Higher-level languages such as C# and Java will give you quicker development times, but C (or, better yet, assembly) will give you quicker processing time at the cost of longer development time.

mezoid
4

I participated in a few of TopCoder's Marathon matches, where performance was the key to victory.

My choice was C#. I think C# solutions placed slightly above Java and were slightly slower than C++... until somebody wrote code in C++ that was an order of magnitude faster. You were allowed to use the Intel compiler, and the winning code was full of SIMD instructions that you cannot replicate in C# or Java. But if SIMD is not an option, C# and Java should be good enough, as long as you take care to use memory correctly (e.g. watch for cache misses and try to limit memory access to the size of the L2 cache).
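The cache advice in that last sentence is often implemented as loop blocking (tiling). The sketch below is an invented illustration of the technique, not TopCoder code; the tile size of 64 is an assumption to be tuned against the real L1/L2 sizes:

```java
public class BlockedTranspose {
    // Naive column-order writes touch a new cache line on every store once
    // the matrix exceeds the cache. Working in BLOCK x BLOCK tiles keeps
    // both the source rows and the destination rows resident while a tile
    // is processed.
    static final int BLOCK = 64;

    static int[][] transpose(int[][] a) {
        int n = a.length, m = a[0].length;
        int[][] t = new int[m][n];
        for (int i0 = 0; i0 < n; i0 += BLOCK) {
            for (int j0 = 0; j0 < m; j0 += BLOCK) {
                for (int i = i0; i < Math.min(i0 + BLOCK, n); i++) {
                    for (int j = j0; j < Math.min(j0 + BLOCK, m); j++) {
                        t[j][i] = a[i][j];
                    }
                }
            }
        }
        return t;
    }
}
```

The same tiling idea applies in C#, Java, or C; it is the access pattern, not the language, that determines the miss rate.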

bh213
  • Re SIMD: .NET has SIMD support in the Mono implementation via the Mono.Simd namespace. – Konrad Rudolph Apr 08 '09 at 10:38
  • And Microsoft has indicated that they would very much like to also expose SIMD functionality. Mono in turn has indicated that they are willing to change their APIs to whatever MS comes up with (although the decent thing for MS to do, would be to grudgingly acknowledge that they missed the boat ... – Jörg W Mittag Apr 08 '09 at 17:49
  • ... and just adopt the Mono API). Anyway, none of this is going to happen before .NET 5.0 or even later. – Jörg W Mittag Apr 08 '09 at 17:50
4

Your question is poorly phrased (or at least the title is) because it implies this difference is endemic and holds true for all instances of Java/C#/C code.

Thankfully the body of the question is better phrased, because it presents a reasonably detailed explanation of the sort of thing your code is doing. It doesn't state what versions (or providers) of the C#/Java runtimes you are using, nor the target architecture or machine the code will run on. These things make big differences.

You have done some benchmarking, this is good. Some suggestions as to why you see the results you do:

  • You aren't as good at writing performant c# code as you are at java/c (this is not a criticism, or even likely but it is a real possibility you should consider)
  • Later versions of the JVM have some serious optimizations that make uncontended locks extremely fast. This may skew things in Java's favour (especially the comparison with whatever threading primitives the C implementation is using)
  • Since the java code seems to run well compared to the c code it is likely that you are not terribly dependent on the heap allocation strategy (profiling would tell you this).
  • Since the c# code runs less well than the java one (and assuming the code is comparable) then several possible reasons exist:
    • You are using (needlessly) virtual functions which the JVM will inline but the CLR will not
    • The latest JVM does escape analysis, which may make some code paths considerably more efficient (notably those involving string manipulation whose lifetime is stack-bound)
    • Only the very latest 32 bit CLR will inline methods involving non primitive structs
    • Some JVM JIT compilers use hotspot style mechanisms which attempt to detect the 'hotspots' of the code and spend more effort re-jitting them.
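The escape-analysis point above can be made concrete. In a sketch like the following (invented for illustration), no Pair instance outlives its loop iteration, so a JVM that performs escape analysis can scalar-replace the allocation and skip the heap entirely:

```java
public class EscapeDemo {
    // A short-lived helper whose lifetime is confined to one loop iteration.
    static final class Pair {
        final int a, b;
        Pair(int a, int b) { this.a = a; this.b = b; }
        int sum() { return a + b; }
    }

    // Because no Pair instance escapes this method (none is stored in a
    // field, passed out, or returned), escape analysis can allocate it on
    // the stack; the CLR of that era did not do this for classes.
    static long hotLoop(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Pair p = new Pair(i, i * 2);
            total += p.sum();
        }
        return total;
    }
}
```

In C# the idiomatic fix is different: declare Pair as a struct, which is why like-for-like translations can mislead benchmarks.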

Without an understanding of what your code spends most of its time doing, it is impossible to make specific suggestions. I could quite easily write code which performs much better under the CLR through use of structs over objects, or by targeting runtime-specific features of the CLR like non-boxed generics, but this is hardly instructive as a general statement.

ShuggyCoUk
3

Actually it is 'Assembly language'.

Dhanapal
  • Why was this marked down? Given the strange question, it's an appropriate answer! – Daniel Earwicker Apr 08 '09 at 06:31
  • +1 for people marking it down; not a bad answer if you have people with ASM skills. – gatoatigrado Apr 08 '09 at 09:30
  • Did he say "I'll do anything to make it faster"? ASM is appropriate for a question like that! – gbarry Apr 08 '09 at 18:20
  • Moore's law will make your hardware go faster, and will do it faster than you can code your app. – gbarry Apr 08 '09 at 18:21
  • No, Moore's law will make your hardware more *dense* for the same price. It is not, and never has been, about speed (quantum effects are a bitch). – ShuggyCoUk Apr 09 '09 at 08:00
  • Assembly language which is written to be usable on a variety of machines may be inferior to the code produced by a JITter which is tuned for the particular hardware it's running on. Often times, tweaks to a program which would improve performance on one machine would impair performance on another. A JIT can automatically apply tweaks that will enhance performance, while refraining from applying tweaks that would degrade it. – supercat Feb 05 '12 at 01:37
3

Depends on what kind of application you are writing. Try The Computer Language Benchmarks Game

http://shootout.alioth.debian.org/u32q/benchmark.php?test=all&lang=csharp&lang2=java&box=1 http://shootout.alioth.debian.org/u64/benchmark.php?test=all&lang=csharp&lang2=java&box=1

J-16 SDiZ
  • Please pay extra attention to the C# results because they are run on Mono .NET. I am not saying Mono is slower than MS .NET, but there MAY be a difference in speed. – Canton Apr 08 '09 at 06:09
3

To reiterate a comment: you should be using the GPU, not the CPU, if you are doing arithmetic-heavy scientific computing. Matlab with CUDA plugins would be much more awesome than Java or C# if Matlab licensing is not an issue. The nVidia documentation shows how to compile any CUDA function into a mex file. If you need free software, I like pycuda.

If however, GPUs are not an option, I personally like C for a lot of routines because the optimizations the compiler makes are not as complicated as JIT: you don't have to worry about whether a "class" becomes like a "struct" or not. In my experience, problems can usually be broken down such that higher-level things can be written in a very expressive language like Python (rich primitives, dynamic types, incredibly flexible reflection), and transformations can be written in something like C. Additionally, there's neat compiler software, like PLUTO (automatic loop parallelization and OpenMP code generation), and libraries like Hoard, tcmalloc, BLAS (CUBLAS for gpu), etc. if you choose to go the C/C++ route.

gatoatigrado
3

One thing to notice is that IF your application(s) would benefit from lazy evaluation, a functional programming language like Haskell may yield speedups of a totally different magnitude than the theoretically optimal structured/OO code, just by not evaluating unnecessary branches.

Also, if you are talking about the monetary benefit of better performance, don't forget to add the cost of maintaining your software into the equation.

ymihere
  • With F#/Scala you can target CLR/JVM and interoperate with C#/Java easily, allowing functional code to be used where it gives the most benefit. – Richard Apr 08 '09 at 10:29
  • Neither F# nor Scala are lazily evaluated, and at least in my experience, they run a tad slower than C# and Java. – Juliet Apr 08 '09 at 13:20
  • Lazy evaluation is so big a mindstep that it needs to be tried before you can tell how different it is :) – Thorbjørn Ravn Andersen Apr 17 '10 at 15:18
2

Surely the answer is to go and buy the latest PC with the most cores/processors you can afford. If you buy one of the latest 2x4 core PCs you will find not only does it have twice as many cores as a quad core but also they run 25-40% faster than the previous generation of processors/machines.

This will give you approximately a 150% speed up - far more than choosing Java/C# or C. And what's more, you get the same again every 18 months if you keep buying new boxes!

You can sit there for months rewriting your code, or I could go down to my local PC store this afternoon and be running faster than all your efforts the same day.

Improving code quality/efficiency is good but sometimes implementation dollars are better spent elsewhere.

AnthonyLambert
2

Writing in one language or another will only give you small speed ups for a large amount of work. To really speed things up you might want to look at the following:

  1. Buying the latest fastest Hardware.
  2. Moving from 32 bit operating system to 64 bit.
  3. Grid computing.
  4. CUDA / OpenCL.
  5. Using compiler optimisation like vectorization.
AnthonyLambert
1

Depends what you benchmark, and on what hardware. I assume it's speed rather than memory or CPU usage. But...

If you have a dedicated machine for the app alone, with very large amounts of memory, then Java might be 5% faster.

If you go down into the real world, with limited memory and more apps running on the same machine, .NET looks better at utilizing computing resources: see here.

If the hardware is very constrained, C/C++ wins hands down.

dmihailescu
1

I would go with C# (or Java) because your development time will probably be much faster than with C. If you end up needing extra speed then you can always rewrite a section in C and call it as a module.

Nathan
1

My preference would be C or C++ because I'm not separated from the machine language by a JIT compiler.

You want to do intense performance tuning, and that means stepping through the hot spots one instruction at a time to see what it is doing, and then tweaking the source code so as to generate optimal assembler.

If you can't get the compiler to generate what you consider good enough assembler code, then by all means write your own assembler for the hot spot(s). You're describing a situation where the need for performance is paramount.

What I would NOT do if I were in your shoes (or ever) is rely on anecdotal generalizations about one language being faster or slower than another. What I WOULD do is multiple passes of intense performance tuning along the lines of THIS and THIS and THIS. I have done this sort of thing numerous times, and the key is to iterate the cycle of diagnosis-and-repair because every slug fixed makes the remaining ones more evident, until you literally can't squeeze another cycle out of that turnip.

Good luck.

Added: Is it the case that there is some seldom-changing configuration information that determines how the bulk of the data is processed? If so, it may be that the program is spending a lot of its time re-interpreting the configuration info to figure out what to do next. If so, it is usually a big win to write a code generator that will read the configuration info and generate an ad-hoc program that can whizz through the data without constantly having to figure out what to do.
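The code-generator idea can be sketched without emitting source code at all: parse the seldom-changing configuration once into a chain of compiled operations, then stream the data through that chain, instead of re-interpreting the config for every record. A hypothetical Java sketch (the tiny add/mul config language is invented purely for illustration):

```java
import java.util.function.LongUnaryOperator;

public class ConfigCompiler {
    // "Compile" the configuration once into a single composed operator.
    // Each step is parsed exactly one time, up front.
    static LongUnaryOperator compile(String[] steps) {
        LongUnaryOperator op = LongUnaryOperator.identity();
        for (String step : steps) {
            String[] parts = step.split(" "); // e.g. "add 5" or "mul 3"
            long arg = Long.parseLong(parts[1]);
            LongUnaryOperator next;
            if (parts[0].equals("add")) {
                next = v -> v + arg;
            } else if (parts[0].equals("mul")) {
                next = v -> v * arg;
            } else {
                throw new IllegalArgumentException("unknown step: " + step);
            }
            op = op.andThen(next);
        }
        return op;
    }

    public static void main(String[] args) {
        LongUnaryOperator pipeline = compile(new String[]{"add 5", "mul 3"});
        // The hot loop now runs pre-built operations with no parsing.
        System.out.println(pipeline.applyAsLong(1)); // (1 + 5) * 3 = 18
    }
}
```

The same pattern in C would be a table of function pointers built from the config, or, as the answer suggests, literally generating and compiling an ad-hoc program.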

Mike Dunlavey
0

If you are writing highly multithreaded code, I would recommend you take a look at the upcoming Task Parallel Library (TPL) for .NET and the Parallel Patterns Library (PPL) for native C++ applications. That will save you a lot of issues with threading/deadlocking and all the other problems you would otherwise spend a lot of time digging into and solving yourself. Myself, I truly believe that memory management in the managed world will be more efficient and will beat native code in the long term.

Magnus Johansson
0

If much of your code is in C, why not keep it? In principle, and by design, it's obvious that C is faster. The others may close the gap over time, but they always have more levels of indirection and "safety" - just think about bounds checking. C is fast because it's "unsafe". Interfacing to C is supported in every language, so I cannot see why one would not just wrap the C code up, if it's still working, and use it from whatever language you like.

Friedrich
0

I would consider what everyone else uses - not the folks on this site, but the folks who write the same kind of massively parallel, or super high-performance applications.

I find they all write their code in C/C++. So, just for this fact alone (i.e. regardless of any speed issues between the languages), I would go with C/C++. The tools they use and have developed will be of much more use to you if you're writing in the same language.

Aside from that, I've found C# apps to have somewhat less than optimal performance in some areas, multithreading being one. .NET will try to keep you safe from threading problems (probably a good thing in most cases), but this will cause problems in your specific case. To test it, try writing a simple loop that accesses a shared object from lots of threads: run it on a single-core PC and you get better performance than on a multi-core box, because .NET adds its own locks to make sure you don't muck it up. (I used Jon Skeet's singleton benchmark; the static-lock version took 1.5 s on my old laptop and 8.5 s on my super-fast desktop, and the lock version is even worse. Try it yourself.)

The next point is that with C you tend to access memory and data directly - nothing gets in the way, with C#/Java you will use some of the many classes that are provided. These will be good in the general case, but you're after the best, most efficient way to access this (which, for your case is a big deal with multi-terabytes of data, those classes were not designed with those datasets in mind, they were designed for the common cases everyone else uses), so again, you would be safer using C for this - you'll never get the GC getting clogged up by a class that creates new strings internally when you read a couple of terabytes of data if you write it in C!

So while it may appear that C#/Java can give you benefits over a native application, I think you'll find those benefits are only realised in the kind of line-of-business applications that are commonly written.

gbjbaanb
  • 51,617
  • 12
  • 104
  • 148
  • 1
    All the scientific (i.e. floating point & non-trivial to parallelize) supercomputing code that I know of is written in either FORTRAN (still dominant), and/or C. Ref: Intel and Portland Group offer FORTRAN, C, and C++ compilers. AFAIK they are the major commercial compilers still in business. – mctylr Apr 09 '09 at 19:17
  • @gbjbaanb 'I've found C# apps to have somewhat less than optimal performance' you will find ANY language to expose suboptimal performance - if used in a suboptimal way. This generalization does not hold, really! – user492238 Feb 03 '12 at 14:49
0

Note that for heavy computations there is a great advantage in having tight loops which can fit in the CPU's first level cache as it avoids having to go to slower memory repeatedly to get the instructions.

Even for level-two cache it matters: a large program like Quake IV gets a 10% performance increase with 4 MB of level-2 cache versus 1 MB of level-2 cache - http://www.tomshardware.com/reviews/cache-size-matter,1709-5.html

For these tight loops C is most likely the best as you have the most control of the generated machine code, but for everything else you should go for the platform with the best libraries for the particular task you need to do. For instance the netlib libraries are reputed to have very good performance for a very large set of problems, and many ports to other languages are available.

Thorbjørn Ravn Andersen
  • 73,784
  • 33
  • 194
  • 347
  • netlib in its original state is not intended to show good performance. Rather the platform specific (vendor) implementations (MKL, ACMD, ATLAS) - commonly based on netlib, are run fast and are mostly utilized. – user492238 Feb 03 '12 at 14:57
0

If every percentage point will really save you tens of thousands of dollars, then you should bring in a domain expert to help with the project. Well-designed, well-written code with performance considered from the initial stages may be an order of magnitude faster, saving you 90%, or $900,000. I recently found a subtle flaw in some code that sped up a process by over 100 times, and a colleague of mine found an algorithm running in O(n^3) that he rewrote to make it O(n log n). This tends to be where the huge performance savings are.

If the problem is so simple that you are certain no better algorithm could be employed to give you significant savings, then C is most likely your best language.

Stephen Nutt
  • 3,258
  • 1
  • 21
  • 21
0

The most important things are already said here. I would add:

The developer uses a language, which the compiler(s) use to generate machine instructions, which the processor(s) execute using system resources. A program will be "fast" when ALL parts of that chain perform optimally.

So for the "best" language choice:

  • take that language which you are best able to control and
  • which is able to instruct the compiler sufficiently to
  • generate nearly optimal machine code so that
  • the processor on the target machine is able to utilize processing resources optimally.

If you are not a performance expert, you will have a hard time achieving "peak performance" in ANY language. C++ possibly still provides the most options for controlling the machine instructions (especially SSE extensions and so on).

I suggest orienting yourself by the well-known 80:20 rule. It applies fairly well to all of it: the hardware, the languages/platforms, and the developer effort.

Developers have always relied on the hardware to fix performance issues automatically, e.g. through an upgrade to a faster processor. What may have worked in the past will not work in the near future: the developer now has the responsibility to structure her programs for parallelized execution. Languages for virtual machines and virtual runtime environments will show some advantage here. And even without massive parallelization, there is little to no reason why C# or Java shouldn't do just as well as C++.

@Edit: See this comparison of C#, Matlab and FORTRAN, where FORTRAN is not the sole winner!

user492238
  • 4,094
  • 1
  • 20
  • 26
-6

Ref; "My benchmarks so far show Java and C are about on par with each other"

Then your benchmarks are severely flawed...

C will ALWAYS be orders of magnitude faster than both C# and Java, unless you do something seriously wrong...!

PS! Notice that this is not an attempt to bully either C# or Java. I like both Java and C#, and there are other reasons why, for many problems, you would choose Java or C# instead of C. But neither Java nor C#, in a correctly written test, would ever be able to perform at the same speed as C...

Edited because of the sheer number of comments arguing against my rhetoric

Compare these two buggers...

C#

public class MyClass
{
   public int x;

   public static void Main()
   {
      MyClass[] y = new MyClass[1000000];
      for( int idx=0; idx < 1000000; idx++)
      {
          y[idx] = new MyClass();
          y[idx].x = idx;
      }
   }
}

against this one (C)

struct MyClass
{
   int x;
};

int main(void)
{
   /* ~4 MB on the stack: within the usual 8 MB default limit, but close */
   struct MyClass y[1000000];
   for (int idx = 0; idx < 1000000; idx++)
   {
      y[idx].x = idx;
   }
   return 0;
}

The C# version first of all needs to store its array on the heap, while the C version stores the array on the stack. Storing stuff on the stack merely means changing the value of an integer (the stack pointer), while storing stuff on the heap means finding a big enough chunk of memory, which can potentially mean traversing memory for a pretty long time.

Now, C# and Java mostly allocate huge chunks of memory up front which they keep handing out until they run dry, which makes this logic execute faster. But even then, comparing it against changing the value of an integer is like comparing an F16 against an oil tanker, speed-wise...

Second of all, in the C version, since all those objects are already on the stack, we don't need to explicitly create new objects within the loop. Yet again, for C# this is a "look for available memory" operation, while in the C version it is a no-op (a do-nothing operation).

Third of all, the C version automatically deletes all these objects when they go out of scope. Yet again, this is an operation which ONLY CHANGES THE VALUE OF AN INTEGER, which on most CPU architectures takes between 1 and 3 CPU cycles. The C# version doesn't do that; instead, when the Garbage Collector kicks in and needs to collect those items, my guess is that we're talking about MILLIONS of CPU cycles...

Also, the C version is compiled straight to x86 code (on an x86 CPU), while the C# version first becomes IL code, which then has to be JIT-compiled when executed; that alone probably takes orders of magnitude longer than executing the C version.

Now some wise guy could probably execute the above code and measure CPU cycles, but there's basically no point, because mathematically the managed version would probably take several million times the number of CPU cycles of the C version. So my guess is that we're talking about 5-8 orders of magnitude slower in this example. And sure, this is a "rigged test" in that I "looked for something to prove my point"; however, I challenge those who commented badly against me on this post to create a sample which does NOT execute faster in C, and which also doesn't use constructs you would normally never use in C because "better alternatives" exist.

Note that C# and Java are GREAT languages. I prefer them over C ANY TIME OF THE DAY. But NOT because they're FASTER, because they are NOT. They are ALWAYS slower than C and C++, unless you've been coding blindfolded in C or C++...

Edit;

C# of course has the struct keyword, which would seriously change the speed of the above C# version: changing the C# class to a value type by using struct instead of class means C# would store new objects of the given type on the stack, which for the above sample would increase the speed considerably. Still, the above sample also happens to feature an array of these objects.

Even if we went through and optimized the C# version like this, we would still end up with something several orders of magnitude slower than the C version...

A well-written piece of C code will ALWAYS be faster than C#, Java, Python, or whatever managed language you choose...

As I said, I love C#, and most of the work I do today is C#, not C. However, I don't use C# because it's faster than C; I use C# because I don't need the speed gain C gives me for most of my problems.

Both C# and Java are, though, ridiculously slower than C, and C++ for that matter...

Thomas Hansen
  • 5,523
  • 1
  • 23
  • 28
  • Do you have a reference? Once Java/C# code gets JIT'd into native machine-code, I can think of no reason for it to be "orders of magnitude" slower than machine-code compiled from C source. – Blorgbeard Apr 08 '09 at 09:00
  • same; writing code that compiles for the common case could potentially outperform the compile-time c strategy. – gatoatigrado Apr 08 '09 at 09:18
  • simply because it has to cater for all eventualities - it can;t let you write crappy, memory-leaking, thread-unsafe code. C can. So C doesn't have to have the same kind of safety net, and obviously, all that checking and 'make safe' stuff means it won't be quite as fast as C can be. – gbjbaanb Apr 08 '09 at 10:07
  • obviously, for most applications this really doesn't matter, but for this particular application, it sounds like it will. – gbjbaanb Apr 08 '09 at 10:07
  • -1: Poor/naïve C will easily be out performed by good .NET/JVM code. If memory allocations dominate the runtime, even good C code maybe out performed (under GC allocations are extremely fast). – Richard Apr 08 '09 at 10:27
  • -1 for ridiculous over statement."ALWAYS be orders of magnitude"? – ShuggyCoUk Apr 08 '09 at 12:27
  • 3
    -1: Faster? Usually. By "orders of magnitude"? Not in your wildest dreams. – Juliet Apr 08 '09 at 13:28
  • See http://blogs.msdn.com/ricom/archive/2005/05/19/420158.aspx. It took 5 unmanaged versions and a bug fix to be as fast as the first unoptimized C# version. Only after 6 unmanaged optimizations did C++ beat C#, and it wasn't by orders of magnitude (yes, I'm aware C++ isn't C). – Grant Wagner Apr 08 '09 at 17:16
  • And those C# times included the CLR startup time, which would probably be irrelevant in the program described in the original question. – Grant Wagner Apr 08 '09 at 17:17
C# and Java are *MANAGED* languages. They both rely on Garbage Collectors. They both need to box and unbox value types. And they have no concept of storing stuff on the stack. To compare them speedwise against C is like comparing an F16 against an oil tanker. Ridiculous...! Read books...! – Thomas Hansen Apr 09 '09 at 08:02
  • You really aren't helping yourself. Try changing class to struct in the c# example.... second *current* jvm implementations are capable of doing this transparently for you in certain situations, the CLR is heading that way too. Most real world apps are not normally bound by this anyway – ShuggyCoUk Apr 10 '09 at 18:43
  • Incidentally I speak as someone who does sometimes jump through some hoops in c# on my hot path to avoid allocation on the heap, it's not that hard. saying orders of magnitude is just plain wrong, it's hyperbole which you shouldn't be surprised to see shot down by rational sorts that abound here – ShuggyCoUk Apr 10 '09 at 18:46
  • 2
    you really should consider deleting this one and getting your peer pressure badge. "And they have no concept of storing stuff on the stack" I suggest you look at c# structs, stackalloc and escape analysis...perhaps do some of that reading you suggest others do. – ShuggyCoUk Apr 11 '09 at 08:21
  • @ShuggyCoUk - You're right. To use value types here would help, and is probably right. But the above samples would still be orders of magnitudes slower. In the C sample we can easily *COUNT* the CPU cycles. And comparing that against our estimate for the C# version would still be a slaughterhouse... – Thomas Hansen Apr 11 '09 at 09:52
  • 3
If you think you can count CPU cycles by looking at code these days you are sorely mistaken. I suggest further reading on modern pipelined superscalar CPUs, multi-level caching, and the compiler techniques used to work with them. Have you even bothered to benchmark? – ShuggyCoUk Apr 12 '09 at 08:14
  • Also everytime you edit you just show your lack of knowledge, c# lets you stack alloc which gets you a functionally identical program. The only time java/c# will be killed by languages like c is when the initial start up time matters. the OP is clearly not in that situation. – ShuggyCoUk Apr 12 '09 at 08:17
  • 1
    -1: I've compared, and easily found cases that C# runs much faster than C BECAUSE of GC. Because in C you have to allocate / free memory one by one, but GC runs batch, extremely optimized operations. Try to run similar codes (not your sample, although I don't think that you sample can be any different). And I also doubt that you know the meaning of "orders of magnitude". – Iravanchi Feb 08 '10 at 10:16
  • 4
    +1 It's something very brave to hold your point against the horde who simply don't understand C. – Andrei Ciobanu Mar 21 '10 at 11:58