21

I'm trying to write a pretty heavy-duty math-based project that will parse through 100MB+ of data several times a day, so I need a fast language that's pretty easy to use. I would have gone with C, but getting a large project done in C is very difficult, especially with the low-level programming getting in your way. So I was thinking about Python or Java. Both are well equipped with OO features, so I don't mind that. Now, here are my pros for choosing Python:

  • Very easy to use language
  • Has a pretty large library of useful stuff
  • Has an easy to use plotting library

Here are the cons:

  • Not exactly blazing
  • There isn't a native python neural network library that is active
  • I can't close source my code without going through quite a bit of trouble
  • Deploying Python code on clients' computers is hard to deal with, especially when clients are idiots.

Here are the pros for choosing Java:

  • Huge library
  • Well supported
  • Easy to deploy
  • Pretty fast, possibly even comparable to C++
  • The Encog Neural Network Library is really active and pretty awesome
  • Networking support is really good
  • Strong typing

Here are the cons for Java:

  • I can't find a good graphing library like matplotlib for python
  • No built in support for big integers, that means another dependency (I mean REALLY big integers, not just math.BigInteger size)
  • File IO is kind of awkward compared to Python
  • Not many of the array-manipulation or "make programming easy" features that Python has.

So, I was hoping you guys could tell me what to use. I'm equally familiar with both languages. Suggestions for other languages are welcome too.

EDIT: WOW! You guys are fast! 10 responses in 30 minutes!

Dhaivat Pandya
  • 3
    What do you mean by "REALLY big integers"? java.math.BigInteger will grow to whatever size necessary to store the numbers you're dealing with (at the price of a somewhat awkward syntax though since Java doesn't have operator overloading) – Luke Hutteman Jan 21 '11 at 13:55
  • 2
    Have you considered using a "real" [computer algebra system](http://en.wikipedia.org/wiki/Comparison_of_computer_algebra_systems)? It doesn't have to be Mathematica (expensive!); there are a lot of cheaper, or even free, choices. – Bart Kiers Jan 21 '11 at 13:55
  • Just wondering about "I mean REALLY big integers, not just math.BigInteger size" comment. Why do you think java.math.BigInteger numbers won't do it? It seems to me that before you start hitting its limits, you will have problems with memory already. – Peter Štibraný Jan 21 '11 at 13:56
  • Creating an executable + required libraries is a piece of cake with e.g. cx_Freeze. And the result can't be deciphered unless someone skilled is really out to do it (in which case you're screwed anyway, pretty much regardless of the language) - not to mention the usual objections to "I wanna hide my sourcez". –  Jan 21 '11 at 13:58
  • You should list your priorities: is closing the source code a must? How much time do you have? By the way: Python _IS_ strongly typed, just not statically typed. – Iacopo Jan 21 '11 at 13:58
  • py2exe can make deployment on clients' machines very easy. – Mike Axiak Jan 21 '11 at 14:00
  • Have you seen JFreeChart for graph plotting in Java? (http://www.jfree.org/jfreechart/) It's pretty good. – mikera Jan 21 '11 at 14:01
  • It's fairly trivial to decompile java byte code, so they are pretty even on that front, IMHO. – Matthew Schinckel Jan 21 '11 at 14:03
  • JFreeChart seems great. The last experience I had with BigIntegers was a long time ago, and I remember them not actually being arbitrary size. I might be wrong; I was a newbie. Closing the source code is almost a must. I have as much time as I need, but I would like to get done as quickly as possible: the longer I take, the more loss I have to take on. – Dhaivat Pandya Jan 21 '11 at 14:13
  • I've decided to go with the best of both worlds: Python for collecting and parsing the data, then handing it to the Java classes via Jython. Thanks a lot everyone. – Dhaivat Pandya Jan 21 '11 at 16:10
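
On the big-integer question raised in the comments: as noted there, java.math.BigInteger grows to whatever size is needed, and for comparison Python's built-in int is arbitrary precision as well, so neither language needs an extra dependency for this. A minimal sketch of the Python side (the helper name `factorial_bits` is made up for illustration):

```python
import math


def factorial_bits(n: int) -> int:
    """Return the number of bits needed to store n! (hypothetical helper)."""
    # math.factorial returns an ordinary int; it simply grows as large as needed.
    return math.factorial(n).bit_length()


# Far beyond any 64-bit integer, yet still a plain int with normal operators.
big = 2 ** 4096 + 1
print(big.bit_length())  # 4097

# 1000! has thousands of bits; memory is the only practical limit.
print(factorial_bits(1000))
```

The main practical difference is syntax: Python keeps the usual `+` and `*` operators, while Java's BigInteger goes through method calls such as `add` and `multiply`, as the first comment points out.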

8 Answers

19

Java will usually be quicker to run (don't take this as an absolute truth), but slower to write.

Python is the opposite. Since libraries such as SciPy and NumPy already exist, and are built upon fast C code, I'd suggest going with Python if you prefer the "speedier" way in terms of code writing, unless fundamental building blocks for your application are missing from SciPy and NumPy but exist for Java.

Glorfindel
darioo
11

Why not get the best of both worlds by taking advantage of multiple languages on the JVM:

  • Write the performance intensive parts in Java (or use existing great Java libraries)
  • Use Jython to write the user interface / application in Python and call the Java code when needed
mikera
3

NumPy usually puts quite some kick into the computational force of Python. It's the de facto standard for any real number crunching in Python. I don't have any real experience with Java in this field, so I'm not really qualified to answer that part of the question for you.

Exelian
2

"Deploying Python code on clients' computers is hard to deal with, especially when clients are idiots": I think this is a problem with Java too.

"I can't find a good graphing library like matplotlib for python": Have you tried JFreeChart? http://www.jfree.org/jfreechart/

"Suggestions for other languages are welcome too": I would suggest Groovy. It looks a bit like Python and is a JVM language that integrates well with Java.

You did not ask this directly, but I would also recommend the Apache Commons Math library for math computations in Java.

Navi
1

If those are the choices, then Java should be the faster for math intensive work. It is compiled (although yes it is still running byte code).

Exelian mentions NumPy. There's also the SciPy package. Both are worth looking at but only really seem to give speed improvements for work with lots of arrays and vector processing. When I tried using these with NLTK for a math-intensive routine, I found there wasn't that much of a speedup.

For math intensive work these days, I'd be using C/C++ or C# (personally I prefer C# over Java although that shouldn't affect your decision). My first employer out of univ. paid me to use Fortran for stuff that is almost certainly more math intensive than anything you're thinking of. Don't laugh - the Fortran compilers are some of the best for math processing on heavy iron.

winwaed
  • 1
    Why would anyone laugh at Fortran? It's still the lingua franca for serious scientific computing. The NIST linear algebra libraries can't be beat. – duffymo Jan 22 '11 at 02:05
  • 1
    Quite. The combination of compilers and libraries is probably the best you can get for heavy iron; however, most developers (even older ones) don't seem to be aware of that, and quickly dismiss it out of hand as something from the ancient past and irrelevant in the modern world. – winwaed Jan 22 '11 at 02:56
0

What is more important for you?

If it's rapid application development, I found Python significantly easier to code in than Java, and I was just learning Python, while I had been coding in Java for years.

If it's application speed and the ability to reuse existing code, then you should probably stick with Java. It's reasonably fast and many research efforts at the moment use Java as their language of choice.

thkala
0

It seems Java can be really fast: http://blog.dhananjaynene.com/2008/07/performance-comparison-c-java-python-ruby-jython-jruby-groovy/
On the other hand Python is very good for doing math, and there's quite a lot of room for performance improvement if you use it correctly (I mean, with the right idioms/modules/builtin functions).
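
As a rough illustration of what "the right idioms/builtin functions" can mean, here is a small sketch that computes the same mean two ways. The function names are made up for this example, and the speed remark assumes the standard CPython interpreter, where built-ins such as `math.fsum` are implemented in C:

```python
import math


def mean_loop(values):
    """Manual index-based loop: every addition is executed by the interpreter."""
    total = 0.0
    for i in range(len(values)):
        total += values[i]
    return total / len(values)


def mean_builtin(values):
    """Same computation using a built-in; the summation loop runs inside math.fsum."""
    # fsum also tracks rounding error, so the last digits may differ from the loop.
    return math.fsum(values) / len(values)


data = [0.1] * 10
print(mean_loop(data), mean_builtin(data))
```

To compare the two on your own data, something like `timeit.timeit(lambda: mean_builtin(data), number=1000)` from the standard library gives a quick measurement.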

Edit: Suggestions for other languages: Haskell. It's very high level; written in a "low level style" it can be very fast (it can compare quite fairly with C), and it's even better if you can make some use of its multithreading capabilities. However, experience says it's never a good idea to learn new tools while you need them in a project.

MattiaG
0

Apache Commons Math picked up where JAMA left off. It is quite capable for scientific computing.

So is Python - NumPy and SciPy are excellent. I also like the fact that Python is a hybrid of object-orientation and functional programming. Functional programming is awfully handy for numerical methods.
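
To show why the functional side is handy for numerical methods, here is a minimal sketch of a trapezoid-rule integrator that takes the integrand as an ordinary argument. The name `trapezoid` and the default of 1000 steps are choices made for this example, not something from a particular library:

```python
import math
from typing import Callable


def trapezoid(f: Callable[[float], float], a: float, b: float, n: int = 1000) -> float:
    """Approximate the integral of f over [a, b] using n equal-width trapezoids."""
    h = (b - a) / n
    # Interior points count fully; the two endpoints count half each.
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)


# Functions are plain values, so any callable can be passed in directly.
area_sin = trapezoid(math.sin, 0.0, math.pi)      # exact value is 2
area_sq = trapezoid(lambda x: x * x, 0.0, 3.0)    # exact value is 9
print(area_sin, area_sq)
```

The same routine works unchanged for a named function, a lambda, or a bound method, which is the kind of flexibility that makes this style convenient for numerical code.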

I'd recommend using the one that you know best, but if the choice is a toss up I might lean towards Python.

duffymo