0

Hi: We are using Java for a multi-threaded application, and we found a bottleneck in Java I/O. Does functional programming, Scala for example, offer better I/O throughput? We will have many-core CPUs, so business logic can be handled very fast, but I/O would be a bottleneck. Are there any good solutions?

om-nom-nom
user84592

5 Answers

11

Since Scala runs on the Java Virtual Machine and (under the hood) uses the Java API for I/O, switching to Scala is unlikely to offer better performance than well-written Java code.

As for solutions, your description of the problem is far too sketchy to recommend particular solutions.

meriton
  • As has been pointed out, the OP's question is way too vague, and thus this sweeping generalisation may actually be quite wrong. A change in the programming model (for instance to an async NIO model, as others have suggested) may be a win, or it may not. – Jed Wesley-Smith Oct 16 '11 at 01:48
  • That's why I qualified it with *well-written* Java code (which includes correctly using an API appropriate for the task), and said that Scala was *unlikely* to offer better performance. – meriton Oct 16 '11 at 10:02
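The async NIO model mentioned in the comments can be sketched with java.nio.channels.AsynchronousFileChannel (Java 7+). This is a minimal sketch, not a benchmark: the names AsyncReadSketch and readAllAsync are hypothetical, and it keeps only one read in flight at a time. Whether this beats plain blocking reads depends on the workload, which is exactly the point the comment makes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class AsyncReadSketch {

    // Reads the whole file through AsynchronousFileChannel and returns the
    // number of bytes read. Each read is started, and the calling thread only
    // blocks (Future.get) when it actually needs the result.
    static long readAllAsync(Path file, int bufferSize)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(bufferSize);
            long position = 0;
            while (true) {
                buf.clear();
                Future<Integer> pending = ch.read(buf, position);
                // ... the calling thread is free to do other work here ...
                int n = pending.get();
                if (n < 0) break; // -1: position is at or past end of file
                position += n;
            }
            return position;
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("async-demo", ".bin");
        Files.write(tmp, new byte[100_000]); // sample data: 100,000 zero bytes
        System.out.println(readAllAsync(tmp, 8192)); // 100000
        Files.delete(tmp);
    }
}
```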
8

Are you using, or have you tried, Java NIO (non-blocking I/O)? Developers report up to a 300% performance increase.

Java NIO FileChannel versus FileOutputstream performance / usefulness (please refer to this as well)
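A minimal sketch of the FileChannel side of that comparison, assuming the task is copying a file: FileChannel.transferTo lets the operating system move the bytes (directly from the filesystem cache where the platform supports it) instead of looping through a byte[] in Java code. Note that FileChannel itself is always blocking; non-blocking mode (configureBlocking(false) with a Selector) applies only to SelectableChannel types such as SocketChannel. The names ChannelCopySketch and copy are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelCopySketch {

    // Copies src to dst via FileChannel.transferTo and returns the byte count.
    // The OS may transfer the data without copying it through user space.
    static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so keep looping.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".bin");
        Path dst = Files.createTempFile("dst", ".bin");
        Files.write(src, new byte[1 << 20]); // 1 MiB of zeros as sample data
        System.out.println(copy(src, dst));  // 1048576
        Files.delete(src);
        Files.delete(dst);
    }
}
```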

Community
java_mouse
7

Usually when people complain that Java IO is slow, it is what they are doing with the IO that is slow, not the IO itself. E.g. BufferedReader reading lines of text (which is relatively slow) can read 90 MB/s with a decent CPU/HDD. You can make it much faster with memory-mapped files, but unless your disk drive can keep up, it won't make much real difference.
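To illustrate that the work done on the bytes often costs more than the I/O itself, here is a sketch (LineCountSketch and its methods are hypothetical names) that counts lines two ways: with BufferedReader, which decodes every byte to a char and allocates a String per line, and with a memory-mapped file, which only scans raw bytes for '\n'. It assumes an ASCII-compatible encoding such as UTF-8, '\n' or "\r\n" line endings, and a file under 2 GiB (a single FileChannel.map call cannot exceed Integer.MAX_VALUE bytes). No timings are claimed; with a cold cache, both versions are bounded by the disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LineCountSketch {

    // Baseline: decodes bytes to chars and builds a String for every line.
    static long countWithReader(Path file) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            long lines = 0;
            while (r.readLine() != null) lines++;
            return lines;
        }
    }

    // Memory-mapped: scans raw bytes for '\n' with no decoding or allocation.
    static long countWithMapping(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size == 0) return 0;
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            long lines = 0;
            byte last = 0;
            while (map.hasRemaining()) {
                last = map.get();
                if (last == '\n') lines++;
            }
            if (last != '\n') lines++; // final line without a trailing newline
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        tmp.toFile().deleteOnExit(); // a live mapping can block deletion on some OSes
        Files.write(tmp, "alpha\nbeta\ngamma".getBytes(StandardCharsets.UTF_8));
        System.out.println(countWithReader(tmp));  // 3
        System.out.println(countWithMapping(tmp)); // 3
    }
}
```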

There are things you can do to improve IO performance, but you quickly find that the way to get faster IO is to improve the hardware.

If you are using a hard drive that can sustain 100 MB/s read speed and 120 IOPS, you are going to be limited by these factors, and replacing it with an SSD that does 500 MB/s and 80,000 IOPS is going to be faster.
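To see why IOPS matters as much as MB/s, assume (hypothetically) that each random read fetches one 4 KiB block. The effective random-read throughput is then IOPS times block size:

$$
\begin{aligned}
\text{HDD: } & 120\ \text{IOPS} \times 4\ \text{KiB} = 480\ \text{KiB/s} \approx 0.47\ \text{MiB/s} \\
\text{SSD: } & 80{,}000\ \text{IOPS} \times 4\ \text{KiB} = 320{,}000\ \text{KiB/s} = 312.5\ \text{MiB/s}
\end{aligned}
$$

So for random access the hard drive delivers well under 1 MiB/s, far below its 100 MB/s sequential rating, while the SSD stays in the hundreds of MiB/s.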

Similarly, if you are using a 100 Mb/s network, you might only get 12 MB/s, on a 1 Gb/s network you might get 110 MB/s and on a 10 Gig-E network you might be lucky to get 1 GB/s.
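The network ceilings follow from dividing the line rate in bits by 8 bits per byte; framing and protocol headers (Ethernet, IP, TCP) plus other real-world losses then push the achievable rate somewhat below these figures, which is consistent with the observed numbers above:

$$
\begin{aligned}
100\ \text{Mb/s} \div 8 &= 12.5\ \text{MB/s} \\
1\ \text{Gb/s} \div 8 &= 125\ \text{MB/s} \\
10\ \text{Gb/s} \div 8 &= 1.25\ \text{GB/s}
\end{aligned}
$$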

Peter Lawrey
5

If you are performing many tiny I/O operations, then coalescing them into one large I/O operation could greatly speed up your code. Functional programming techniques tend to make data collection and conversion operations easier to write (e.g. you can store items pending output in a list, and use map to apply an item-to-text or item-to-binary converter to them). Otherwise, no, functional programming techniques don't overcome inherently slow channels. If raw I/O speed is the limit, in Java or elsewhere, and you have enough hardware threads available, you should have one top-priority thread for each independent I/O channel and have it perform only I/O (no data conversion, nothing). That will maximize your I/O rate, and then you can use the other threads for conversions, business logic and so on.
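A minimal sketch of both ideas in Java, with hypothetical names (DedicatedWriterSketch, encodeBatch, startWriter, finish): items are converted to text with map and joined into one byte[] per batch (coalescing), and a single writer thread does nothing but take batches from a BlockingQueue and write them. A ByteArrayOutputStream stands in for the real file or socket, and Thread.MAX_PRIORITY is only a hint that the OS scheduler may ignore.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.stream.Collectors;

public class DedicatedWriterSketch {

    // Sentinel that tells the writer thread to stop (compared by identity).
    private static final byte[] POISON = new byte[0];

    // Coalescing: converts many small items into ONE byte[] so the channel
    // sees a single large write instead of one write per item.
    static byte[] encodeBatch(List<Integer> items) {
        String text = items.stream()
                .map(i -> "item " + i + "\n") // item-to-text converter
                .collect(Collectors.joining());
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Starts a thread that ONLY writes: no formatting, no business logic.
    static Thread startWriter(BlockingQueue<byte[]> queue, OutputStream out) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    byte[] chunk = queue.take();
                    if (chunk == POISON) break;
                    out.write(chunk);
                }
                out.flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and exit
            }
        }, "io-writer");
        t.setPriority(Thread.MAX_PRIORITY);
        t.start();
        return t;
    }

    // Signals the writer to drain and stop, then waits for it.
    static void finish(BlockingQueue<byte[]> queue, Thread writer) throws InterruptedException {
        queue.put(POISON);
        writer.join();
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(16);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Thread writer = startWriter(queue, sink);
        // Worker threads would normally do this encoding in parallel.
        queue.put(encodeBatch(Arrays.asList(1, 2, 3)));
        queue.put(encodeBatch(Arrays.asList(4, 5)));
        finish(queue, writer);
        System.out.print(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In a real application several worker threads would call encodeBatch concurrently and put the results on the queue; the bounded queue (capacity 16 here, an arbitrary choice) makes producers wait if the channel cannot keep up.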

Rex Kerr
2

One question is whether you have unlimited time to develop your application or not. If you have unlimited time, then the Java program and Scala programs will have the same performance since you can write Scala programs that will produce exactly the same bytecode as Java.

But, if you have unlimited time, why not develop in C (or assembler)? You'd get better performance.

Another is how sophisticated your IO code is. If it is something quite trivial, then Scala will probably not provide much benefit, as there is not enough "meat" to utilize its features.

I think if you have limited time and a complex IO codebase, then a Scala-based solution may be faster. The reason is that Scala opens the door to many idioms that are just too laborious to write in Java, so people avoid them and pay the price later.

For example, executing a calculation over a collection of data in parallel is done in Java with a ForkJoinPool: you have to create the pool, then create a class wrapping the calculation, split the work for each item and submit it to the pool.

In Scala: collection.par.map(calculation). Writing this is much faster than writing the Java version, so you just do it and have spare time to tackle other issues.
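For comparison, a sketch of the Java ForkJoinPool approach described above (the names ForkJoinMapSketch, MapTask, parMap and the THRESHOLD of 1,000 are hypothetical): a RecursiveTask splits the list in halves until the slices are small, forks one half, computes the other, and concatenates the results in the original order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.Function;

public class ForkJoinMapSketch {

    // Maps fn over data[from, to), splitting until a slice is small enough.
    static final class MapTask<A, B> extends RecursiveTask<List<B>> {
        private static final int THRESHOLD = 1_000; // hypothetical cut-off
        private final List<A> data;
        private final int from, to;
        private final Function<A, B> fn;

        MapTask(List<A> data, int from, int to, Function<A, B> fn) {
            this.data = data;
            this.from = from;
            this.to = to;
            this.fn = fn;
        }

        @Override
        protected List<B> compute() {
            if (to - from <= THRESHOLD) {
                List<B> out = new ArrayList<>(to - from);
                for (int i = from; i < to; i++) out.add(fn.apply(data.get(i)));
                return out;
            }
            int mid = (from + to) >>> 1;
            MapTask<A, B> left = new MapTask<>(data, from, mid, fn);
            MapTask<A, B> right = new MapTask<>(data, mid, to, fn);
            left.fork();                              // left half runs asynchronously
            List<B> rightResult = right.compute();    // right half on this thread
            List<B> result = new ArrayList<>(left.join());
            result.addAll(rightResult);               // keep the original order
            return result;
        }
    }

    static <A, B> List<B> parMap(ForkJoinPool pool, List<A> data, Function<A, B> fn) {
        return pool.invoke(new MapTask<>(data, 0, data.size(), fn));
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(); // the pool you have to create
        List<Integer> xs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) xs.add(i);
        List<Integer> squares = parMap(pool, xs, x -> x * x);
        System.out.println(squares.get(9_999)); // 99980001
        pool.shutdown();
    }
}
```

Since Java 8, list.parallelStream().map(calculation).collect(Collectors.toList()) offers a similarly short form backed by the common ForkJoinPool, which narrows this particular gap.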

From personal experience, I have a related story. I read in a blog article that BuildR, a Ruby-based build tool, was two times faster than Maven for a simple build. Considering that Ruby is about 20 times slower than Java, I was surprised, so I profiled Maven. It turned out it parsed the same XML file approximately 1000 times. Of course, with careful design they could have reduced that to just once. But I guess the reason they did not is that the straightforward approach in Java led to a design too complex to change afterwards. With BuildR, the design was simpler and the performance better. In Scala, you get the feeling of programming in a dynamic language while still being on par with Java in terms of performance.

UPDATE: Thinking about it more, there are some areas in which Scala will give greater performance than Java (again, assuming the IO bottleneck is caused by the code that wraps the IO operations, not by the reading/writing of bytes):

* Lazy arguments and values: these can defer spending CPU cycles until the values are actually required.
* Specialization: lets you tell the compiler to create copies of generic data structures for the primitive types, thus avoiding boxing, unboxing and casting.
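Java has no equivalent of Scala's lazy val, so the closest hand-written version is a memoizing Supplier like the sketch below (Lazy and LazySketch are hypothetical names, not a JDK API). It defers the work until the first get() and caches the result; as with a Scala lazy val, if the initializer throws, the next access retries it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazySketch {

    // Roughly what Scala's `lazy val x = expensive()` provides for free:
    // the initializer runs at most once (on first access) and is then cached.
    static final class Lazy<T> implements Supplier<T> {
        private Supplier<? extends T> init;
        private T value;
        private boolean done;

        Lazy(Supplier<? extends T> init) {
            this.init = init;
        }

        @Override
        public synchronized T get() {
            if (!done) {
                value = init.get(); // if this throws, done stays false and we retry later
                done = true;
                init = null;        // allow the initializer to be garbage-collected
            }
            return value;
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Lazy<String> header = new Lazy<>(() -> {
            calls.incrementAndGet();
            return "expensive header"; // stands in for costly formatting work
        });
        System.out.println(calls.get()); // 0
        header.get();
        header.get();
        System.out.println(calls.get()); // 1
    }
}
```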

IttayD