We have a file I/O bottleneck. We have a directory containing a large number of JPEG files, and we want to read them in real time as a movie. Obviously this is not an ideal format, but this is a prototype object-tracking system and changing the format is not possible, because the files are used elsewhere in the code.

From each file we build a frame object, which basically means holding a BufferedImage and an explicit ByteBuffer containing all of the image data.

What is the best strategy for this? The data is on an SSD which in theory has read/write rates of around 400Mb/s, but in practice we are reading no more than 20 files per second (3-4Mb/s) using the naive implementation:

bufferedImg = ImageIO.read(imageFile); // [1]
byte[] data = ((DataBufferByte) bufferedImg.getRaster().getDataBuffer()).getData(); // [2]
imgBuf = ByteBuffer.wrap(data);

However, Java provides lots of possibilities for improving this: (1) Channels, especially FileChannels. (2) Gathering/scattering. (3) Direct buffers. (4) Memory-mapped buffers. (5) Multithreading: use a bunch of Callables to access many files simultaneously. (6) Wrapping the files in a single large file. (7) Other things I haven't thought of yet.
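For concreteness, option (5) might look roughly like the sketch below. It is only an illustration under assumptions: the class name `ParallelFrameLoader`, the method `loadAll`, and the pool size are hypothetical, and it collects every frame in memory rather than streaming them, which a real-time player would probably replace with a bounded look-ahead queue.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: decodes many image files concurrently.
class ParallelFrameLoader {

    /** Decodes every file on a fixed-size pool and returns the images in input order. */
    static List<BufferedImage> loadAll(List<File> files, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one Callable per file; each one reads and decodes its file.
            List<Future<BufferedImage>> pending = new ArrayList<>();
            for (File f : files) {
                Callable<BufferedImage> task = () -> ImageIO.read(f);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order, so frame order is preserved.
            List<BufferedImage> frames = new ArrayList<>();
            for (Future<BufferedImage> p : pending) {
                frames.add(p.get()); // null if no registered reader could decode the file
            }
            return frames;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: ParallelFrameLoader <directory>");
            return;
        }
        File[] jpgs = new File(args[0]).listFiles(
                (dir, name) -> name.toLowerCase().endsWith(".jpg"));
        if (jpgs == null) {
            System.out.println("not a readable directory: " + args[0]);
            return;
        }
        Arrays.sort(jpgs); // File is Comparable; sorts by path name
        int threads = Runtime.getRuntime().availableProcessors();
        long start = System.nanoTime();
        List<BufferedImage> frames = loadAll(Arrays.asList(jpgs), threads);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d frames in %.2f s%n", frames.size(), seconds);
    }
}
```

Whether this helps depends on where the time actually goes: if decoding is CPU-bound, more threads should scale with cores, whereas if the disk is the limit they may not help at all.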

I would just like to know whether anyone has extensively tested the different options and knows which is optimal. I assume that (3) is a must, but I would still like to optimise the reading of a single file as far as possible, and I am unsure of the best strategy.

Bonus question: in the code snippet above, when does the JVM actually 'hit the disk' and read in the contents of the file? Is it at [1], or is that just a file handle which 'points' to the object? It kind of makes sense to evaluate lazily, but I don't know how the implementation of the ImageIO class works.

phil_20686
  • I'll go for the bonus: `ImageIO` reads all data before the `read()` method returns. This is part of the contract for a `BufferedImage`; there's no lazy evaluation. The older `java.awt.Image` uses a consumer/producer pattern, but that's not really "lazy" either... – Harald K Feb 28 '14 at 13:07
  • You can have a look [here](https://github.com/haraldk/TwelveMonkeys/tree/master/sandbox/sandbox-common/src/main/java/com/twelvemonkeys/image) for examples of creating custom `DataBuffer`s backed by a `ByteBuffer` (`MappedFileBuffer` and `MappedImageFactory`). Not sure it will help much though. I believe multithreading will be your best shot. – Harald K Feb 28 '14 at 13:12
  • Be sure to consider the possibility that your bottleneck may be the processing that Java's ImageIO class does to transform the File's bytes into a BufferedImage object. If this is the case, then changing the method you use to get the bytes off the disk may not help you optimize much. See http://stackoverflow.com/questions/11910607/java-imageio-is-insanely-slow and http://stackoverflow.com/questions/2293556/looking-for-a-faster-alternative-to-imageio and http://stackoverflow.com/questions/7726583/java-imageio-write-takes-up-to-6-seconds – Jon Quarfoth Feb 28 '14 at 13:17
  • I have never used one, but I am aware of [RAM drive](http://en.wikipedia.org/wiki/RAM_drive)s. If the machine(s) on which your application is running allow it, it may be worth investigating. – hmjd Feb 28 '14 at 13:33
  • Most people don't realize it, but you can often get a free performance improvement from `ImageIO` by setting [`ImageIO.setUseCache(false)`](http://docs.oracle.com/javase/6/docs/api/javax/imageio/ImageIO.html#setUseCache(boolean)) (defaults to `true`). This disables the *disk cache* used by `ImageIO` and instead uses *in memory* caching. – Harald K Feb 28 '14 at 14:41
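One way to check the point raised in the comments (that decoding, not the disk, may be the bottleneck) is to time the two steps separately. The sketch below is a minimal illustration, not a proper benchmark: the class name `ReadVsDecodeTimer` and its `time` method are hypothetical, and a single measurement is skewed by JIT warm-up and the operating system's file cache. It also calls `ImageIO.setUseCache(false)`, which is relevant here because the image is decoded from an `InputStream`.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper: separates "get bytes off the disk" from "decode the JPEG".
class ReadVsDecodeTimer {

    /** Returns { nanoseconds spent reading raw bytes, nanoseconds spent decoding }. */
    static long[] time(File file) throws IOException {
        long t0 = System.nanoTime();
        byte[] raw = Files.readAllBytes(file.toPath()); // pure file I/O
        long t1 = System.nanoTime();
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(raw)); // pure decode
        long t2 = System.nanoTime();
        if (img == null) {
            throw new IOException("No ImageIO reader could decode " + file);
        }
        return new long[] { t1 - t0, t2 - t1 };
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: ReadVsDecodeTimer <image file>...");
            return;
        }
        // Use in-memory caching instead of ImageIO's temporary disk cache.
        ImageIO.setUseCache(false);
        long readNanos = 0;
        long decodeNanos = 0;
        for (String name : args) {
            long[] t = time(new File(name));
            readNanos += t[0];
            decodeNanos += t[1];
        }
        System.out.printf("read: %.1f ms, decode: %.1f ms%n",
                readNanos / 1e6, decodeNanos / 1e6);
    }
}
```

If the decode time dominates, options such as channels or memory-mapped buffers are unlikely to change much, and spreading the decoding over several threads is the more promising direction.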

1 Answer

ImageIO.read(imageFile)

Since it returns a BufferedImage, I assume this call hits the disk and reads the file's contents, rather than just returning a file handle.
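A small experiment is consistent with this. The sketch below (the class `EagerReadDemo` and its method are hypothetical names) writes a tiny PNG, reads it with `ImageIO.read(File)`, empties the file on disk, and only then inspects a pixel. PNG is used instead of JPEG purely so that the pixel value survives the round trip exactly.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical demo: shows that pixel data is already in memory when read() returns.
class EagerReadDemo {

    /** Reads an image, truncates the file, then returns the RGB value of pixel (1, 1). */
    static int pixelAfterOverwrite() throws IOException {
        File tmp = File.createTempFile("eager", ".png");
        try {
            BufferedImage src = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
            src.setRGB(1, 1, 0x123456);
            ImageIO.write(src, "png", tmp);

            BufferedImage loaded = ImageIO.read(tmp);  // the step marked [1] in the question
            Files.write(tmp.toPath(), new byte[0]);    // the file on disk is now empty

            // Mask off the alpha byte that getRGB() adds.
            return loaded.getRGB(1, 1) & 0xFFFFFF;
        } finally {
            Files.deleteIfExists(tmp.toPath());
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.printf("pixel after overwrite: 0x%06X%n", pixelAfterOverwrite());
    }
}
```

The pixel is still readable after the file has been emptied, so the data must have been loaded during the `read()` call rather than lazily on first access.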

mahesh