5

I use Java 1.5 on an embedded Linux device and want to read a binary file containing 2 MB of int values (currently 4 bytes each, big-endian, but I can decide the format).

Using a DataInputStream on top of a BufferedInputStream and calling dis.readInt(), these 500,000 calls need 17 s to read the file, while reading the file into one big byte buffer takes only 5 s.

How can I read that file faster into one huge int[]?

The reading process should not use more than an additional 512 KB.

This code below using nio is not faster than the readInt() approach from java.io.

    // assume I already know that there are 500,000 ints to read:
    int numInts = 500000;
    // here is where I want the result to go
    int[] result = new int[numInts];
    int cnt = 0;

    RandomAccessFile aFile = new RandomAccessFile("filename", "r");
    FileChannel inChannel = aFile.getChannel();

    ByteBuffer buf = ByteBuffer.allocate(512 * 1024);

    int bytesRead = inChannel.read(buf); //read into buffer.

    while (bytesRead != -1) {

      buf.flip();  // make buffer ready for get()

      while (buf.hasRemaining() && cnt < numInts) {
        // probably slow here since getInt() is called 500,000 times
        result[cnt] = buf.getInt();
        cnt++;
      }

      buf.clear(); // make buffer ready for writing
      bytesRead = inChannel.read(buf);
    }


    inChannel.close();
    aFile.close();
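For reference, since reading the raw bytes alone takes only 5 s: a possible direction (a sketch under the question's assumptions, not code from the original post) is to read the file in 512 KB chunks into a plain byte[] and decode the big-endian ints by hand, so that no stream method is called per int:

    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class ManualDecode {
        /** Reads numInts big-endian ints in 512 KB chunks, decoding them by hand. */
        public static int[] readInts(String filename, int numInts) throws IOException {
            int[] result = new int[numInts];
            byte[] chunk = new byte[512 * 1024]; // stays within the 512 KB budget
            DataInputStream in = new DataInputStream(new FileInputStream(filename));
            try {
                int cnt = 0;
                while (cnt < numInts) {
                    // read a full chunk, or whatever remains of the file
                    int toRead = Math.min(chunk.length, (numInts - cnt) * 4);
                    in.readFully(chunk, 0, toRead);
                    // decode big-endian ints straight out of the byte array,
                    // avoiding one method call per int
                    for (int i = 0; i < toRead; i += 4) {
                        result[cnt++] = ((chunk[i] & 0xFF) << 24)
                                      | ((chunk[i + 1] & 0xFF) << 16)
                                      | ((chunk[i + 2] & 0xFF) << 8)
                                      |  (chunk[i + 3] & 0xFF);
                    }
                }
            } finally {
                in.close();
            }
            return result;
        }
    }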

Update: Evaluation of the answers:

On the PC, the memory map with IntBuffer approach was the fastest in my setup.
On the embedded device, without JIT, the java.io DataInputStream.readInt() was a bit faster (17 s vs. 20 s for the memory map with IntBuffer).

Final conclusion: a significant speed-up is easier to achieve via an algorithmic change (smaller file for initialization).

AlexWien

3 Answers

4

I don't know if this will be any faster than what Alexander provided, but you could try mapping the file.

    try (FileInputStream stream = new FileInputStream(filename)) {
        FileChannel inChannel = stream.getChannel();

        // map the whole file into memory; no explicit read loop needed
        ByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
        int[] result = new int[500000];

        buffer.order( ByteOrder.BIG_ENDIAN );
        // view the mapped bytes as ints and copy them out in one bulk get
        IntBuffer intBuffer = buffer.asIntBuffer( );
        intBuffer.get(result);
    }
Michael Krussel
  • On the PC it was the fastest solution, but on the embedded device without JIT it took 20 s, so java.io is still the fastest there. Interesting... – AlexWien Apr 16 '13 at 13:05
3

You can use IntBuffer from the java.nio package: http://docs.oracle.com/javase/6/docs/api/java/nio/IntBuffer.html

    int[] intArray = new int[ 500000 ];

    IntBuffer intBuffer = IntBuffer.wrap( intArray );

...

Fill in the buffer by making calls to inChannel.read(intBuffer).

Once the buffer is full, your intArray will contain 500,000 integers.

EDIT

After realizing that Channels only support ByteBuffer, here is a version that reads into a ByteBuffer and views it as an IntBuffer:

    // assume I already know that there are 500,000 ints to read:
    int numInts = 500000;
    // here is where I want the result to go
    int[] result = new int[numInts];

    // 4 bytes per int, direct buffer
    ByteBuffer buf = ByteBuffer.allocateDirect( numInts * 4 );

    // BIG_ENDIAN byte order
    buf.order( ByteOrder.BIG_ENDIAN );

    // Fill in the buffer
    while ( buf.hasRemaining( ) )
    {
        // Per EJP's suggestion check EOF condition
        if( inChannel.read( buf ) == -1 )
        {
            // Hit EOF
            throw new EOFException( );
        }
    }

    buf.flip( );

    // Create IntBuffer view
    IntBuffer intBuffer = buf.asIntBuffer( );

    // result will now contain all ints read from file
    intBuffer.get( result );
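
For completeness: the snippet assumes an already open inChannel. One possible way to obtain and release it (a sketch, reusing the file name from the question):

    FileInputStream stream = new FileInputStream("filename");
    FileChannel inChannel = stream.getChannel();
    try {
        // ... fill buf and copy it into result, as shown above ...
    } finally {
        stream.close(); // closing the stream also closes its channel
    }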
Alexander Pogrebnyak
  • I already tried that, but I am stuck at `int bytesRead = inChannel.read(intBuffer);`. This does not compile; I cannot pass an IntBuffer to inChannel.read(), it expects a ByteBuffer. – AlexWien Apr 15 '13 at 18:30
  • The read loop isn't adequate. If it encounters premature EOF it will run forever. You should loop while `read()` returns a positive number. That tests both EOF and `hasRemaining()`. – user207421 Aug 25 '15 at 00:13
2

I ran a fairly careful experiment using serialize/deserialize, DataInputStream vs ObjectInputStream, both based on ByteArrayInputStream to avoid I/O effects. For a million ints, readObject took about 20 msec, readInt about 116 msec. The serialization overhead on a million-int array was 27 bytes. This was on a 2013-ish MacBook Pro.

Having said that, object serialization is sort of evil, and you have to have written the data out with a Java program.
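
For illustration, a minimal sketch of the writeObject/readObject round trip measured above (the class and method names here are assumptions, not the benchmark code):

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;

    public class IntArraySerialization {
        // write the whole array with a single writeObject() call;
        // the serialized form is the raw ints plus a small header
        static void write(String filename, int[] data) throws IOException {
            ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(filename));
            try {
                out.writeObject(data);
            } finally {
                out.close();
            }
        }

        // read it back with a single readObject() call
        static int[] read(String filename) throws IOException, ClassNotFoundException {
            ObjectInputStream in = new ObjectInputStream(new FileInputStream(filename));
            try {
                return (int[]) in.readObject();
            } finally {
                in.close();
            }
        }
    }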

Tim Bray
  • This is interesting; I had not considered the possibility of using writeObject. writeObject internally fills a byte[] using Bits.putInt() before writing out. This could be faster than simply calling writeInt() a million times. (java.nio is faster than java.io on the PC because it uses DMA access to the disk, which is not available on that embedded device) – AlexWien Jan 07 '15 at 16:12