
To try out `MappedByteBuffer` (memory-mapped files in Java), I wrote a simple `wc -l` (text file line count) demo:

int lineCount(String fileName) throws IOException {
    FileChannel fc = new RandomAccessFile(new File(fileName), "r").getChannel();
    MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());

    int nlines = 0;
    byte newline = '\n';

    for(long i = 0; i < fc.size(); i++) {
        if(mem.get() == newline)
            nlines += 1;
    }

    return nlines;
}

I tried this on a file of about 15 MB (15008641 bytes), and 100k lines. On my laptop, it takes about 13.8 sec. Why is it so slow?

Complete class code is here: http://pastebin.com/t8PLRGMa

For the reference, I wrote the same idea in C: http://pastebin.com/hXnDvZm6

It runs in about 28 ms, or 490 times faster.

Out of curiosity, I also wrote a Scala version using essentially the same algorithm and APIs as in Java. It runs 10 times faster than the Java version, which suggests something odd is going on.

Update: The file is cached by the OS, so there is no disk loading time involved.

I wanted to use memory mapping for random access to bigger files which may not fit into RAM. That is why I am not just using a BufferedReader.

cidermole
  • Java version: OpenJDK 1.8.0 Platform: Linux 4.1.16 – cidermole Apr 02 '16 at 12:31
  • `MappedByteBuffer` is the wrong thing to use, your program does not need anything but a plain `BufferedReader`. You are not using any of the advanced features of the `MappedByteBuffer` so why use it? –  Apr 02 '16 at 12:47
  • 1
    I was typing an answer, but the question was closed. Your code is slow because it reads byte by byte, and this is very slow. Read buffer by buffer, and the performance will increase dramatically. Using https://gist.github.com/jnizet/21341d48f631b7f10bc657e560c0f2de, for example, the time spent is 50493 µs, vs. 8646279 µs for your original version. But I agree a BufferedInputStream would be simpler anyway. – JB Nizet Apr 02 '16 at 12:55
  • @JarrodRoberson Thanks for the pointer! The file is cached by the OS, I will update the question. I wanted to use memory mapping for random access to bigger files which may not fit into RAM. – cidermole Apr 02 '16 at 12:55
  • @JarrodRoberson Do you think this is reasonable to reopen, since I don't believe the question you marked provides the answer? – cidermole Apr 02 '16 at 13:02
  • @JBNizet Thanks! I would accept your comment if it was an answer... I had to use `mem.get(buffer, 0, read);` to avoid a `BufferUnderflowException` towards the end of the file. Now runs in `200 ms`, or 7 times slower than C. This is more reasonable. – cidermole Apr 02 '16 at 13:17
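
The chunked-read approach suggested in the comments could be sketched as follows. This is only a sketch, not JB Nizet's actual gist: the class name, chunk size, and bulk `get(byte[], int, int)` call (which also avoids the `BufferUnderflowException` mentioned above by clamping the last chunk to `remaining()`) are choices made here for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class LineCount {
    // Counts '\n' bytes by copying the mapped buffer into a local
    // array in large chunks instead of calling get() once per byte.
    static int countLines(String fileName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(new File(fileName), "r");
             FileChannel fc = raf.getChannel()) {
            MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            byte[] buffer = new byte[64 * 1024]; // chunk size is an arbitrary choice
            int nlines = 0;
            while (mem.hasRemaining()) {
                // Clamp to remaining() so the final partial chunk
                // does not throw BufferUnderflowException.
                int chunk = Math.min(buffer.length, mem.remaining());
                mem.get(buffer, 0, chunk); // one bulk copy per chunk
                for (int i = 0; i < chunk; i++) {
                    if (buffer[i] == '\n') {
                        nlines++;
                    }
                }
            }
            return nlines;
        }
    }
}
```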

1 Answer


The code is very slow because fc.size() is called on every loop iteration.

The JVM cannot hoist fc.size() out of the loop, since the file size can change at run time. Querying the file size is relatively slow, because it requires a system call into the underlying file system.

Change this to

    long size = fc.size();
    for (long i = 0; i < size; i++) {
        ...
    }
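
Applied to the method from the question, the fix looks roughly like this. The class wrapper and try-with-resources are added here only to make the sketch self-contained; the algorithm is otherwise the same byte-by-byte loop as the original.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class HoistedLineCount {
    static int lineCount(String fileName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(new File(fileName), "r");
             FileChannel fc = raf.getChannel()) {
            long size = fc.size(); // one system call, hoisted out of the loop
            MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, size);
            int nlines = 0;
            for (long i = 0; i < size; i++) {
                if (mem.get() == '\n') {
                    nlines++;
                }
            }
            return nlines;
        }
    }
}
```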
apangin