
To try out `MappedByteBuffer` (memory-mapped files in Java), I wrote a simple `wc -l` (text file line count) demo:

int lineCount(String fileName) throws IOException {
    FileChannel fc = new RandomAccessFile(new File(fileName), "r").getChannel();
    MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());

    int nlines = 0;
    byte newline = '\n';

    for(long i = 0; i < fc.size(); i++) {
        if(mem.get() == newline)
            nlines += 1;
    }

    return nlines;
}

I tried this on a file of about 15 MB (15008641 bytes), and 100k lines. On my laptop, it takes about 13.8 sec. Why is it so slow?

Complete class code is here: http://pastebin.com/t8PLRGMa

For the reference, I wrote the same idea in C: http://pastebin.com/hXnDvZm6

It runs in about 28 ms, or 490 times faster.

Out of curiosity, I also wrote a Scala version using essentially the same algorithm and APIs as in Java. It runs 10 times faster than the Java version, which suggests something odd is going on.

Update: The file is cached by the OS, so there is no disk loading time involved.

I wanted to use memory mapping for random access to bigger files which may not fit into RAM. That is why I am not just using a BufferedReader.

cidermole
  • Java version: OpenJDK 1.8.0 Platform: Linux 4.1.16 – cidermole Apr 02 '16 at 12:31
  • `MappedByteBuffer` is the wrong thing to use, your program does not need anything but a plain `BufferedReader`. You are not using any of the advanced features of the `MappedByteBuffer` so why use it? –  Apr 02 '16 at 12:47
  • 1
    I was typing an answer, but the question was closed. Your code is slow because it reads byte by byte, and this is very slow. Read buffer by buffer, and the performance will increase dramatically. Using https://gist.github.com/jnizet/21341d48f631b7f10bc657e560c0f2de, for example, the time spent is 50493 µs, vs. 8646279 µs for your original version. But I agree a BufferedInputStream would be simpler anyway. – JB Nizet Apr 02 '16 at 12:55
  • @JarrodRoberson Thanks for the pointer! The file is cached by the OS, I will update the question. I wanted to use memory mapping for random access to bigger files which may not fit into RAM. – cidermole Apr 02 '16 at 12:55
  • @JarrodRoberson Do you think this is reasonable to reopen, since I don't believe the question you marked provides the answer? – cidermole Apr 02 '16 at 13:02
  • @JBNizet Thanks! I would accept your comment if it was an answer... I had to use `mem.get(buffer, 0, read);` to avoid a `BufferUnderflowException` towards the end of the file. Now runs in `200 ms`, or 7 times slower than C. This is more reasonable. – cidermole Apr 02 '16 at 13:17
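
The chunked-read approach suggested in the comments could be sketched as follows. This is only a sketch, not JB Nizet's actual gist: the class name, chunk size, and bulk `get(byte[], int, int)` call (which also avoids the `BufferUnderflowException` mentioned above by clamping the last chunk to `remaining()`) are choices made here for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class LineCount {
    // Counts '\n' bytes by copying the mapped buffer into a local
    // array in large chunks instead of calling get() once per byte.
    static int countLines(String fileName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(new File(fileName), "r");
             FileChannel fc = raf.getChannel()) {
            MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            byte[] buffer = new byte[64 * 1024]; // chunk size is an arbitrary choice
            int nlines = 0;
            while (mem.hasRemaining()) {
                // Clamp to remaining() so the final partial chunk
                // does not throw BufferUnderflowException.
                int chunk = Math.min(buffer.length, mem.remaining());
                mem.get(buffer, 0, chunk); // one bulk copy per chunk
                for (int i = 0; i < chunk; i++) {
                    if (buffer[i] == '\n') {
                        nlines++;
                    }
                }
            }
            return nlines;
        }
    }
}
```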

1 Answer


The code is very slow because fc.size() is called on every loop iteration.

The JVM cannot hoist fc.size() out of the loop, since the file size can change at run time. Querying the file size is relatively slow, because it requires a system call into the underlying file system.

Change this to

    long size = fc.size();
    for (long i = 0; i < size; i++) {
        ...
    }
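
Applied to the method from the question, the fix looks roughly like this. The class wrapper and try-with-resources are added here only to make the sketch self-contained; the algorithm is otherwise the same byte-by-byte loop as the original.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class HoistedLineCount {
    static int lineCount(String fileName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(new File(fileName), "r");
             FileChannel fc = raf.getChannel()) {
            long size = fc.size(); // one system call, hoisted out of the loop
            MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, size);
            int nlines = 0;
            for (long i = 0; i < size; i++) {
                if (mem.get() == '\n') {
                    nlines++;
                }
            }
            return nlines;
        }
    }
}
```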
apangin