32

I have a problem, I need to compare two inputstreams fast.

Today I have a function like this:

private boolean isEqual(InputStream i1, InputStream i2) throws IOException {

    try {
        // do the compare
        while (true) {
            int fr = i1.read();
            int tr = i2.read();

            if (fr != tr)
                return false;

            if (fr == -1)
                return true;
        }

    } finally {
        if (i1 != null)
            i1.close();
        if (i2 != null)
            i2.close();
    }
}

But it's really slow. I want to use buffered reads but have not come up with a good way of doing it.

Some extra stuff that makes it harder:

  • I don't want to read one of the input streams into memory (the whole one)
  • I don't want to use a third party library

I need a practial solution - code! :)

dacwe
  • 43,066
  • 12
  • 116
  • 140
  • I don't think you can compare anything without reading it into memory. Do you actually mean reading the *whole inputstream* into memory, meaning reading a fixed number of bytes is ok? – Patrick Nov 22 '10 at 13:36
  • I meant reading the whole inputstream into memory is not an option – dacwe Nov 22 '10 at 13:38

4 Answers4

99

By far my favorite is to use the org.apache.commons.io.IOUtils helper class from the Apache Commons IO library:

IOUtils.contentEquals( is1, is2 );
Abdull
  • 26,371
  • 26
  • 130
  • 172
Snicolas
  • 37,840
  • 15
  • 114
  • 173
15

Something like this may do:

private static boolean isEqual(InputStream i1, InputStream i2)
        throws IOException {

    ReadableByteChannel ch1 = Channels.newChannel(i1);
    ReadableByteChannel ch2 = Channels.newChannel(i2);

    ByteBuffer buf1 = ByteBuffer.allocateDirect(1024);
    ByteBuffer buf2 = ByteBuffer.allocateDirect(1024);

    try {
        while (true) {

            int n1 = ch1.read(buf1);
            int n2 = ch2.read(buf2);

            if (n1 == -1 || n2 == -1) return n1 == n2;

            buf1.flip();
            buf2.flip();

            for (int i = 0; i < Math.min(n1, n2); i++)
                if (buf1.get() != buf2.get())
                    return false;

            buf1.compact();
            buf2.compact();
        }

    } finally {
        if (i1 != null) i1.close();
        if (i2 != null) i2.close();
    }
}
aioobe
  • 413,195
  • 112
  • 811
  • 826
  • @dacwe, I can guarentee its slower than the solution I provided. ;) – Peter Lawrey Nov 22 '10 at 13:42
  • 1
    `allocateDirect` should give you a "direct bytebuffer" (with a native low-level implementation) which is supposed to be faster than an ordinary bytebuffer. – aioobe Nov 22 '10 at 13:43
  • @Peter Lawrey, are you sure? You're using a byte array managed by the JVM, with a direct bytebuffer you get one step closer to the silicon :-) – aioobe Nov 22 '10 at 13:44
  • It makes the read() faster, however the get() is slower than accessing a byte[] as its a JNI call. – Peter Lawrey Nov 22 '10 at 13:45
  • In any case, for a file the buffer size is more important here. note: a direct memory buffer will always use a multiple of a page of memory even if you can't use it. (4KB on most machines). – Peter Lawrey Nov 22 '10 at 13:47
  • If that's the bottleneck, he could easily `slice()` the larger buffer and use the `compareTo` method. – aioobe Nov 22 '10 at 13:48
  • You can also use a 64K buffer and compare getLong() instead of one byte at a time. – Peter Lawrey Nov 22 '10 at 15:27
  • 1
    Using a DirectByteBuffer will only add value if the underling InputStreams are FileInputStreams and able to move data directly from file into the DirectByteBuffer without the memory having to go into jvm managed memory. Lets say this is a "normal" InputStream backed by some thing other than a File. In that case you are moving the memory from jvm (source InputStream) over into native (DirectByteBuffer) then back into the jvm (when calling buf.get()) for the purposes of the comparison. – Brett Okken Jul 01 '13 at 15:12
  • This code seems incorrect. If one channel is emitting bytes very quickly, while another very slowly, one channel would hit EOF first while there are still characters in the other channel waiting to be read. This code, however, would assume that the buffers have unequal lengths. – yiding Mar 14 '16 at 23:31
  • What are you talking about? Unless the slow stream is actually closed, the read operation would either block or return zero (no bytes could be read immediately). Both cases are handled just fine. – aioobe Mar 15 '16 at 07:28
10

Using buffered reads is just a matter of wrapping the InputStreams with BufferedInputStreams. However you are likely to get the best performance reading large blocks at a time.

private boolean isEqual(InputStream i1, InputStream i2) throws IOException {
    byte[] buf1 = new byte[64 *1024];
    byte[] buf2 = new byte[64 *1024];
    try {
        DataInputStream d2 = new DataInputStream(i2);
        int len;
        while ((len = i1.read(buf1)) > 0) {
            d2.readFully(buf2,0,len);
            for(int i=0;i<len;i++)
              if(buf1[i] != buf2[i]) return false;
        }
        return d2.read() < 0; // is the end of the second file also.
    } catch(EOFException ioe) {
        return false;
    } finally {
        i1.close();
        i2.close();
    }
}
Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • So, how do I do that - e.g. a practial solution? – dacwe Nov 22 '10 at 13:32
  • @dacwe: Allocate two byte buffers `byte[] buf1 = new byte[BlockSize]; byte[] buf2 = new byte[BlockSize];` and compare buf1 and buf2 after you read into those two buffers from i1 and i2. – Patrick Nov 22 '10 at 13:34
  • @patrick, Peter Lawrey: Well, that's not that easy.. :) sfussenegger thought that he had it, but he's also wrong. – dacwe Nov 22 '10 at 13:37
  • @dacwe, what's not that easy? – Peter Lawrey Nov 22 '10 at 13:39
  • The readFully ensures the same amount of data is read from both streams, this makes the comparison simpler and avoid the need to pack/compact any left over data. – Peter Lawrey Nov 22 '10 at 13:44
  • You may want to return false on `EOFException` (thrown when `readFully` hits EOF), but allow an `IOException` to bubble out of the method (because some other read failure occurred). – Jonathan Nov 22 '10 at 13:50
  • @jonathan, I wouldn't but the OP might as that is how the example is given. Thanks. – Peter Lawrey Nov 22 '10 at 15:27
  • 2
    Wow i used your solution an replaced the for loop with a Arrays#equals. For a 200mb File this is twice as fast. I think its because the method is a HotSpotIntrinsicCandidate. – kai Nov 29 '19 at 13:45
2

why not simply wrap both streams at the very beginning of your method:

i1 = new BufferedInputStream(i1);
i2 = new BufferedInputStream(i2);

Alternatively, you could simply try reading both streams into a buffer:

public static boolean equals(InputStream i1, InputStream i2, int buf) throws IOException {
    try {
        // do the compare
        while (true) {
            byte[] b1 = new byte[buf];
            byte[] b2 = new byte[buf];

            int length = i1.read(b1);
            if (length == -1) {
                return i2.read(b2, 0, 1) == -1;
            }

            try {
                StreamUtils.readFully(i2, b2, 0, length);
            } catch (EOFException e) {
                // i2 is shorter than i1
                return false;
            }

            if (!ArrayUtils.equals(b1, b2, 0, length)) {
                return false;
            }
        }
    } finally {
        // simply close streams and ignore (log) exceptions
        StreamUtils.close(i1, i2);
    }
}

// StreamUtils.readFully(..) 
public static void readFully(InputStream in, byte[] b, int off, int len) throws EOFException, IOException {
    while (len > 0) {
        int read = in.read(b, off, len);
        if (read == -1) {
            throw new EOFException();
        }
        off += read;
        len -= read;
    }
}

// ArrayUtils.equals(..)
public static boolean equals(byte[] a, byte[] a2, int off, int len) {
    if (off < 0 || len < 0 || len > a.length - off || len > a2.length - off) {
        throw new IndexOutOfBoundsException();
    } else if (len == 0) {
        return true;
    }

    if (a == a2) {
        return true;
    }
    if (a == null || a2 == null) {
        return false;
    }

    for (int i = off; i < off + len; i++) {
        if (a[i] != a2[i]) {
            return false;
        }
    }

    return true;
}

EDIT: I've fixed my implementation now. That's how it looks like without DataInputStream or NIO. Code is available at GitHub or from Sonatype's OSS Snapshot Repository Maven:

<dependency>
  <groupId>at.molindo</groupId>
  <artifactId>molindo-utils</artifactId>
  <version>1.0-SNAPSHOT</version>
</dependency>
sfussenegger
  • 35,575
  • 15
  • 95
  • 119
  • 1
    `read` method is not specified for that (could return not reading full input!) – dacwe Nov 22 '10 at 13:36
  • Also, is it predictable what contains say `b1[1023]` if `length=100`? – khachik Nov 22 '10 at 13:38
  • I couldn't find "Arrays.equals(b1, b2, 0, length)" – Peter Lawrey Nov 22 '10 at 13:40
  • 1
    @dacwe I've noticed this myself. That's why I added the FIXME comment - work in progress ;) @khachik What do you mean by atomic reads? @peter Arrays.equals(..) actually is a private utility library I'm using, my fault, though it was in java.util.Arrays ... gonna add it – sfussenegger Nov 22 '10 at 13:47