17

I'm trying to figure out how to calculate the Internet Checksum in Java and its causing me no end of pain. (I'm horrible at bit manipulation.) I found a version in C# Calculate an Internet (aka IP, aka RFC791) checksum in C#. However my attempt at converting it to Java does not see to produce the correct results. Can anyone see what I'm doing wrong? I suspect a data type issue.

public long getValue() {
    byte[] buf = { (byte) 0xed, 0x2A, 0x44, 0x10, 0x03, 0x30};
    int length = buf.length;
    int i = 0;

    long sum = 0;
    long data = 0;
    while (length > 1) {
        data = 0;
        data = (((buf[i]) << 8) | ((buf[i + 1]) & 0xFF));

        sum += data;
        if ((sum & 0xFFFF0000) > 0) {
            sum = sum & 0xFFFF;
            sum += 1;
        }

        i += 2;
        length -= 2;
    }

    if (length > 0) {
        sum += (buf[i] << 8);
        // sum += buffer[i];
        if ((sum & 0xFFFF0000) > 0) {
            sum = sum & 0xFFFF;
            sum += 1;
        }
    }
    sum = ~sum;
    sum = sum & 0xFFFF;
    return sum;
}
Community
  • 1
  • 1
chotchki
  • 4,258
  • 5
  • 34
  • 55

3 Answers3

18

Edited to apply comments from @Andy, @EJP, @RD et al and adding extra test cases just to be sure.

I've used a combination of @Andys answer (correctly identifying the location of the problem) and updated the code to include the unit tests provided in the linked answer along with a verified message checksum additional test case.

First the implementation

package org.example.checksum;

public class InternetChecksum {

  /**
   * Calculate the Internet Checksum of a buffer (RFC 1071 - http://www.faqs.org/rfcs/rfc1071.html)
   * Algorithm is
   * 1) apply a 16-bit 1's complement sum over all octets (adjacent 8-bit pairs [A,B], final odd length is [A,0])
   * 2) apply 1's complement to this final sum
   *
   * Notes:
   * 1's complement is bitwise NOT of positive value.
   * Ensure that any carry bits are added back to avoid off-by-one errors
   *
   *
   * @param buf The message
   * @return The checksum
   */
  public long calculateChecksum(byte[] buf) {
    int length = buf.length;
    int i = 0;

    long sum = 0;
    long data;

    // Handle all pairs
    while (length > 1) {
      // Corrected to include @Andy's edits and various comments on Stack Overflow
      data = (((buf[i] << 8) & 0xFF00) | ((buf[i + 1]) & 0xFF));
      sum += data;
      // 1's complement carry bit correction in 16-bits (detecting sign extension)
      if ((sum & 0xFFFF0000) > 0) {
        sum = sum & 0xFFFF;
        sum += 1;
      }

      i += 2;
      length -= 2;
    }

    // Handle remaining byte in odd length buffers
    if (length > 0) {
      // Corrected to include @Andy's edits and various comments on Stack Overflow
      sum += (buf[i] << 8 & 0xFF00);
      // 1's complement carry bit correction in 16-bits (detecting sign extension)
      if ((sum & 0xFFFF0000) > 0) {
        sum = sum & 0xFFFF;
        sum += 1;
      }
    }

    // Final 1's complement value correction to 16-bits
    sum = ~sum;
    sum = sum & 0xFFFF;
    return sum;

  }

}

Then the unit test in JUnit4

package org.example.checksum;

import org.junit.Test;

import static junit.framework.Assert.assertEquals;

public class InternetChecksumTest {
  @Test
  public void simplestValidValue() {
    InternetChecksum testObject = new InternetChecksum();

    byte[] buf = new byte[1]; // should work for any-length array of zeros
    long expected = 0xFFFF;

    long actual = testObject.calculateChecksum(buf);

    assertEquals(expected, actual);
  }

  @Test
  public void validSingleByteExtreme() {
    InternetChecksum testObject = new InternetChecksum();

    byte[] buf = new byte[]{(byte) 0xFF};
    long expected = 0xFF;

    long actual = testObject.calculateChecksum(buf);

    assertEquals(expected, actual);
  }

  @Test
  public void validMultiByteExtrema() {
    InternetChecksum testObject = new InternetChecksum();

    byte[] buf = new byte[]{0x00, (byte) 0xFF};
    long expected = 0xFF00;

    long actual = testObject.calculateChecksum(buf);

    assertEquals(expected, actual);
  }

  @Test
  public void validExampleMessage() {
    InternetChecksum testObject = new InternetChecksum();

    // Berkley example http://www.cs.berkeley.edu/~kfall/EE122/lec06/tsld023.htm
    // e3 4f 23 96 44 27 99 f3
    byte[] buf = {(byte) 0xe3, 0x4f, 0x23, (byte) 0x96, 0x44, 0x27, (byte) 0x99, (byte) 0xf3};

    long expected = 0x1aff;

    long actual = testObject.calculateChecksum(buf);

    assertEquals(expected, actual);
  }

  @Test
  public void validExampleEvenMessageWithCarryFromRFC1071() {
    InternetChecksum testObject = new InternetChecksum();

    // RFC1071 example http://www.ietf.org/rfc/rfc1071.txt
    // 00 01 f2 03 f4 f5 f6 f7
    byte[] buf = {(byte) 0x00, 0x01, (byte) 0xf2, (byte) 0x03, (byte) 0xf4, (byte) 0xf5, (byte) 0xf6, (byte) 0xf7};

    long expected = 0x220d;

    long actual = testObject.calculateChecksum(buf);

    assertEquals(expected, actual);

  }

}
Gary
  • 7,167
  • 3
  • 38
  • 57
  • Thanks for the additional unit tests. I was getting conflicting answers about whether my code was incorrect due to some bad tests. – chotchki Nov 06 '10 at 20:40
  • 1
    @chotchki I've edited the answer to include comments from @Andy et al – Gary Nov 07 '10 at 09:08
  • Why does `calculateChecksum` return a long value? UDP, TCP, and IPv4 checksums take two bytes. So int (at most 4 bytes) should be enough – Maksim Dmitriev May 13 '15 at 19:04
  • 1
    The code was modified from the example in RFC1071 Section 4.1 which used a long as its sum. At the time the RFC was written a long was probably 32 bits. – Gary May 13 '15 at 21:13
  • Why do you need to apply `0xFF00` to `buf[i]` and `0xFF` to `buf[i+1]`? For example, the two bytes `{ 64, 17 }` are paired. There are their binary values: `01000000 00010001`. So if I apply the left shift operator to 64 eight times, I'll get `01000000 0000000`. The result won't change after ANDing with `0xFF00`. I can say the same about 17 and `0xFF`. And why did you use `|` instead of `+`? – Maksim Dmitriev May 14 '15 at 17:35
  • 1
    Much of that was to do with avoiding type promotion in Java. Have a look at the other answers for more details (@Andy is now @Andrey Balaguta) – Gary May 14 '15 at 19:24
12

A much shorter version is the following:

long checksum(byte[] buf, int length) {
    int i = 0;
    long sum = 0;
    while (length > 0) {
        sum += (buf[i++]&0xff) << 8;
        if ((--length)==0) break;
        sum += (buf[i++]&0xff);
        --length;
    }

    return (~((sum & 0xFFFF)+(sum >> 16)))&0xFFFF;
}
Ernest Friedman-Hill
  • 80,601
  • 10
  • 150
  • 186
James Hughes
  • 121
  • 1
  • 2
2

I think it is type promotion that is causing trouble. Lets look at data = (((buf[i]) << 8) | ((buf[i + 1]) & 0xFF)):

  1. ((buf[i]) << 8) would promote buf[i] to int, causing sign expansion
  2. (buf[i + 1]) & 0xFF would also promote buf[i + 1] to int, causing sign expansion. But masking this argument with 0xff is the right thing - we get correct operand in this case.
  3. The whole expression gets promoted to long (once again, sign included).

The problem lies in the first argument - it should be masked with 0xff00, like that: data = (((buf[i] << 8) & 0xFF00) | ((buf[i + 1]) & 0xFF)). But I suspect there are more efficient algorithms implemented for Java, maybe even standard library has one. You might take a look at MessageDigest, maybe it has one.

Andrey Balaguta
  • 1,308
  • 2
  • 21
  • 28
  • 2
    #2 is incorrect. The literal 0xFF is treated as an int, not a byte. In step #3 you will have a short | int, where the FIRST argument gets promoted, not the second. Therefore the problem is not with 0x00,0xFF as in your example but with 0xFF,0x00 (or any buf[i] > 0x7F). When the first parameter is widened from short to int, it is sign-extended as you explained. In the original program, it is the 0xed, 0x2A case that causes the checksum to be incorrect. – robert_x44 Nov 06 '10 at 21:30
  • 2
    This post is wildly incorrect. The result of #1, #2, and #3 is an int unless either of the arguments is a long, in which case it is a long. The result of the entire expression is similarly an int and it gets widened to a long when stored in one. The (int) cast is completely unnecessary. – user207421 Nov 06 '10 at 23:04
  • @RD : Honestly, I'm confused how to fix that issue. – chotchki Nov 07 '10 at 01:24
  • 1
    @EJP, @RD, thanks for your correction, and sorry for misleading. I've corrected post accordingly. – Andrey Balaguta Nov 07 '10 at 05:29
  • It's sign *extension,* not 'sign expansion'. – user207421 Nov 07 '10 at 07:24
  • +1 for actually fixing the problem (and then revising it in light of comments) – Gary Nov 07 '10 at 08:46