1

Question: How can I convert a BigInteger in Java to match the Botan BigInt encoding?

I have communication between Java and a C++ application using Botan. Botan has a BigInt class, comparable to BigInteger. However, the encodings differ when converting to a byte array.

In Botan, the BigInt is encoded as follows:

void BigInt::binary_encode(uint8_t output[]) const
{
   //bytes just returns the # of bytes, in my case its 32 always
   const size_t sig_bytes = bytes();
   for(size_t i = 0; i != sig_bytes; ++i)
      output[sig_bytes-i-1] = byte_at(i);
}

In Java, its encoded:

 public byte[] toByteArray() {
        int byteLen = bitLength()/8 + 1;
        byte[] byteArray = new byte[byteLen];

        for (int i=byteLen-1, bytesCopied=4, nextInt=0, intIndex=0; i >= 0; i--) {
            if (bytesCopied == 4) {
                nextInt = getInt(intIndex++);
                bytesCopied = 1;
            } else {
                nextInt >>>= 8;
                bytesCopied++;
            }
            byteArray[i] = (byte)nextInt;
        }
        return byteArray;
    }
jww
  • 97,681
  • 90
  • 411
  • 885
user1432882
  • 1,126
  • 4
  • 14
  • 29
  • 1
    Bad title. Can't think of a better one at the moment, but it's going to attract trolling. Best thing I can think of is serialize to a neutral encoding and then read it back into the specific encoding. For dumb-stupid proof of concept, I'd use a string of ASCII digits. – user4581301 Feb 08 '18 at 00:37
  • Possible dupe https://stackoverflow.com/questions/362384/does-java-read-integers-in-little-endian-or-big-endian – Scary Wombat Feb 08 '18 at 01:00
  • 1
    The Javadocs very carefully and precisely describe the output of `BigInteger.toByteArray()`. What about Botan? – President James K. Polk Feb 08 '18 at 03:28
  • I assume that the difference is in endianness, but I don't know for sure. I don't know Botan. But this can easily be tested: create a small BigInteger (say, 10 digits) and compare the output. If the output of the Java BigInteger is (almost) the reverse of the Botan output, then this is an endianness issue. The first or last bytes may differ. How they differ also tells more about the difference in output (Encoding is the wrong word for this, IMO). Could you show us the bytes output from such a small (9-10 digit) value? If possible, a negative value. – Rudy Velthuis Feb 08 '18 at 14:19
  • How are you trying to compare the results? Remember Java only has signed bytes, so it may be that the arrays created are actually the same, only Java displays any with the high bit set as a negative while C++ displays them as a value > 127. – matt Feb 08 '18 at 14:54
  • 1
    It would be more useful to have a sample output from the same input on each system so we can give you a meaningful answer. I know I for one don't have access to test with Botan. As @matt suggested, things might be OK just thrown off because Java has no unsigned data type. – avgvstvs Feb 08 '18 at 16:32
  • @avgvstvs: Indeed. That is what I was (implicitly) asking for too. Not too big a sample, just the output for a number of around 10 digits or so. That way we could see endianness, signed/unsiged issues, extra byte to get around the sign bit (e.g. adding 0 in front when top byte was > 127), etc., but also sign-magnitude or two's complement, etc. – Rudy Velthuis Feb 08 '18 at 17:03
  • 1
    Are the integers you're trying to serialize negative? Java uses a 2s-complement representation when encoding, while Botan's effectively ignores the sign when encoding, encoding the absolute value. For non-negative integers, Botan should decode the output of Java's toByteArray without problems. Another possible problem though, is Java also assumes input to the byte[] constructor is 2s complement. So if you encode an integer with the high bit set in Botan, it will output an array with a leading 1 bit set. Java will interpret this as a negative integer. Prefix with 0x00 byte to work around this. – Jack Lloyd Feb 09 '18 at 20:04

1 Answers1

0

You can

  • Shift the big integer right by the size of long (in bits), and convert to long (which drops all high order bytes). This gets you the lowest order limb (like digit, except a long rather than bit).
  • Repeat the above until the value of the shifted big integer (not just the low order bytes) is 0. You'll want to
  • Store the longs from the loop in an array. This array of longs can be passed through the JNI or whatever cross-language interface you use.
  • In the other language, do the operation in reverse: Initialize an empty big integer as the accumulator, loop over array from high order to low, shift the current accumulation to left by the size of long, and then bitwise-or the current long value to the low order position. The shifts of later iterations will bring the limbs into their final position.

These algorithms should work for any big integer library (assuming they offer essential operations like shifting and conversion to long) and I suppose in any language. This should be faster than converting to textual representation, but that approach is superior in regard to simpler implementation.

eerorika
  • 232,697
  • 12
  • 197
  • 326