
I'm currently starting to work with DER (Distinguished Encoding Rules) encoding and have problems understanding the encoding of integers.

In the reference document https://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf this encoding is defined as follows:

8.3.1 The encoding of an integer value shall be primitive. The contents octets shall consist of one or more octets.

8.3.2 If the contents octets of an integer value encoding consist of more than one octet, then the bits of the first octet and bit 8 of the second octet:

  1. shall not all be ones; and

  2. shall not all be zero.

NOTE – These rules ensure that an integer value is always encoded in the smallest possible number of octets.

8.3.3 The contents octets shall be a two's complement binary number equal to the integer value, and consisting of bits 8 to 1 of the first octet, followed by bits 8 to 1 of the second octet, followed by bits 8 to 1 of each octet in turn up to and including the last octet of the contents octets.
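To be sure I read 8.3.3 correctly: as far as I understand, the rule amounts to interpreting the content octets as a big-endian two's complement number, which in Python would be (my own sketch, not from the standard):

```python
def decode_contents(octets: bytes) -> int:
    # Rule 8.3.3: concatenate bits 8..1 of each octet and read the
    # result as a big-endian two's complement number.
    return int.from_bytes(octets, "big", signed=True)
```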

On another site, https://learn.microsoft.com/en-us/windows/desktop/seccertenroll/about-integer, it is explained that for positive numbers whose binary representation starts with a 1, a zero byte is added at the front. This is also mentioned in the answers to an earlier question on Stack Overflow: ASN Basic Encoding Rule of an integer.

Unfortunately, from these answers I cannot see how this latter instruction can be deduced from the rules of the reference document.

For example, if I want to encode the number 128, why can't I do this as

[tag byte] [length byte] 10000000?

I know that the correct encoding would be [tag byte] [length byte] 00000000 10000000, but which condition is violated by the variant above? Probably it has something to do with the two's complement, but isn't the two's complement of 128 again 10000000?
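For what it's worth, the sketch above reads my one-byte variant as a negative number, which only adds to my confusion:

```python
decode_contents(b"\x80")      # -128, not 128
decode_contents(b"\x00\x80")  # 128
```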

I hope you can help me understand why the description on the Microsoft site is equivalent to the original definition. Thank you.

Bilbo
  • Maybe [this](https://www.strozhevsky.com/free_docs/asn1_by_simple_words.pdf) will help – pepo Mar 26 '19 at 14:49
  • Unfortunately it does not so far. I had already considered this document, but it also does not seem to mention that for positive integers an additional zero byte is added. Instead, the document says "Encoding of negative integers has its own rules" and then describes the two's complement in its own words. But in the reference document for ASN.1 the word "negative" only appears three times, and none of these refers to the encoding of integers. So I still wonder where these "own rules" come from. – Bilbo Mar 26 '19 at 16:13
  • @Bilbo It "is" there, but no straight-thinking human being would express it like this. The sign constraint is that the most significant bit doubles as the sign bit. E.g. for positive 0x80 this is violated, so you extend to 0x0080 to satisfy the sign constraint. – Sam Ginrich Jan 02 '22 at 22:51

2 Answers


The two's complement rule (8.3.3) says that if the high bit of the first (lowest-index) content byte is set, the number is negative.

02 01 80 has contents 0b1000_0000. Since the high bit is set, the number is negative.

Flip all the bits (0b0111_1111), then add one: 0b1000_0000; meaning that it represents negative 128.

For a less degenerate example, 0b1000_0001 => 0b0111_1110 => 0b0111_1111, showing that 0x81 is negative 127.

For the positive number 127, since the high bit isn't set, the number is interpreted as positive, so the contents are just 0b0111_1111 aka 0x7F, resulting in 02 01 7F.
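A minimal sketch of that contents encoding in Python (my own illustration, not from the standard; the length formula computes the fewest octets that still leave room for the sign bit):

```python
def der_int_contents(n: int) -> bytes:
    # Fewest octets whose two's complement value equals n:
    # non-negative n needs one extra bit for the leading 0 sign bit;
    # negative n is measured via n + 1, so e.g. -128 fits in one octet.
    nbytes = (8 + (n + (n < 0)).bit_length()) // 8
    return n.to_bytes(nbytes, "big", signed=True)

der_int_contents(127)   # b'\x7f'     -> 02 01 7F
der_int_contents(128)   # b'\x00\x80' -> 02 02 00 80
der_int_contents(-128)  # b'\x80'     -> 02 01 80
```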

bartonjs
  • Sorry, but I don't see how this answers my question. Maybe I failed to make my point clear. I basically understand how the representation of numbers in two's complement works. My question refers to the sentence from the Microsoft website: "If the integer is positive but the high order bit is set to 1, a leading 0x00 is added to the content to indicate that the number is not negative." I still don't see where this comes from. – Bilbo Mar 26 '19 at 16:03
  • The extra 0 byte means the high bit of the first byte is not set, so the value is the positive 15-bit number 0b000_0000_1000_0000 (128). Without it the high bit is set, so the number is negative: the two's complement of the 7-bit value 0b000_0000, which is (in 8-bit land) `-128`. – bartonjs Mar 26 '19 at 16:21
  • 1
    Ok, I think now I see where I was wrong. I thought as 1000 0000 being the binary representation of 128; but this is NOT the representation as a Two's complement number. If we consider signed bytes, we NEED a 0 at the start, so we would like to put a zero bit at the start. And since we must use full bytes in our encoding, we then must put a complete zero byte at the start. – Bilbo Mar 26 '19 at 18:19
  • @Bilbo Yep, that’s pretty much it. – bartonjs Mar 26 '19 at 18:20
  • Just to add another piece that contributed to my confusion: The Microsoft article says "The Value field of the TLV triplet contains the encoded integer if it is positive, or its two's complement if it is negative." Here the OPERATION of taking the two's complement (by flipping the bits and adding 1) must be meant; in apparent contrast, the reference document says that always(!) "the contents octets shall be a two's complement binary number equal to the integer value". Here it is not the operation but the REPRESENTATION in two's complement that is meant, which flips the bits only for negative numbers. – Bilbo Mar 26 '19 at 21:53
  • @Bilbo Two’s complement representation means “if the most significant bit is set, this is a two’s complement negative number” and “if it is not set, this is a positive integer”. It’s the “Potential ambiguities of terminology” section on Wikipedia. – bartonjs Mar 26 '19 at 22:49

A common pattern in ASN.1 is TLV: Type / Length / Value.

Type: one octet, 0x02 for INTEGER.

Value: two's complement with the sign constraint, as explained in the answer above.

Length coding has two modes:

  1. Most significant bit of the first length octet is not set: then that octet is the content length itself.

  2. Most significant bit of the first length octet is set: then the first octet is followed by (value - 128) octets forming the actual length as a non-negative integer, in big-endian byte order.

Lengths 0...127 use the first rule, 128 and above the second.
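A small sketch of those two length forms (my own Python illustration, not from the standard):

```python
def der_length(n: int) -> bytes:
    if n < 0x80:
        # Short form: the octet is the length itself.
        return bytes([n])
    # Long form: 0x80 | (number of length octets), then the length big-endian.
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

der_length(5)    # b'\x05'
der_length(300)  # b'\x82\x01\x2c'
```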

Sam Ginrich