Interpreting ASN.1 indefinite-lenght encoding with multiple encapsulated octet-strings

Question

I have a BER structure like this...

$ openssl asn1parse -inform der -in test.der -i -dump

 ????:d=4  hl=2 l=inf  cons:     cont [ 0 ]
 ????:d=5  hl=3 l= 240 prim:      OCTET STRING
      0000 - AABBCCDD
 ????:d=5  hl=2 l=   8 prim:      OCTET STRING
      0000 - EEFF
 ????:d=5  hl=2 l=   0 prim:      EOC

...or in der2ascii style...

[0] `80`
  OCTET_STRING { `AABBCCDD` }
  OCTET_STRING { `EEFF` }
`0000`

What I know: indefinite-length encoding must contain a constructed type, because primitive types may introduce ambiguities, e.g. when containing 0x0000. What I want to know: How does a decoder must behave when parsing this BER structure? Are the header bytes of both OCTET STRINGs included in the encoding? If yes, how is indefinite-length byte data encoded? How does an application interpret the value of the TLV field tagged [0], when the second OCTET STRING is e.g. an INTEGER?

I am asking this question, because in the CMS standard, a field is defined as single OCTET STRING, but in most BER encodings I always see two of them. Is this only due to the indefinite-length encoding? Am I missing something?

From ITU-T X.690:

8.1.4 Contents octets

The contents octets shall consist of zero, one or more octets, and shall encode the data value as specified in subsequent clauses.

NOTE – The contents octets depend on the type of the data value; subsequent clauses follow the same sequence as the definition of types in ASN.1.

Does this mean, that I can put every constructed type and the application must only interpret the value part of the contructed TLV structure?

Ilya Etingof · Accepted Answer · 2017-08-03T12:33:54.737

When you encode a primitive OCTET STRING in indefinite length mode, the encoder must:

split up the value into chunks of smaller OCTET STRINGs
encode each chunk in definite length mode so that each has its own TLV (with length!)
the whole sequence of definite length encoded primitive OCTET STRINGs must be framed by a single, indefinite length encoded constructed OCTET STRING "container" having its own TLV (without length, but with end-of-octets sentinel)

At the other end, the decoder extracts the V part from the inner, definite length OCTET STRING chunks (dropping their TL headers). Then joins/consumes V's together in the order of arrival dropping the TL part of the outer frame.

Note that the idea behind indefinite length encoding technique is that both encoder and decoder can emit/consume incomplete, possibly oversized, data.

Chunk size is chosen by the encoder/application based on data availability, memory situation and possibly the estimation of decoder's buffering capabilities. I think this is mentioned somewhere in the X.280/X.680 papers.

Encoder is not allowed to put chunks of different ASN.1 types into any single indefinite length encoded container. In other words, all chunks must be of the same type as the outer container.

That should hopefully explain why you may see multiple (depending on chunk size) OCTET STRINGs in the indefinite length encoded BER/CER stream where just a single OCTET STRING is expected.

DER forbids indefinite length encoding on the grounds that serialized representation of the same data may change on re-encoding (due to potentially changing chunk size).

So, when the chunks are processes, the headers of those must be discarded, right? Is this also defined if the chunks have different types, like e.g. INTEGER? Is the process of generating chunks defined anywhere? PS: I will rephrase my Question to use BER. The standard I am looking at is basically DER, but sometimes allows BER constructs... — duesee, Aug 03 '17 at 11:20

Interpreting ASN.1 indefinite-lenght encoding with multiple encapsulated octet-strings

1 Answers1