Decode using ASN.1 where substrate contains some opaque data

Question

I would like to use pyasn1 to decode some data, part of which is opaque. That is, part of the data contained in the ASN.1-defined structure may or may not be ASN.1 decode-able, and I need to parse the preamble to find out how to decode it.

Based on what I understand from the pyasn1 codec documentation on "Decoding untagged types," I should be able to use pyasn1's univ.Any type to handle this case.

Here is some example code to illustrate the problem I'm having.

#!/usr/bin/env python

from pyasn1.type import univ, namedtype
from pyasn1.codec.der import decoder, encoder

class Example(univ.Sequence):
    componentType = namedtype.NamedTypes(
        namedtype.NamedType('spam', univ.Integer()),
        namedtype.NamedType('eggs', univ.Any())
    )

example = Example()
example['spam'] = 42
example['eggs'] = univ.Any(b'\x01\x00abcde') # Some opaque data
substrate = encoder.encode(example)

"""
    >>> import binascii
    >>> print(binascii.hexlify(substrate).decode('ascii'))
    300a02012a01006162636465

      ^^      ^
      ||      + Opaque data begins here
      ++ Note: the length field accounts for all remaining substrate
"""

data, tail = decoder.decode(substrate, asn1Spec=Example())
print(data)

The encoded example is consistent with my expectations. However, this program fails inside the decoder with the following traceback.

Traceback (most recent call last):
  File "./any.py", line 27, in <module>
    data, tail = decoder.decode(substrate, asn1Spec=Example())
  File "/Users/neirbowj/Library/Python/3.4/lib/python/site-packages/pyasn1-0.1.8-py3.4.egg/pyasn1/codec/ber/decoder.py", line 825, in __call__
  File "/Users/neirbowj/Library/Python/3.4/lib/python/site-packages/pyasn1-0.1.8-py3.4.egg/pyasn1/codec/ber/decoder.py", line 342, in valueDecoder
  File "/Users/neirbowj/Library/Python/3.4/lib/python/site-packages/pyasn1-0.1.8-py3.4.egg/pyasn1/codec/ber/decoder.py", line 706, in __call__
pyasn1.error.SubstrateUnderrunError: 95-octet short

I believe the decoder is trying to parse the portion of the data I marked as univ.Any and failing, because that portion is not a valid encoding, rather than returning it to me as binary data encapsulated in a univ.Any object as I expected.
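One plausible reading of the "95-octet short" message can be reproduced without pyasn1. The sketch below (Python 3, standard library only) walks the opaque octets as if they were a run of tag-length-value items. The helper name `read_der_header` is hypothetical and is not part of pyasn1; it only handles single-octet tags and definite lengths, which is an assumption that happens to be enough for this input.

```python
def read_der_header(data, offset=0):
    """Return (tag_octet, content_length, header_size) for the TLV at offset.

    Hypothetical helper: single-octet tags and definite lengths only.
    """
    tag = data[offset]
    first = data[offset + 1]
    if first < 0x80:                      # short form: length fits in 7 bits
        return tag, first, 2
    count = first & 0x7F                  # long form: next `count` octets hold the length
    length = int.from_bytes(data[offset + 2:offset + 2 + count], "big")
    return tag, length, 2 + count


opaque = b"\x01\x00abcde"                 # the 'eggs' payload from the question

offset = 0
shortfall = 0
while offset < len(opaque):
    tag, length, header_size = read_der_header(opaque, offset)
    available = len(opaque) - offset - header_size
    print("tag=0x%02x length=%d available=%d" % (tag, length, available))
    if length > available:                # the item claims more octets than remain
        shortfall = length - available
        print("%d-octet short" % shortfall)
        break
    offset += header_size + length
```

Read this way, `01 00` looks like a complete item with empty content, and `61 62` is then taken as a tag followed by a length of 98 with only three octets left. That arithmetic matches the number in the traceback, although it does not prove which code path inside the decoder raised the error.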

How can I parse data of this form using pyasn1?

Incidentally, the actual data I am trying to decode is a SASL token using the GSSAPI mechanism, as defined in section 4.1 of RFC 4121: KRB5 GSSAPI mechanism v2, which I excerpt here for convenience.

     GSS-API DEFINITIONS ::=

     BEGIN

     MechType ::= OBJECT IDENTIFIER
     -- representing Kerberos V5 mechanism

     GSSAPI-Token ::=
     -- option indication (delegation, etc.) indicated within
     -- mechanism-specific token
     [APPLICATION 0] IMPLICIT SEQUENCE {
             thisMech MechType,
             innerToken ANY DEFINED BY thisMech
                -- contents mechanism-specific
                -- ASN.1 structure not required
             }

     END

The innerToken field starts with a two-octet token-identifier
(TOK_ID) expressed in big-endian order, followed by a Kerberos
message.

Following are the TOK_ID values used in the context establishment
tokens:

      Token               TOK_ID Value in Hex
     -----------------------------------------
      KRB_AP_REQ            01 00
      KRB_AP_REP            02 00
      KRB_ERROR             03 00
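Once the innerToken octets have been isolated, the TOK_ID split described in this excerpt needs no ASN.1 machinery at all. The following is a minimal sketch using only the standard library; the function name `split_inner_token` and the placeholder payload are made up for illustration.

```python
import struct

# TOK_ID values for context establishment tokens, from the table above.
TOK_IDS = {
    0x0100: "KRB_AP_REQ",
    0x0200: "KRB_AP_REP",
    0x0300: "KRB_ERROR",
}


def split_inner_token(inner_token):
    """Return (token_name, kerberos_message) for an innerToken byte string."""
    if len(inner_token) < 2:
        raise ValueError("innerToken is too short to hold a TOK_ID")
    # ">H" reads one unsigned 16-bit integer in big-endian order.
    (tok_id,) = struct.unpack(">H", inner_token[:2])
    return TOK_IDS.get(tok_id, "unknown"), inner_token[2:]


# Placeholder payload; a real token would carry a Kerberos message here.
name, message = split_inner_token(b"\x01\x00" + b"kerberos-message-placeholder")
print(name)
```

The returned message is the part that could then be handed to an ASN.1 decoder, assuming it really is a DER-encoded Kerberos message.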

EDIT 1: Attach sample data

Here is a sample GSSAPI-Token (lightly sanitized) that was serialized, I believe, by cyrus-sasl and heimdal.

YIIChwYJKoZIhvcSAQICAQBuggJ2MIICcqADAgEFoQMCAQ6iBwMFACAAAACjggFm
YYIBYjCCAV6gAwIBBaELGwlBU04uMVRFU1SiNjA0oAMCAQGhLTArGwtzZXJ2aWNl
bmFtZRscc2VydmljZWhvc3QudGVzdC5leGFtcGxlLmNvbaOCARAwggEMoAMCARCh
AwIBBKKB/wSB/A81akUNsyvRCCKtERWg9suf96J3prMUQkabsYGpzijfEeCNe0ja
Eq6c87deBG+LeJqFIyu65cCMF/oXtyZNB9sUxpqFBcfkAYZXTxabNLpZAUmkdt6w
dYlV8JK/G3muuG/ziM14oCbh8hIY63oi7P/Pdyrs3s8B+wkNCpjVtREHABuF6Wjx
GYem65mPqCP9ZMSyD3Bc+dLemxhm7Kap8ExoVYFRwuFqvDf/E5MLCk2HThw46UCF
DqFnU46FJBNGAK+RN2EptsqtY48gb16klqJxU7bwHeYoCsdXyB6GElIDe1qrPU15
9mGxpdmSElcVxB/3Yzei48HzlkUcfqSB8jCB76ADAgEQooHnBIHkZUyd0fJO3Bau
msqz6ndF+kBxmrGS6Y7L20dSYDI2cB8HsJdGDnEODsAAcYQ0L5c2N/mb8QHh7iU9
gtjWHpfq/FqMF4/aox/BJ0Xzuy2gS4sCafs7PTYtSDh2nyLkNYuxKdmQ1ughbIq6
APAegqa7R1iv2oCaNijrpKc2YUfznnwT/CTSsGrJpMwz4KLuBtjI4f74bQty8uNn
LVxxV4J8wU1s7lSj4Ipbi+a1WdCVsLs8lIqFmKXte+1c+qHeadoAGmSTBT3qFZae
SRdT8dpYr6i6fkjRsoyEZs9ZqQtwQAYSdMBU
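Decoding the first few base64 characters of this sample by hand suggests it begins with the octets 60 82 02 87 06 09 2a 86 48 86 f7 12 01 02 02 01 00 6e, that is, an [APPLICATION 0] tag, a long-form length, an OBJECT IDENTIFIER, and then the TOK_ID 01 00. Under that assumption, the outer framing can be peeled off by hand before any ASN.1 decoder sees the inner token. The sketch below uses only the standard library; all function names are hypothetical, and the demonstration token is synthetic rather than the sample above.

```python
def read_length(data, offset):
    """Decode a definite DER length at offset; return (length, next_offset)."""
    first = data[offset]
    offset += 1
    if first < 0x80:                       # short form
        return first, offset
    count = first & 0x7F                   # long form: `count` length octets follow
    if count == 0:
        raise ValueError("indefinite length is not valid in DER")
    return int.from_bytes(data[offset:offset + count], "big"), offset + count


def oid_to_dotted(body):
    """Convert OBJECT IDENTIFIER content octets to dotted-decimal text."""
    if not body:
        raise ValueError("empty OBJECT IDENTIFIER")
    arcs, value = [], 0
    for octet in body:
        value = (value << 7) | (octet & 0x7F)
        if not octet & 0x80:               # high bit clear marks the last octet of an arc
            arcs.append(value)
            value = 0
    first = arcs[0]
    head = [first // 40, first % 40] if first < 80 else [2, first - 80]
    return ".".join(str(arc) for arc in head + arcs[1:])


def split_gssapi_token(token):
    """Return (mech_oid, inner_token) from a GSSAPI-Token byte string."""
    if token[0] != 0x60:
        raise ValueError("expected the [APPLICATION 0] tag 0x60")
    length, offset = read_length(token, 1)
    if offset + length != len(token):
        raise ValueError("declared length does not match the token size")
    if token[offset] != 0x06:
        raise ValueError("expected an OBJECT IDENTIFIER for thisMech")
    oid_length, offset = read_length(token, offset + 1)
    oid_end = offset + oid_length
    return oid_to_dotted(token[offset:oid_end]), token[oid_end:]


# Synthetic token: Kerberos V5 mechanism OID followed by opaque octets.
oid_body = bytes.fromhex("2a864886f712010202")
inner = b"\x01\x00" + b"opaque"
body = b"\x06" + bytes([len(oid_body)]) + oid_body + inner
token = b"\x60" + bytes([len(body)]) + body

mech, inner_out = split_gssapi_token(token)
print(mech)
```

To try it on the sample, the text would first be passed through `base64.b64decode`. The inner token that comes back still starts with the two TOK_ID octets, so those would be removed before attempting to decode the remainder as a Kerberos message.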

Ilya Etingof · Accepted Answer · 2015-07-29T09:38:17.923

My impression is that an ANY value can only contain a valid BER/DER serialization. Think of the ANY type as a CHOICE type with an infinite number of alternatives (see chapter on ANY type here).

My first instinct is to put innerToken into an OCTET STRING like this:

class Example(univ.Sequence):
    componentType = namedtype.NamedTypes(
        namedtype.NamedType('spam', univ.Integer()),
        namedtype.NamedType('eggs', univ.OctetString())
    )

which would give you ready-made values upon decoding:

>>> example = Example()
>>> example['spam'] = 42
>>> example['eggs'] = b'\x01\x00abcde'
>>> print(example.prettyPrint())
Example:
 spam=42
 eggs=0x01006162636465
>>> substrate = encoder.encode(example)
>>> data, tail = decoder.decode(substrate, asn1Spec=Example())
>>> print(data.prettyPrint())
Example:
 spam=42
 eggs=0x01006162636465
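Note that wrapping the opaque octets in an OCTET STRING changes the bytes on the wire, because the wrapper contributes its own tag and length octets. The sketch below hand-encodes both layouts with the standard library so they can be compared side by side; `der_tlv` is a hypothetical helper that only supports short-form lengths, not a pyasn1 function.

```python
import binascii


def der_tlv(tag, content):
    """Encode one TLV with a short-form length (content under 128 octets)."""
    if len(content) >= 0x80:
        raise ValueError("this sketch only supports short-form lengths")
    return bytes([tag, len(content)]) + content


spam = der_tlv(0x02, b"\x2a")                           # INTEGER 42
opaque = b"\x01\x00abcde"

raw = der_tlv(0x30, spam + opaque)                      # opaque octets placed as-is
wrapped = der_tlv(0x30, spam + der_tlv(0x04, opaque))   # inside an OCTET STRING

print(binascii.hexlify(raw).decode("ascii"))            # 300a02012a01006162636465
print(binascii.hexlify(wrapped).decode("ascii"))        # 300c02012a040701006162636465
```

The second line is two octets longer: `04 07` is the OCTET STRING tag and length, and the outer SEQUENCE length grows from `0a` to `0c` to account for them. A peer that expects the first layout would not accept the second.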

On the other hand, if you literally use the values from the spec:

KRB_AP_REQ            01 00
KRB_AP_REP            02 00
KRB_ERROR             03 00

they would look like valid DER serialization that could be decoded with your original Example spec:

>>> KRB_AP_REQ = '\x01\x00'
>>> KRB_AP_REP = '\x02\x00'
>>> KRB_ERROR = '\x03\x00'
>>> class Example(univ.Sequence):
...     componentType = namedtype.NamedTypes(
...         namedtype.NamedType('spam', univ.Integer()),
...         namedtype.NamedType('eggs', univ.Any()),
...         namedtype.NamedType('ham', univ.Any()),
... )
... 
>>> example = Example()
>>> example['spam'] = 42
>>> example['eggs'] = KRB_AP_REQ
>>> # obtain DER serialization for the ANY type that follows
>>> example['ham'] = encoder.encode(univ.Integer(24))
>>> print(example.prettyPrint())
Example:
 spam=42
 eggs=0x0100
 ham=0x020118
>>> substrate = encoder.encode(example)
>>> data, tail = decoder.decode(substrate, asn1Spec=Example())
>>> print(data.prettyPrint())
Example:
 spam=42
 eggs=0x0100
 ham=0x020118
>>> data['eggs'].asOctets()
'\x01\x00'
>>> data['eggs'].asNumbers()
(1, 0)
>>> example['eggs'] == KRB_AP_REQ
True

But that is a sort of cheating and may not work for arbitrary innerToken values.

So what does a GSSAPI-Token serialization produced by other tools look like?

Using `OctetString` seems like a great way to handle this, but unfortunately it would require the RFC to be updated, because it adds its own tag and length octets to the serialization. In your example, "300c02012a040701006162636465", those are the "0407" octets. I'll see what I can do to provide a concrete example of a serialized GSSAPI-Token. — neirbowj, Jul 27 '15 at 10:20
Well, then it all depends if all possible KRB_* values could formally be seen as valid (though imaginary) DER serialization. For the values you mentioned, decoder will work alright. — Ilya Etingof, Jul 27 '15 at 10:57
I think I know how it would look, but if you could provide a little sample code to show the decoder accepting the constrained TOK_ID followed by some other unpredictable, but constrained object, I'll accept your answer. — neirbowj, Jul 28 '15 at 20:22
I'm not sure what you mean by "constrained" here, but I hope I added the code you need. If not, please clarify. — Ilya Etingof, Jul 29 '15 at 09:39
BTW, I'm not sure ASN.1 supports a concept of sending fully unpredictable components to peers. Multiple choices - yes, but untyped blobs - not sure. But blob encapsulation into some definite type is a way to go. — Ilya Etingof, Jul 29 '15 at 09:46
