
Using xidel, I'm extracting //Assertion//Signature//KeyInfo//X509Certificate/text() from a SAMLResponse. This is an X.509 certificate stored as a long base64 string.

I want to split this string into 64-character blocks.

I tried tokenize() and replace(), but I couldn't make either of them work.

It seems that replace() does not allow me to use newlines \n in the replacement string:

echo "$SAMLRESPONSE" | base64 -D | xidel --xpath 'replace(//Assertion//Signature//KeyInfo//X509Certificate/text(),"(.{64})","$1\n")' -
**** Processing: stdin:/// ****
Error:
err:FORX0004: Invalid replacement: $1\n after $1\n
Possible backtrace:
  $000000010203F668: perhaps TXQTermTryCatch + 222920 ? but unlikely
  $0000000102068BBE: perhaps Q{http://www.w3.org/2005/xpath-functions}tokenize + 166350 ? but unlikely
  $000000010203FF78: Q{http://www.w3.org/2005/xpath-functions}replace + 376
  $0000000101FF853F: TXQTermNamedFunction + 767
  $0000000101F71CE7: perhaps ? ? but unlikely

Call xidel with --trace-stack to get an actual backtrace

And tokenize() treats the whole match as a separator, and separators are not included in the output:

echo "$SAMLRESPONSE" | base64 -D | xidel --xpath 'tokenize(//Assertion//Signature//KeyInfo//X509Certificate/text(),"(?:.{64})")' -
**** Processing: stdin:/// ****















XACcI5tcJbgsvr+ivGPos/WrhywkROwbEBh6OTNXTnaBiiIK

Is there any way to split a string into fixed-width chunks in XPath?

Reino
RubenLaguna
  • I thought Xidel has XPath 3 (3.0 or 3.1) support as well with e.g. the `analyze-string` function where e.g. `analyze-string($s, '.{64}')` gives you the matches with the 64 character chunks wrapped into a `match` element, see https://www.w3.org/TR/xpath-functions/#func-analyze-string for details. – Martin Honnen Feb 24 '23 at 09:13
  • Note that XPath does allow a newline to appear in a string, but it doesn't recognise `\n` as a representation of a newline - except in a regular expression (and specifically, not in a replacement string). Of course the host language in which XPath is embedded may recognize `\n` and expand it to a newline before XPath sees it, but that's very dependent on exactly where your XPath is defined. – Michael Kay Feb 24 '23 at 12:02
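Following up on the comment above, here is a minimal sketch that assumes the XPath is embedded in a bash script. Bash's ANSI-C quoting `$'\n'` expands to a real line feed before Xidel ever sees the expression, so the replacement string contains an actual newline instead of the invalid `\n` escape. The variable names are hypothetical. The script only builds and prints the expression. You would then pass it on as `xidel --xpath "$xpath" -`.

```bash
#!/usr/bin/env bash
# $'\n' is bash ANSI-C quoting: bash (not XPath) turns it into a real line feed.
nl=$'\n'

# Build the XPath expression. \$1 leaves a literal "$1" for XPath's replace(),
# so the replacement string becomes "$1" followed by an actual newline.
xpath="replace(//Assertion//Signature//KeyInfo//X509Certificate, \"(.{64})\", \"\$1${nl}\")"

# Print it; in practice you would run: xidel --xpath "$xpath" -
printf '%s\n' "$xpath"
```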

3 Answers


Your first idea wasn't wrong. You just have to use the codepoints-to-string() function to generate the newline character:

printf %s "$SAMLRESPONSE" |
base64 -D |
xidel --xpath '
    let
        $cert := //Assertion//Signature//KeyInfo//X509Certificate
    return
        "-----BEGIN CERTIFICATE-----" || codepoints-to-string(10) ||
        replace( $cert, ".{1,64}", "$0" || codepoints-to-string(10) ) ||
        "-----END CERTIFICATE-----" || codepoints-to-string(10)
' -

Note: I changed the regex to .{1,64} so that the replaced string always ends with a line feed, even when the last chunk is shorter than 64 characters.
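To illustrate why .{1,64} matters, here is a rough bash analogue of `replace($cert, ".{1,64}", "$0" || codepoints-to-string(10))`. It is not Xidel code. It only mimics the same greedy match-and-append-newline behaviour with bash's `=~` operator. The function name `chunk64` and the 150-character sample are hypothetical.

```bash
#!/usr/bin/env bash
# Bash analogue of XPath's replace($s, ".{1,64}", "$0" || codepoints-to-string(10)):
# every chunk of at most 64 characters, including a short final one, gets a newline.
chunk64() {
  local rest=$1 out=''
  local re='^(.{1,64})(.*)$'
  # Greedy .{1,64} takes up to 64 characters; the loop ends when nothing is left.
  while [[ $rest =~ $re ]]; do
    out+="${BASH_REMATCH[1]}"$'\n'
    rest=${BASH_REMATCH[2]}
  done
  printf '%s' "$out"
}

# Hypothetical 150-character input: prints lines of 64, 64 and 22 characters.
sample=$(printf 'A%.0s' {1..150})
chunk64 "$sample"
```

With `.{64}` instead, the final 22 characters would not be followed by a newline. The END marker would then be glued onto the last line.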


ASIDE: You don't even need to build the full output with XPath in the first place:

{
    echo '-----BEGIN CERTIFICATE-----'

    printf %s "$SAMLRESPONSE" |
    base64 -D |
    xidel --xpath '//Assertion//Signature//KeyInfo//X509Certificate' - |
    fold -w 64

    echo '-----END CERTIFICATE-----'
}
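To see what `fold -w 64` does on its own, here is a minimal sketch using a hypothetical stand-in for the certificate text and a hypothetical helper name, `to_pem`. One detail worth knowing: `fold` does not add a trailing newline that its input lacks. The sketch therefore feeds it with `printf '%s\n'` so that the END marker always starts on its own line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wrap base64 text into a PEM block with fold.
to_pem() {
  echo '-----BEGIN CERTIFICATE-----'
  # printf supplies the trailing newline, since fold does not add one;
  # fold -w 64 then breaks the line every 64 columns.
  printf '%s\n' "$1" | fold -w 64
  echo '-----END CERTIFICATE-----'
}

# Hypothetical 150-character stand-in for the X509Certificate text.
cert=$(printf 'A%.0s' {1..150})
to_pem "$cert"
```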
Fravadona

It seems that replace() does not allow me to use newlines \n in the replacement string:

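That's because regex escapes such as \n are only recognized in the pattern, not in the replacement string, where a backslash may only precede $ or another backslash. You have to use character references (entities) or x:cps():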

replace(...,"(.{1,64})","$1&#10;")
replace(...,"(.{1,64})","$1&#xA;")
replace(...,"(.{1,64})","$1"||x:cps(10))

And tokenize will treat the whole match as separator

https://www.w3.org/TR/xpath-functions-31/#func-tokenize:

Returns a sequence of strings constructed by splitting the input wherever a separator is found

You want to split the input based on a separator it doesn't contain, so tokenize() is unsuitable. Instead, as an alternative to replace(), you could use Xidel's own x:extract(). More importantly, together with parse-xml() and x:binary-to-string(), the whole job can be done much more simply, entirely within Xidel:

$ echo "$SAMLRESPONSE" | xidel -se '
  "-----BEGIN CERTIFICATE-----",
  binary-to-string(base64Binary($raw)) ! extract(
    parse-xml(.)//Assertion//Signature//KeyInfo//X509Certificate,
    ".{1,64}",0,"*"
  ),
  "-----END CERTIFICATE-----"
'

And because a newline is the default value for --output-separator, there's no need for codepoints-to-string(10) either.
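Outside Xidel, `grep -o` plays a role similar to `extract(..., ".{1,64}", 0, "*")`. It prints every match on its own line, so the newline separator comes for free. This is a minimal sketch that assumes a grep supporting `-o` and `-E` (GNU and BSD grep both do). The sample string is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical 150-character stand-in for the certificate text.
cert=$(printf 'A%.0s' {1..150})

# -E: extended regex; -o: print each match on its own line.
# Greedy .{1,64} yields full 64-character chunks plus a shorter final one.
printf '%s\n' "$cert" | grep -oE '.{1,64}'
```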

Reino
  • @RubenLaguna The stdin-dash is only needed if your binary is older than r7880, which was released almost 2 years ago, so please update. – Reino Feb 24 '23 at 13:55
  • I'm using xidel 0.9.8, and without the `-` it errors with "err:FORG0001: Cannot cast "" to base64Binary". With `-` it works. It was released on 2018-04-22, but it seems to be the latest version available via `brew install xidel`. – RubenLaguna Feb 24 '23 at 14:33
  • @RubenLaguna you can download a newer binary here https://nightly.link/benibela/xidel/workflows/main/master – Fravadona Feb 24 '23 at 14:51

If you know a character that is guaranteed not to appear in the original string (for example, $ is not a legal character in base64 or base64url), you can combine tokenize() and replace() to achieve the expected result:

echo "$SAMLRESPONSE" | base64 -D | xidel -s --xpath 'tokenize(replace(//Assertion//Signature//KeyInfo//X509Certificate/text(),"(.{64})","$1\$"),"\$")' -| cat <(echo "-----BEGIN CERTIFICATE-----") - <(echo "-----END CERTIFICATE-----")
-----BEGIN CERTIFICATE-----
MIIC8DCCAdigAwIBAgIQGSvclGcZ8oRINlIUmlg7WzANBgkqhkiG9w0BAQsFADA0
MTIwMAYDVQQDEylNaWNyb3NvZnQgQXp1cmUgRmVkZXJhdGVkIFNTTyBDZXJ0aWZp
Y2F0ZTAeFw0yMDA2MjIwODI4NTlaFw0yMzA2MjIwODI4NTlaMDQxMjAwBgNVBAMT
KU1pY3Jvc29mdCBBenVyZSBGZWRlcmF0ZWQgU1NPIENlcnRpZmljYXRlMIIBIjAN
BgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAuJds5ZQxHlRF7j10Qey++JJ84vqm
uKjSAsSqCS/JynVs5oDO7oIZvxSdbmwUWDnuBUr8bHyqd/MUYOVCjZvt0zN6+kP0
bmB7B8IP8E2amZB4Hn7bYdrPELcCPjO01gLx6ymLn/kHVUrnYjP0/+r0pos/MeM7
vY6jbCrxLt9cR6e1loC1Z04dyHw0jBHBhqKO5iXe1AVUtmt2zKt27Hck4zndQgMo
Gb8JwekQhRzL+SHLydhVZ5QctyEoT/PkAkrflmhllAGzCYBJkxqAYOk2GTWt5Gi6
/GLm6cxp2KTH7bCJWJTOmfDbJMOEAgAlcXk2KKKPRYFc96Pd5BRyIAlcpQIDAQAB
MA0GCSqGSIb3DQEBCwUAA4IBAQCBmIXI9oVTX7BSiT+hY98UTsc64G4gkuBvwKuh
xxY9oUxrRo6VM/uuArDCjtupk5Wx5YGDWTvcNXmN+h2QQnjK/83hwjsbRP4hAitF
NcvdeQNcfeXTK7Woe1Dmdms2b2U77NnEhD23mv4/IoFnfDDunkOnoottjyQqSOIz
hrO4LIQriCPsHmm/8MYGrHX1KDN69gWYAVSQi7dPcbjhdnNQN00RKQ5XrbktWcFN
GrqVOI0Usy4i7hkcitrOmZfjet5VepXzNfWA2gxgWtWJNbhSBqGT/S+OEdZfNp6s
XACcI5tcJbgsvr+ivGPos/WrhywkROwbEBh6OTNXTnaBiiIK
-----END CERTIFICATE-----

In the above command, replace() first matches groups of 64 characters and replaces each group with itself plus a trailing $. tokenize() then uses this $ as the separator. (In both the replacement string and the regex, \$ stands for a literal dollar sign.)

Again, this only works if you have a character that you know cannot appear in the original string, like $ in the base64 case.
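The same sentinel trick can be sketched with standard text tools, assuming `sed -E` (supported by GNU and BSD sed) and `tr`. Here `sed` appends a `$` after every full 64-character group, and `tr` turns each `$` into a newline. The sample input is hypothetical. One caveat applies to this sketch and to the `tokenize()` version alike. If the length is an exact multiple of 64, the string ends with the sentinel. That produces an extra empty line (an empty final token) before the END marker.

```bash
#!/usr/bin/env bash
# Hypothetical 150-character stand-in for the certificate text.
cert=$(printf 'A%.0s' {1..150})

# 1) sed: append the sentinel "$" after each full 64-character group
#    ("&" is the whole match; "$" is literal in a sed replacement).
# 2) tr: turn every sentinel into a newline.
printf '%s\n' "$cert" | sed -E 's/.{64}/&$/g' | tr '$' '\n'
```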

RubenLaguna