I have an XML file containing lines like the following:

<sandbox>false</sandbox>
<serverUrl>https://salesforce.com/services/Soap/u/37.0/</serverUrl>
<sessionId>00D4100000087K9!AQMAQJElzjgvA01eaCo</sessionId>
<userId>00541000000JOzJAAW</userId>
<userInfo>

I am trying to use sed on Linux to get a value between the two sessionId tags.

sed -n '/<sessionId>.*$/{s/<sessionId>.*<\/sessionId>/\1/;p}' LoginResponse.xml

But it throws the error below. Any suggestions?

sed: -e expression #1, char 50: invalid reference \1 on `s' command's RHS
Charles Duffy
Kris
  • I used round brackets but am still getting the same error. sed -n '/<sessionId>.*$/{s/<sessionId>(.*)<\/sessionId>/\1/;p}' LoginResponse.xml – Kris Sep 23 '16 at 18:49

1 Answer

The Right Thing

Don't use sed for this at all; XML is not a regular language, so regular expressions are categorically not powerful enough to parse it correctly. Your current code can't distinguish a comment that talks about sessionId tags from a real sessionId tag; can't decode entity encodings such as &amp;; can't deal with unexpected attributes being present on your tag; etc.

Instead, use:

xmlstarlet sel -t -m '//sessionId' -v . -n < LoginResponse.xml

...or, if you don't have XMLStarlet, you can use xsltproc (which is almost universally available out-of-the-box on modern UNIXy systems). If you save the following as extract-session-id.xslt:

<?xml version="1.0"?>
<!-- this was generated with:
  -- xmlstarlet sel -C -t -m '//sessionId' -v . -n
  -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" version="1.0" extension-element-prefixes="exslt">
  <xsl:output omit-xml-declaration="yes" indent="no"/>
  <xsl:template match="/">
    <xsl:for-each select="//sessionId">
      <xsl:call-template name="value-of-template">
        <xsl:with-param name="select" select="."/>
      </xsl:call-template>
      <xsl:value-of select="'&#10;'"/>
    </xsl:for-each>
  </xsl:template>
  <xsl:template name="value-of-template">
    <xsl:param name="select"/>
    <xsl:value-of select="$select"/>
    <xsl:for-each select="exslt:node-set($select)[position()&gt;1]">
      <xsl:value-of select="'&#10;'"/>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

...then you can run xsltproc extract-session-id.xslt LoginResponse.xml to get your output.


The sed Thing

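That said, with respect to your sed bug: \1 refers to the first capture group, and your original expression doesn't contain one. In sed's default BRE syntax a group is written \(...\); to use bare parentheses, as in your comment, you need to pass -r to enable ERE syntax: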

# requires GNU sed for -r
sed -r -n -e '/<sessionId>.*$/{s/<sessionId>(.*)<\/sessionId>/\1/;p}' LoginResponse.xml

Alternatively, with the BSD sed shipped on macOS, some other tweaks are needed:

# -E, not -r, on macOS BSD sed; a semicolon is needed between "p" and "}".
sed -E -n '/<sessionId>.*$/ { s/<sessionId>(.*)<\/sessionId>/\1/; p; }' LoginResponse.xml
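
If you would rather not depend on -r or -E at all, the same extraction can be written in plain BRE, which both GNU and BSD sed accept. The following is only a sketch: the sample file, its contents, and the variable names are made up for illustration, and it assumes the whole element sits on one line, as in the question.

```shell
# Hypothetical sample input modelled on the question.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample-login.xml" <<'EOF'
<sandbox>false</sandbox>
<sessionId>00D4100000087K9!AQMAQJElzjgvA01eaCo</sessionId>
<userId>00541000000JOzJAAW</userId>
EOF

# In BRE, a capture group is written \( ... \).
# -n suppresses automatic printing; the trailing "p" flag prints only
# the lines on which the substitution actually matched.
session_id=$(sed -n 's/.*<sessionId>\(.*\)<\/sessionId>.*/\1/p' "$tmpdir/sample-login.xml")

printf '%s\n' "$session_id"
rm -rf "$tmpdir"
```

Folding the address and the print into a single s///p also avoids printing lines that contain an opening tag but fail the substitution.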

This will behave badly if your session IDs ever include characters that must be escaped as entities in XML: a literal & will come out as &amp;, and so forth. It will likewise break if the markup ever changes to something like <sessionId type="foo">...</sessionId>, or in any number of other ways. Using a proper XML parser is thus the safer option.
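
To make those failure modes concrete, here is a small sketch using a portable BRE form of the same substitution. The file names and session values are invented for the demonstration; they are not taken from a real login response.

```shell
tmpdir=$(mktemp -d)
# Portable BRE version of the extraction (assumed equivalent for one-line elements).
extract='s/.*<sessionId>\(.*\)<\/sessionId>.*/\1/p'

# Case 1: the value contains an escaped ampersand.
cat > "$tmpdir/entity.xml" <<'EOF'
<sessionId>abc&amp;def</sessionId>
EOF
escaped=$(sed -n "$extract" "$tmpdir/entity.xml")
# sed returns the raw markup, entity and all, rather than the decoded value.
printf 'entity case:    [%s]\n' "$escaped"

# Case 2: the opening tag gains an attribute.
cat > "$tmpdir/attr.xml" <<'EOF'
<sessionId type="foo">abc</sessionId>
EOF
with_attr=$(sed -n "$extract" "$tmpdir/attr.xml")
# The literal "<sessionId>" no longer matches, so nothing is extracted.
printf 'attribute case: [%s]\n' "$with_attr"

rm -rf "$tmpdir"
```

In the first case the caller silently receives a string that still contains &amp;; in the second it receives an empty string. An XML parser handles both inputs without any change to the query.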

Charles Duffy
  • Thanks a lot @Charles Duffy for your suggestion. Using the -r option with sed fixed the issue. It is also good to know about the xmlstarlet option. Both worked great. – Kris Sep 23 '16 at 19:06
  • Yes, I totally agree with you for using the proper XML parser like xmlstarlet instead of sed. Thanks again for your detailed answer, I truly appreciate it. – Kris Sep 23 '16 at 19:16