
I have an XML file with the content below. How can I extract the username and password values from the Resource tag using a shell script? The commented-out Resource tag must be excluded, and the values should come from the uncommented one. What I tried fetched the values from the latest tag. Can someone help me ignore the commented tags and fetch the values from the XML?

<?xml version='1.0' encoding='utf-8'?>
<!-- The contents of this file will be loaded for each web application -->
<!--
 <Resource name="jdbcSource" auth="Container"
type="javax.sql.DataSource"
 username="demo"
    password="test"
        driverClassName="driverclassname"
        url="driver@host"
    maxActive="20"
    maxIdle="10"
     />

-->

<Resource auth="Container"
driverClassName="driverclassname" maxActive="100" maxIdle="30" maxWait="10000"
name="jdbcSource" password="test" type="javax.sql.DataSource"
url="driver@host"
username="demo"/>

</Context>
RobC
Mahesh

4 Answers


Firstly, my answer assumes that your actual source XML is well-formed. The example you've provided isn't well-formed XML, as it lacks the opening root element, namely <Context>, but I'll assume there is one anyway.
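To have something concrete to run the commands against, the sketch below writes the question's sample to a file with the missing opening <Context> tag added. The file name file.xml and placing <Context> directly after the XML declaration are assumptions; adjust both to match your real file.

```shell
#!/usr/bin/env bash
# Assumption: file.xml in the current directory is a hypothetical test path.
xml_file=file.xml

# The quoted 'EOF' delimiter stops the shell from expanding anything in the body.
cat > "$xml_file" <<'EOF'
<?xml version='1.0' encoding='utf-8'?>
<Context>
<!-- The contents of this file will be loaded for each web application -->
<!--
 <Resource name="jdbcSource" auth="Container"
type="javax.sql.DataSource"
 username="demo"
    password="test"
        driverClassName="driverclassname"
        url="driver@host"
    maxActive="20"
    maxIdle="10"
     />

-->

<Resource auth="Container"
driverClassName="driverclassname" maxActive="100" maxIdle="30" maxWait="10000"
name="jdbcSource" password="test" type="javax.sql.DataSource"
url="driver@host"
username="demo"/>

</Context>
EOF
```

The xml sel commands in this answer can then be pointed at file.xml in place of path/to/file.xml.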


Bash features by themselves are not well suited to parsing XML.

This Bash FAQ states the following:

Do not attempt [to extract data from an XML file] with sed, awk, grep, and so on (it leads to undesired results)

If you must use a shell script, then use an XML-specific command-line tool such as XMLStarlet (there are other similar tools available). See the XMLStarlet download information if you don't already have it installed.

Solution:

Using XMLStarlet, you can run the following commands:

uname=$(xml sel -t -v "/Context/Resource/@username" path/to/file.xml)
pword=$(xml sel -t -v "/Context/Resource/@password" path/to/file.xml)

echo "$uname $pword" # --> demo test

Explanation

  • uname=$(...)

    Here we use command substitution to assign the output of the XMLStarlet command to a variable named uname (i.e. the username).

  • xml sel -t -v "/Context/Resource/@username"

    This command breaks down as follows:

    • xml - invoke the XMLStarlet command.
    • sel - select data or query XML document(s).
    • -t - the template option.
    • -v - print the value of the XPath expression.
    • "/Context/Resource/@username" - the XPath expression that selects the value of the username attribute of the Resource element.
  • path/to/file.xml

    This part should be replaced with the real path to your .xml file.

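Likewise, we use a similar command to obtain the value of the password attribute, assigning the output to a variable named pword and changing the XPath expression accordingly. The commented-out Resource needs no special handling: an XML parser treats everything between <!-- and --> as a comment node, not an element, so these XPath expressions never match it.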
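The commands above invoke XMLStarlet as xml, which is typically the name its upstream build installs. Some distribution packages (Debian and Ubuntu, for example) install it as xmlstarlet instead, which is also the name Charles Duffy's comment below uses. A small sketch that picks whichever name is on the PATH; the helper name find_xmlstarlet is hypothetical, not part of XMLStarlet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name under which XMLStarlet is installed,
# or print an error to stderr and return 1 if neither name is found.
find_xmlstarlet() {
  local candidate
  # Prefer the packaged name, then fall back to the upstream name.
  for candidate in xmlstarlet xml; do
    # command -v succeeds only if the name resolves to a command.
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "XMLStarlet not found; install it first" >&2
  return 1
}
```

A script would then start with xmlstarlet_cmd=$(find_xmlstarlet) || exit 1 and run "$xmlstarlet_cmd" sel -t -v "/Context/Resource/@username" path/to/file.xml in place of xml sel ....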


Edit 1: A more efficient command

As per Charles Duffy's first comment below, you can also extract both attribute values more efficiently with a single command:

{ IFS= read -r uname && IFS= read -r pword; } < <(xml sel -t -v "/Context/Resource/@username" -n -v "/Context/Resource/@password" path/to/file.xml)

echo "$uname $pword" # --> demo test

The main benefit here is that the source XML file is only read once.
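The one-liner combines three pieces: a brace group, so both reads run in the current shell and the variables survive afterwards; IFS= with read -r, so leading/trailing whitespace and backslashes in the values are kept as-is; and process substitution, which is a bash feature rather than POSIX sh. The sketch below, assuming bash, isolates that pattern; print_credentials is a hypothetical stand-in for the XMLStarlet call that simply prints the two sample values.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the XMLStarlet call with a final -n, as in
# Charles Duffy's comment:
#   xml sel -t -v "/Context/Resource/@username" -n -v "/Context/Resource/@password" -n path/to/file.xml
print_credentials() {
  printf '%s\n%s\n' 'demo' 'test'
}

# The braces group both reads in the current shell (not a subshell), so
# uname and pword stay set afterwards. IFS= preserves surrounding
# whitespace and -r stops read from treating backslashes as escapes.
{ IFS= read -r uname && IFS= read -r pword; } < <(print_credentials)

echo "$uname $pword"
```

One detail: the Edit 1 command has no final -n, so its output presumably ends without a newline after the password. In that case the second read still assigns test but returns a non-zero status, which is harmless here but would abort a script running under set -e; appending a final -n, as in Charles Duffy's comment, avoids that.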


Edit 2: Using XMLStarlet to generate an XSLT template that can then be run on any system with xsltproc, including hosts that don't have XMLStarlet installed:

As per Charles Duffy's second comment below...

It's also possible to use XMLStarlet to generate an XSLT template derived from the XMLStarlet query shown previously. The generated .xsl file can then be run on any system that has xsltproc available (including hosts that don't have XMLStarlet installed).

The following steps demonstrate how to achieve this:

  1. First, run the following XMLStarlet command to generate the .xsl file:

    xml sel -C -t -v "/Context/Resource/@username" -n -v "/Context/Resource/@password" path/to/file.xml > path/to/resultant/my-template.xsl
    

    This command is very similar to the previously shown XMLStarlet command. The notable differences are:

    • The additional -C option between sel and -t
    • The redirection operator > and a file path. This specifies where to save the output (i.e. the generated XSLT template/stylesheet).

      Note the path/to/resultant/my-template.xsl part should be changed as necessary.

    The contents of the generated XSLT stylesheet will be something like the following:

    my-template.xsl

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" version="1.0" extension-element-prefixes="exslt">
      <xsl:output omit-xml-declaration="yes" indent="no"/>
      <xsl:template match="/">
        <xsl:call-template name="value-of-template">
          <xsl:with-param name="select" select="/Context/Resource/@username"/>
        </xsl:call-template>
        <xsl:value-of select="'&#10;'"/>
        <xsl:call-template name="value-of-template">
          <xsl:with-param name="select" select="/Context/Resource/@password"/>
        </xsl:call-template>
      </xsl:template>
      <xsl:template name="value-of-template">
        <xsl:param name="select"/>
        <xsl:value-of select="$select"/>
        <xsl:for-each select="exslt:node-set($select)[position()&gt;1]">
          <xsl:value-of select="'&#10;'"/>
          <xsl:value-of select="."/>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>
    
  2. Next, run the following command, which uses xsltproc to transform the source .xml file. This assigns the result of the transformation to the two variables uname and pword:

    { IFS= read -r uname && IFS= read -r pword; } < <(xsltproc path/to/resultant/my-template.xsl path/to/file.xml)
    
    echo "$uname $pword" # --> demo test
    

    Note the parts reading path/to/resultant/my-template.xsl and path/to/file.xml should be changed as necessary.


RobC
  • You could extract both in just one run. `{ IFS= read -r uname && IFS= read -r pword; } < <(xmlstarlet ... -v foo -n -v bar -n)` -- more efficient that way. – Charles Duffy Jan 04 '19 at 13:03
  • It might also be valuable to show how to tell XMLStarlet to generate an XSLT template that can then be run on any system with `xsltproc`, including hosts that don't have XMLStarlet installed. – Charles Duffy Jan 04 '19 at 13:05
  • @CharlesDuffy - Done... edits to my answer now demonstrate both suggestions mentioned in your comments - Thank you! – RobC Jan 04 '19 at 15:35
  • If I could give you a second +1 I would. :) – Charles Duffy Jan 04 '19 at 16:01

RobC already explained why you shouldn't use native Bash tools to parse HTML/XML. I'd recommend a dedicated tool like xidel.

I've added an opening <Context>, as shown by m.nguyencntt, and saved your XML file as so_54034541.xml.

With command substitution you could of course set the variables by calling xidel twice...

uname=$(xidel -s so_54034541.xml -e '//Resource/@username')
pword=$(xidel -s so_54034541.xml -e '//Resource/@password')

...but xidel also has its own way to export (multiple) variables:

xidel -s so_54034541.xml -e '//Resource/(uname:=@username,pword:=@password)'
uname := demo   # Internal variables for use within the extraction query itself.
pword := test

xidel -s so_54034541.xml -e '//Resource/(uname:=@username,pword:=@password)' --output-format=bash
uname='demo'   # At the moment these are just strings.
pword='test'   # Use Bash's eval built-in command to actually set/export these variables.

eval "$(xidel -s so_54034541.xml -e '//Resource/(uname:=@username,pword:=@password)' --output-format=bash)"

echo "$uname $pword"
demo test
Reino

With a Perl one-liner:

perl -n0777E '
    # remove comments
    s/<!--.*?-->//gs;

    # match username and password with lookaheads and display in custom way
    say "user:$1\tpass:$2" while /<Resource(?=[^>]*\susername="([^"]*)")(?=[^>]*\spassword="([^"]*)")[^>]*>/g
' < file.xml
Nahuel Fouilleul
  • Thanks, but I am looking only for a shell script, not Perl. Anyway, I tried your code but it didn't work in my environment. – Mahesh Jan 04 '19 at 08:53
  • @Mahesh Define "shell script". Do you mean you are unwilling to use `sed`, `awk`, `grep`, and any other command that is not a shell builtin? That is overly restrictive, and IMO completely defeats the point of the shell. Using `perl` is perfectly valid in a shell script. – William Pursell Jan 04 '19 at 13:15
  • ...to showcase some other concrete corner cases -- `< Resource` is valid XML, but won't be found by the code here. Moving the `username` onto a different line from the `Resource` is also valid, and I'm not sure that's honored here. And there are entities -- consider if someone has a password with a literal quote; it would become `&quot;` -- this and other entities would need to be decoded to parse the value robustly. – Charles Duffy Jan 04 '19 at 16:13
  • @CharlesDuffy, this is why it uses lookaheads: the order doesn't matter, and newlines are no problem because the -0777 option reads the whole file at once and `[^>]` also matches newlines. The only remaining issue may be false positives in CDATA sections, which can be removed like comments: `s/<\!\[CDATA\[.*?]]>//gs` – Nahuel Fouilleul Jan 05 '19 at 09:14
  • Entity decoding of the output still needs to happen, and it isn't currently implemented. (And to actually implement the full letter of the standard, something would need to support entities added in the individual document's DTD.) – Charles Duffy Jan 05 '19 at 14:14
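Comments, CDATA sections and entity decoding, the cases discussed in these comments, are exactly what a real XML parser already handles. If python3 is available (an assumption: it is not part of the shell), its standard-library xml.etree.ElementTree module can act as that parser from inside a shell script. A sketch; the helper name get_credentials is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads an XML document on standard input and prints the
# username and password of the first <Resource> element on two lines.
# Assumes python3 is installed; only its standard library is used.
get_credentials() {
  python3 -c '
import sys
import xml.etree.ElementTree as ET

# The default parser discards comments, so a commented-out <Resource> never
# becomes an element; CDATA content is text, not markup; and entity
# references such as &quot; are decoded while parsing.
res = ET.parse(sys.stdin.buffer).getroot().find(".//Resource")
if res is None:
    sys.exit("no <Resource> element found")
print(res.get("username", ""))
print(res.get("password", ""))
'
}
```

For example, { IFS= read -r uname && IFS= read -r pword; } < <(get_credentials < path/to/file.xml) sets uname and pword from the uncommented Resource. Like the other answers, this assumes the file is well-formed, i.e. it has its opening <Context> tag.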

I did the following:

Created yourxmlfile.xml:

<Context>
    <Resource auth="Container"
    driverClassName="driverclassname" maxActive="100" maxIdle="30" maxWait="10000"
    name="jdbcSource" password="test" type="javax.sql.DataSource"
    url="driver@host"
    username="demo"/>
</Context>

sed -n 's/.*[[:space:]]password="\([^"]*\)".*/\1/p' yourxmlfile.xml

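  test

Note that sed matches text, not XML: this works only because the commented-out Resource is not in yourxmlfile.xml. Run against the original file, the same command also prints the password from the commented-out block.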
m.nguyencntt