The obligatory reminder: assuming your input is well-formed XML, using an XML parser will provide a more robust solution (see bottom).
Here's a single-utility awk solution:
awk -v RS= -F '<connection name="|<hostPort>' '
  {
    sub(/".*/, "", $2)
    split($3, tokens, /[:<]/)
    printf "%-6s %s %s\n", $2, tokens[1], tokens[2]
  }
' file
-v RS= tells awk to split the input into records by paragraphs (paragraph mode), where a paragraph is a run of non-empty lines.
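As a quick illustration of paragraph mode, here is a minimal sketch on made-up input (the letters and the variable name summary are placeholders, not data from the question). It prints each record's number and field count, using the default field separator:

```shell
# Made-up input: two paragraphs separated by one empty line.
summary=$(printf 'a\nb\n\nc\n' | awk -v RS= '{ print NR ": " NF " field(s)" }')
printf '%s\n' "$summary"
```

The first record spans two lines and therefore has two whitespace-separated fields; the second record has one.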
-F '<connection name="|<hostPort>' splits each paragraph into fields by occurrences of <connection name=" or (|) <hostPort>, so that the data of interest is at the start of the 2nd and 3rd fields ($2 and $3).
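To see what ends up in the fields, here is a small sketch that applies the same -F value to a simplified record. The one-line record and the variable names are assumptions for illustration; the default record separator is used, so paragraph mode is not involved here:

```shell
# Simplified, hypothetical one-line record (not the multi-line input from the question).
line='<connection name="test1" transport="tcp"><hostPort>host1:11111</hostPort>'
fields=$(printf '%s\n' "$line" |
  awk -F '<connection name="|<hostPort>' '{ print "$2=[" $2 "]"; print "$3=[" $3 "]" }')
printf '%s\n' "$fields"
```

Both fields start with the value of interest and still carry trailing markup, which is what the sub() and split() calls then deal with.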
sub(/".*/, "", $2) removes the first " and everything following it from field 2, effectively leaving just the connection name.
split($3, tokens, /[:<]/) splits the 3rd field into an array of tokens by occurrences of : and <, yielding the host name in the 1st array element and the port in the 2nd.
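The two string operations can be tried in isolation. In this sketch, literal strings stand in for $2 and $3 (the variable names name, rest, and cleaned are made up for the example):

```shell
# Literal strings stand in for the raw contents of $2 and $3.
cleaned=$(awk 'BEGIN {
  name = "test1\" transport=\"tcp\">"
  rest = "host1:11111</hostPort>"
  sub(/".*/, "", name)               # drop the first " and everything after it
  n = split(rest, tokens, /[:<]/)    # cut at every : or <
  print name, tokens[1], tokens[2], n
}')
printf '%s\n' "$cleaned"
```

split() returns the number of tokens; the leftover 3rd token (the remainder of the closing tag) is simply ignored.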
printf "%-6s %s %s\n", $2, tokens[1], tokens[2] prints an output line, right-padding the connection name to at least 6 characters with spaces, as in your sample output; simply omit the -6 if you just want a single space to separate the output fields.
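A short sketch of how the %-6s conversion behaves, using made-up names (square brackets are added only to make the padding visible):

```shell
# %-6s left-justifies and pads to a minimum width of 6; longer values are not truncated.
padded=$(awk 'BEGIN { printf "[%-6s] [%-6s] [%s]\n", "xyz1", "longername", "xyz1" }')
printf '%s\n' "$padded"
```

Note that the width is a minimum: a name longer than 6 characters is printed in full and shifts the remaining columns.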
Optional reading: XML-parsing utilities (CLIs) usable in shell scripts
Below are solutions that contrast three such utilities: xmllint, xmlstarlet, and xidel.
The following well-formed XML document is assumed to be contained in a file named file; note how the <connection> elements are now enclosed in a single, top-level <doc> element:
<doc>
  <connection name="test1" transport="tcp">
    <LPort>host1:11111</LPort>
    <hostPort>host1:11111</hostPort>
    <abcd>1234</abcd>
  </connection>
  <connection name="test2" transport="tcp">
    <hostPort>host2:22222</hostPort>
    <GPort>host1:12111</GPort>
  </connection>
  <connection name="xyz1" transport="tcp">
    <hostPort>host3:33333</hostPort>
    <FPort>host1:12113</FPort>
    <efgi>5678</efgi>
  </connection>
  <connection name="xyz2" transport="tcp">
    <LPort>host1:12234</LPort>
    <hostPort>host4:4444</hostPort>
  </connection>
</doc>
xmllint solution:
xmllint's lack of control over the formatting of query results requires a nontrivial awk helper command:
echo 'cat //connection/@name | //hostPort/text()' | xmllint --shell file | awk -F\" '
  NR % 2 { next }                  # skip separator lines
  NR % 4 == 2 { conn = $2; next }  # save connection name
  {
    split($0, tokens, ":")
    printf "%-6s %s %s\n", conn, tokens[1], tokens[2]
  }
'
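To make the line arithmetic in the helper easier to follow, here is a sketch that feeds it simulated input. The shape of that input (a separator line before every result, alternating name="..." and host:port lines) is an assumption inferred from the script's own comments; real xmllint --shell output may differ in detail between versions:

```shell
# Hypothetical stand-in for the xmllint --shell output; the separator text is an assumption.
simulated=' -------
 name="test1"
 -------
host1:11111
 -------
 name="test2"
 -------
host2:22222'
formatted=$(printf '%s\n' "$simulated" | awk -F\" '
  NR % 2 { next }                  # odd lines: separators
  NR % 4 == 2 { conn = $2; next }  # lines 2, 6, ...: name="..."
  {
    split($0, tokens, ":")         # lines 4, 8, ...: host:port
    printf "%-6s %s %s\n", conn, tokens[1], tokens[2]
  }
')
printf '%s\n' "$formatted"
```

With " as the field separator, $2 of a name="..." line is the bare connection name, which is why the helper needs no further cleanup for it.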
xmlstarlet solution:
xmlstarlet's sel sub-command supports very flexible extractions by translating options into XSLT templates behind the scenes:
xmlstarlet sel -t -m '//connection' -v 'str:align(@name, "      ")' \
  -o ' ' \
  -c 'str:replace(hostPort, ":", " ")' -n file
xidel solution:
xidel is very flexible and supports not only XML, but also HTML and JSON.
While it has no support for XSLT, it supports XQuery, a superset of XPath with XSLT-like features, which enables powerful transformations. Tip of the hat to Reino.
As far as I can tell, there is no built-in function for padding, however, so a straightforward auxiliary awk command is used:
xidel file -q --xquery \
'for $c in //connection return concat($c/@name, " ", replace($c/hostPort, ":", " "))' |
awk '{ printf "%-6s %s %s\n", $1, $2, $3 }'
That said, XQuery even supports user-defined functions, so you can write your own padding function:
xidel -q file --xquery '
  declare function pad($s as xs:string?) as xs:string
  {
    substring(concat($s, "      "), 1, 6)
  }
  for $c in //connection return concat(pad($c/@name), " ", replace($c/hostPort, ":", " "))
'
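The pad function relies on an append-then-truncate idiom: append a run of spaces, then keep the first 6 characters. As a rough illustration of that idiom only (not of xidel itself), the same logic can be sketched in awk, assuming six spaces of padding and made-up names:

```shell
# awk sketch of the append-then-truncate idiom; values are made up.
fixed=$(awk 'BEGIN {
  pad = "      "                            # six spaces
  print "[" substr("xyz1" pad, 1, 6) "]"
  print "[" substr("longername" pad, 1, 6) "]"
}')
printf '%s\n' "$fixed"
```

Unlike printf's %-6s, this idiom yields exactly 6 characters, so names longer than that are cut off.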