My input is as follows:

<connection name="test1" transport="tcp">
<LPort>host1:11111</hostPort>
<hostPort>host1:11111</hostPort>
<abcd> 1234

<connection name="test2" transport="tcp">
<hostPort>host2:22222</hostPort>
<GPort>host1:12111</hostPort>

<connection name="xyz1" transport="tcp">
<hostPort>host3:33333</hostPort>
<FPort>host1:12113</hostPort>
<efgi> 5678

<connection name="xyz2" transport="tcp">
<LPort>host1:12234</hostPort>
<hostPort>host4:4444</hostPort>

I want my output to be as follows:

test1  host1 11111
test2  host2 22222
xyz1   host3 33333
xyz2   host4 4444

To get this output, this is what I do, and it works. But it seems to me there must be a better and simpler way of doing it. I did not include the entire logic (the arrays), but I use this method a lot when I have multiple searches in a file. I tried to combine the two awk commands using && and it failed.

Below is part of my code and logic:

1) I cat the file.
2) I get rid of the extra characters and replace them with spaces, using sed.
3) I take the values I want and assign them to array values, using awk.

Please note I have not included the rest of the logic (but it works). In short, I run a while loop, assign the values to 2 or 3 arrays, and print them on the same line to get the desired output.

cat file  | grep -A5 connection  | sed s'/[:="><]/ /g' | awk '/name/ {print $3}'
cat file | grep -A5 connection  | sed s'/[:="><]/ /g' | awk '/hostPort/ {print $2 " " $3}'

If possible, please provide an alternative solution that does not involve storing my search criteria in an array, whether with sed/awk or some other tool.

If you can provide a solution, please explain each option in detail.

Thank you

theuniverseisflat

3 Answers

With a single sed command:

sed -n '/<connection/{N;N; s/<connection name="\([^"]*\)".*<hostPort>\([^:]*\):\([^<]*\).*/\1 \2 \3/p}' file

The output:

test1 host1 11111
test2 host2 22222
xyz1 host3 33333
xyz2 host4 4444

  • N;N; - append the next 2 lines to the pattern space (including newlines)

  • connection name="\([^"]*\) - capturing connection name

  • <hostPort>\([^:]*\):\([^<]*\) - capturing host name and port number
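The error reported in the comments on this answer (bad flag in substitute command: '}') is what a BSD sed, such as the one shipped with macOS, tends to print when the p flag is followed directly by the closing brace. A minimal sketch of the same command with a ; inserted before the }, which GNU sed accepts as well. The sample file is a hypothetical temporary file holding two blocks from the question, and I have only reasoned this through against GNU sed, so treat the BSD behaviour as an assumption:

```shell
# Hypothetical sample file with two of the blocks from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<connection name="test1" transport="tcp">
<LPort>host1:11111</hostPort>
<hostPort>host1:11111</hostPort>
<abcd> 1234

<connection name="test2" transport="tcp">
<hostPort>host2:22222</hostPort>
<GPort>host1:12111</hostPort>
EOF

# Same script as above; the only change is the ';' between 'p' and '}'.
result=$(sed -n '/<connection/{N;N;s/<connection name="\([^"]*\)".*<hostPort>\([^:]*\):\([^<]*\).*/\1 \2 \3/p;}' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Note that N;N; only works while hostPort sits within the two lines after the connection line, as it does in the sample input.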

RomanPerekhrest
  • Hi Roman - I tried this but got the following error: bad flag in substitute command: '}' – theuniverseisflat May 27 '17 at 13:11
  • @theuniverseisflat, check your command for typos. It works fine – RomanPerekhrest May 27 '17 at 13:13
  • I copied and pasted your command; however, I am on my Mac, not at work, so maybe it is something with the shell used. I will try it at work. Thanks again – theuniverseisflat May 27 '17 at 13:18
  • @theuniverseisflat, as you added `linux` tag - it'll work on linux. You did not say anything about macos – RomanPerekhrest May 27 '17 at 13:20
  • A question about the N;N; in your command: if there are more than 2 lines, let's say 3 or 5, how would you change this? I have never used this before. And what shell are you in? – theuniverseisflat May 27 '17 at 13:21
  • No, I do not use this in the macOS shell. I use it at work, but I am not there today. I can use ksh or the Bourne shell. I will try it there as well. – theuniverseisflat May 27 '17 at 13:23
  • @theuniverseisflat, you have already accepted the answer. What is this conversation for? – RomanPerekhrest May 27 '17 at 13:23
  • I am just trying to learn various ways of doing the same thing. If you do not wish to continue, I understand. I hope to learn a bit from you about your approach and about using sed. I accepted the answer because it offered a solution that worked. Thanks again for your time. – theuniverseisflat May 27 '17 at 13:27
  • @theuniverseisflat: With a single-quoted `sed` script (`'...'`), which specific (POSIX-like) shell you're using is irrelevant; what matters is the implementation of `sed` you're using, and there are significant differences between the BSD implementation found on macOS and the GNU implementation found on Linux - see [this answer](http://stackoverflow.com/a/24276470/45375) of mine. – mklement0 May 27 '17 at 21:46

The obligatory reminder: assuming your input is well-formed XML, using an XML parser will provide a more robust solution (see bottom).

Here's a single-utility awk solution:

awk -v RS= -F '<connection name="|<hostPort>' '
  {
    sub(/".*/, "", $2)
    split($3, tokens, /[:<]/)
    printf "%-6s %s %s\n", $2, tokens[1], tokens[2]
  }
' file
  • -v RS= tells awk to split the input into records by paragraphs, where a paragraph is a run of non-empty lines.

  • -F '<connection name="|<hostPort>' splits each paragraph into fields by occurrences of <connection name=" or (|) <hostPort>, so that the data of interest will be at the start of the 2nd and 3rd fields ($2 and $3).

  • sub(/".*/, "", $2) removes everything following the first " from field 2, effectively leaving just the connection name.

  • split($3, tokens, /[:<]/) splits the 3rd field into an array of tokens by occurrences of : and <, yielding the host name in the 1st array element and the port in the 2nd.

  • printf "%-6s %s %s\n", $2, tokens[1], tokens[2] prints an output line, right-padding the connection name to at least 6 characters with spaces, as in your sample output; simply omit the -6 if you just want a single space to separate the output fields.
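If blank lines between the blocks cannot be relied on, a line-oriented variant avoids paragraph mode altogether. This is only a sketch, assuming every tag sits on its own line and hostPort always has the host:port shape. The variable name and the temporary sample file are my own, not part of the question:

```shell
# Hypothetical sample file with the input from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<connection name="test1" transport="tcp">
<LPort>host1:11111</hostPort>
<hostPort>host1:11111</hostPort>
<abcd> 1234

<connection name="test2" transport="tcp">
<hostPort>host2:22222</hostPort>
<GPort>host1:12111</hostPort>

<connection name="xyz1" transport="tcp">
<hostPort>host3:33333</hostPort>
<FPort>host1:12113</hostPort>
<efgi> 5678

<connection name="xyz2" transport="tcp">
<LPort>host1:12234</hostPort>
<hostPort>host4:4444</hostPort>
EOF

# Split every line on <, >, " and : so that $2 is the tag name (or
# "connection name=" on a connection line) and the values follow it.
result=$(awk -F'[<>":]' '
  $2 ~ /^connection / { name = $3 }   # remember the most recent connection name
  $2 == "hostPort"    { printf "%-6s %s %s\n", name, $3, $4 }
' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Because the opening tag name is compared exactly, lines such as <LPort>...</hostPort> are skipped even though they end in </hostPort>. No array is needed: a single scalar carries the connection name forward to the hostPort line.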


Optional reading: XML-parsing utilities (CLIs) usable in shell scripts

  • xmllint is preinstalled on some platforms:

    • macOS/FreeBSD/PC-BSD (possibly other BSD variants)
    • some Linux distributions: Fedora, CentOS
    • On others, a package may be available; e.g., on Ubuntu:
      sudo apt-get install libxml2-utils
    • Caveat: While xmllint supports XPath 1.0 queries, it allows virtually no control over the output format.
  • Install-on-demand alternatives - superior to xmllint:

    • xmlstarlet

      • xmlstarlet is powerful and flexible, supporting a wide range of operations.

      • macOS: Install via Homebrew with brew install xmlstarlet

      • Linux: chances are that it can be installed with your platform's package manager; e.g., on Debian-based distros such as Ubuntu:
        sudo apt-get install xmlstarlet
      • Windows: download and install manually from SourceForge.
    • xidel

      • xidel requires manual download and installation, but its power and flexibility make up for this inconvenience.

      • Supports Linux, macOS, and Windows


Below are solutions that contrast the 3 utilities listed above.

The following well-formed XML document is assumed to be contained in file - note how the <connection> elements are now enclosed in a single, top-level <doc> element:

<doc>
  <connection name="test1" transport="tcp">
    <LPort>host1:11111</LPort>
    <hostPort>host1:11111</hostPort>
    <abcd>1234</abcd>
  </connection>

  <connection name="test2" transport="tcp">
    <hostPort>host2:22222</hostPort>
    <GPort>host1:12111</GPort>
  </connection>

  <connection name="xyz1" transport="tcp">
    <hostPort>host3:33333</hostPort>
    <FPort>host1:12113</FPort>
    <efgi>5678</efgi>
  </connection>

  <connection name="xyz2" transport="tcp">
    <LPort>host1:12234</LPort>
    <hostPort>host4:4444</hostPort>
  </connection>
</doc>

xmllint solution:

xmllint's lack of control over formatting of the query results requires a nontrivial awk helper command:

echo 'cat //connection/@name | //hostPort/text()' | xmllint --shell file | awk -F\" '
  NR % 2 { next }                  # skip separator lines
  NR % 4 == 2 { conn = $2; next }  # save connection name
  { 
    split($0, tokens, ":")
    printf "%-6s %s %s\n", conn, tokens[1], tokens[2] 
  }
'

xmlstarlet solution:

xmlstarlet's sel sub-command supports very flexible extractions by translating options into XSLT templates behind the scenes:

xmlstarlet sel -t  -m '//connection' -v 'str:align(@name, "      ")' \
           -o ' ' \
           -c 'str:replace(hostPort, ":", " ")' -n file

xidel solution:

xidel is very flexible and not only supports XML, but also HTML and JSON.

While it has no support for XSLT, it supports XQuery, a superset of XPath with XSLT-like features, which enables powerful transformations. Tip of the hat to Reino.
As far as I can tell, however, there is no built-in function for padding, so a straightforward auxiliary awk command is used:

xidel file -q --xquery \
  'for $c in //connection return concat($c/@name, " ", replace($c/hostPort, ":", " "))' |
    awk '{ printf "%-6s %s %s\n", $1, $2, $3 }'

That said, XQuery even supports user-defined functions, so you can write your own padding function:

xidel -q file --xquery '
  declare function pad($s as xs:string?) as xs:string 
  {
    substring(concat($s, "      "), 1, 6)
  }
  for $c in //connection return concat(pad($c/@name), " ", replace($c/hostPort, ":", " "))
'
mklement0
  • @Reino: I've changed it to what you see now because it (a) works with the sample XML document I've posted (note the enclosing `<doc>` element, which is missing from the document you use in the linked page) and (b) it works on _Linux_, which is what this question is about (your commands assume _Windows_). As for why your comments disappeared: I've flagged them as _obsolete_, because the points made in them have been incorporated into the answer. While I cannot _personally_ make comments disappear, _moderators_ agreed with my assessment, which is what actually made them disappear. – mklement0 May 29 '17 at 20:25
  • @Reino: I've made it clearer that I've enclosed the `<connection>` elements in a single, top-level element. You are correct about the equivalence, but you need to be mindful of the context of the question. At first, before I looked into XQuery, I blindly tried your command as-is, not realizing that the quoting wouldn't work on _Unix_ platforms. – mklement0 May 29 '17 at 21:08

To merge blank-line-separated blocks and extract the desired values from each block using backreferences:

sed '${/^$/!{H;s/.*//;};};/^$/!{H;d;};/^$/{x;s/^\n<connection name="\([^"]*\)".*<hostPort>\([^:]*\):\([^<]*\).*/\1 \2 \3/;};' file
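Since the one-liner is dense, here is the same script spread over several lines with comments. It is meant as a reading aid, assuming GNU sed (which accepts # comments and newline-separated commands inside the script). The temporary sample file is hypothetical and holds two blocks from the question:

```shell
# Hypothetical sample file; note that it does not end with a blank line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<connection name="test1" transport="tcp">
<LPort>host1:11111</hostPort>
<hostPort>host1:11111</hostPort>
<abcd> 1234

<connection name="test2" transport="tcp">
<hostPort>host2:22222</hostPort>
<GPort>host1:12111</hostPort>
EOF

result=$(sed '
  # Last line of the file, if it is not blank: append it to the hold
  # space and empty the pattern space, so the final block is flushed
  # by the blank-line rule further down.
  ${
    /^$/!{
      H
      s/.*//
    }
  }
  # Any other non-blank line: append it to the hold space and start
  # the next cycle without printing anything.
  /^$/!{
    H
    d
  }
  # Blank line (or emptied last line): swap in the collected block,
  # which starts with a newline, and reduce it to "name host port".
  /^$/{
    x
    s/^\n<connection name="\([^"]*\)".*<hostPort>\([^:]*\):\([^<]*\).*/\1 \2 \3/
  }
' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

H appends to the hold space, x swaps the hold and pattern spaces, and d discards the pattern space, so each block is only printed once its terminating blank line (or the end of the file) is reached.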
SLePort
  • Hi SLePort, I tried to use your command but I get the following error; could you also please provide more detail? cat file | sed '${/^$/!{H;s/.*//;}};/^$/!{H;d;};/^$/{x;s/^\n\([^:]*\):\([^<]*\).*/\1 \2 \3/;}' gives: extra characters at the end of } command – theuniverseisflat May 27 '17 at 13:14
  • You must escape parentheses in BRE mode: `\(` and `\)` or use the `-E` flag: `sed -E`. And you don't need to pipe `cat` into sed. Just copy and paste my sed command to test it. – SLePort May 27 '17 at 13:25