
I want to print the matched pattern using awk. Not the field, not the line.

In vi, you can reuse part of the match in the substitution by surrounding it with escaped parentheses and referencing it with a backslash and a number, like this:

:s/bufid=\([0-9]*\)/buffer id is \1/

The part that matches between the parentheses is remembered and can be reused.

In Perl, it is similar:

$_ = "Hello there, neighbor";
if (/\s(\w+),/) {             # memorize the word between space and comma
  print "the word was $1\n";  # the word was there
}

Is there any way I can do something similar with awk? I just want to extract the buffer id and print it, and only it.

The input is XML and will contain (among other things) 'bufId="123456"'. I want to print "123456".

so ...

awk < file.xml '/bufId="([0-9]*)"/ { print X; }'

What do I put where X is?

Can this even be done?

rmhartman
  • The closest I have found to doing what I want is this pipeline: grep 'bufId="[0-9]*"' | sed 's/^.*bufId="//' | sed 's/\([0-9]*\)".*$/\1/' – rmhartman Mar 09 '18 at 01:34
  • You could just use an XML parser and do it in one single line! This would also be more robust in case another attribute, say `evilAttribute`, contains text like `bufId="..."` that might trick `awk` or `sed`. – Allan Mar 09 '18 at 01:46
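The three-stage grep/sed pipeline in rmhartman's comment can be shortened to two stages. A minimal sketch, assuming a grep that supports -o (GNU, BSD and BusyBox grep do; POSIX grep does not); the function name bufid_grep is illustrative only. Unlike the single-match sed version, it prints every bufId on a line.

```shell
# Sketch: shorter version of the grep | sed | sed pipeline.
# Assumes grep supports -o (not POSIX, but common); bufid_grep is a hypothetical name.
bufid_grep() {
  grep -o 'bufId="[0-9]*"' |   # print only each bufId="..." match, one per line
    cut -d'"' -f2              # keep the text between the first pair of quotes
}

# Sample input standing in for file.xml
printf '%s\n' '<root><a bufId="123456"/><b bufId="42"/></root>' | bufid_grep
```

Usage on the real file would be bufid_grep < file.xml.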

4 Answers


With gawk:

gawk '/bufId="[0-9]*"/ { print gensub(/.*bufId="([0-9]*)".*/, "\\1", 1) }' file.xml

If you want the result quoted, move the quotes inside the group: /.*bufId=("[0-9]*").*/
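gensub() is a gawk extension, so that call fails under mawk or BWK awk. A minimal portable sketch that strips the text around the number with two sub() calls instead, assuming at most one bufId per line (with several, the greedy .* keeps the last one); bufid_sub is an illustrative name:

```shell
# Sketch (POSIX awk, no gensub): strip everything around the number with sub().
# Assumes at most one bufId per line; bufid_sub is a hypothetical name.
bufid_sub() {
  awk '/bufId="[0-9]*"/ {
    sub(/.*bufId="/, "")   # drop everything up to and including bufId="
    sub(/".*/, "")         # drop the closing quote and everything after it
    print
  }'
}

# Lines without a bufId are skipped by the pattern
printf '%s\n' '<root><a bufId="123456"/></root>' '<no-id/>' | bufid_sub
```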

karakfa

This seems like a close approximation of what you were after. Not sure awk is going to be your best tool for this.

echo '<root><a bufId="123456"/></root>' | awk 'match($0, /bufId="[0-9]*"/) { print substr($0, RSTART+7, RLENGTH-8) }'

This was a helpful starting point.
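The match()/substr() approach above prints only the first hit on each line. A minimal POSIX awk sketch that loops over every occurrence and takes the attribute name as a parameter; extract_attr is a hypothetical helper, and it assumes numeric values and an attribute name without regex metacharacters (it would also match a longer name ending in bufId):

```shell
# Sketch (POSIX awk): print every numeric value of attribute $1 on each line.
# extract_attr is a hypothetical helper; the attribute name is used as a regex.
extract_attr() {
  awk -v attr="$1" '
  BEGIN { re = attr "=\"[0-9]*\"" }       # e.g. bufId="[0-9]*"
  {
    s = $0
    while (match(s, re)) {
      # skip attr=" (length(attr) + 2 chars) and drop the closing quote
      print substr(s, RSTART + length(attr) + 2, RLENGTH - length(attr) - 3)
      s = substr(s, RSTART + RLENGTH)     # continue after this match
    }
  }'
}

printf '%s\n' '<root><a bufId="123456"/><b bufId="42"/></root>' | extract_attr bufId
```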

mattmc3

Also with gawk (the three-argument form of match() is a gawk extension):

~/test£ cat test
abc
~/test£ gawk '{ match($0, /a(.)(.)/, group)}{ print group[2] group[1]}' test
cb
zzxyz

Instead of an awk solution, I would highly recommend using an XML parser:

$ cat file.xml
<elems><elem bufId="123456"/></elems>

$ xmllint --xpath "concat('\"',string(//elem/@bufId),'\"')" file.xml
"123456"

$ xmllint --xpath "string(//elem/@bufId)" file.xml
123456

Pick one depending on whether you want quotes in your output.

Another valid solution is sed (if you really dislike XPath and XML parsers; and since there are already several good awk solutions, I will add this one as well):

$ sed -n 's/^.*bufId="\([0-9]*\)".*$/\1/gp' file.xml
123456

$ sed -n 's/^.*bufId="\([0-9]*\)".*$/"\1"/gp' file.xml
"123456"
Allan
  • An XML parser to parse XML? Have you gone completely mad!? – zzxyz Mar 09 '18 at 02:04
  • Thank you for this. I posted my own grep/sed solution above; I was hoping for something simpler. I hadn't thought of an XML parser. I could do it in Perl ... but it wouldn't be a one-liner ... – rmhartman Mar 10 '18 at 01:24