Using matched pattern in awk

Question

I want to print the matched pattern using awk. Not the field, not the line.

In vi, you can put the matched pattern in the substitution by surrounding it with escaped parens and referencing it with a backslash and a number, like this:

:s/bufid=\([0-9]*\)/buffer id is \1/

The part that matches between parens is remembered and can be used.

In perl, it is similar:

$_ = "Hello there, neighbor";
if (/\s(\w+),/) {             # memorize the word between space and comma
  print "the word was $1\n";  # the word was there
}

Is there any way I can do something similar with awk? I just want to extract the buffer id and print it, and only it.

The input line is XML, and will contain (among other things) 'bufId="123456"'. I want to print "123456"

so ...

awk < file.xml '/bufId="([0-9]*)"/ { print X; }'

What do I put where X is?

Can this even be done?

The closest I have found to doing what I want is this compound: grep 'bufId="[0-9]*"' | sed 's/^.*bufId="//' | sed 's/\([0-9]*\)".*$/\1/' — rmhartman, Mar 09 '18 at 01:34
You could just use an XML parser and do it in one single line! This will also be more robust in case you have another attribute, say `evilAttribute`, containing values like `bufId` that might trick `awk` or `sed` — Allan, Mar 09 '18 at 01:46

score 3 · Answer 1 · answered Mar 09 '18 at 01:11

with gawk

awk '{print gensub(/.*bufId="([0-9]*)".*/,"\\1",1)}'

If you want the result to be quoted, you have to capture the quotes as well.
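To make the remark about quotes concrete, here is a minimal sketch. It assumes gawk is installed (gensub is a gawk extension), and the helper names and the sample line are made up for illustration. The extra pattern in front of each action is an addition: without it, gensub returns non-matching lines unchanged and they would be printed as well.

```shell
# Both helpers need gawk: gensub() is a GNU awk extension.
# The /bufId="[0-9]*"/ guard skips lines without the attribute, which gensub
# would otherwise hand back unchanged.

# Hypothetical helper: the group holds only the digits.
bufid_plain() {
  gawk '/bufId="[0-9]*"/ { print gensub(/.*bufId="([0-9]*)".*/, "\\1", 1) }'
}

# Hypothetical helper: the quotes sit inside the group, so they are printed too.
bufid_quoted() {
  gawk '/bufId="[0-9]*"/ { print gensub(/.*bufId=("[0-9]*").*/, "\\1", 1) }'
}

sample='<root><a bufId="123456"/></root>'
printf '%s\n' "$sample" | bufid_plain
printf '%s\n' "$sample" | bufid_quoted
```

For the sample line this should print 123456 and then "123456". Because the leading .* is greedy, a line with several bufId attributes yields only the last one.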

answered Mar 09 '18 at 01:11

karakfa

66,216
7
41
56

score 2 · Answer 2 · answered Mar 09 '18 at 01:10

This seems like a close approximation of what you were after. Not sure awk is going to be your best tool for this.

echo '<root><a bufId="123456"/></root>' | awk 'match($0, /bufId="[0-9]*"/) { print substr($0, RSTART+7, RLENGTH-8)}'

This was a helpful starting point.
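A variant of the match/substr approach, as a sketch only: it uses nothing beyond POSIX awk (match, substr, RSTART, RLENGTH), so it should also run under mawk and BSD awk. The function name extract_bufid and the two-attribute sample line are assumptions for illustration. The loop is an addition so that several bufId attributes on one line are all reported.

```shell
# extract_bufid (hypothetical name): print the digits of every bufId="..."
# attribute read from stdin, one per output line.
extract_bufid() {
  awk '
    {
      rest = $0
      # Keep matching in the unread remainder of the line.
      while (match(rest, /bufId="[0-9]*"/)) {
        # Skip the 7 characters of bufId=" and drop the closing quote.
        print substr(rest, RSTART + 7, RLENGTH - 8)
        rest = substr(rest, RSTART + RLENGTH)
      }
    }
  '
}

printf '%s\n' '<root><a bufId="123456"/><b bufId="42"/></root>' | extract_bufid
```

On the sample line this prints 123456 and 42 on separate lines. The offsets come from the match itself: 7 is the length of the bufId=" prefix, and RLENGTH - 8 removes that prefix plus the closing quote.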

answered Mar 09 '18 at 01:10

mattmc3

This also requires `gawk` -- worth mentioning since `mawk` is default awk on Ubuntu, at least. – zzxyz Mar 09 '18 at 01:15
Are you sure? Works on macOS, which has the BSD awk, not gnu awk. – mattmc3 Mar 09 '18 at 01:16
no I'm wrong, sorry. I believe the way you're using `match`, it won't. – zzxyz Mar 09 '18 at 01:19

score 1 · Answer 3 · answered Mar 09 '18 at 01:34

Also with gawk (third param in match is specific to it):

~/test$ cat test
abc
~/test$ gawk '{ match($0, /a(.)(.)/, group)}{ print group[2] group[1]}' test
cb
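Applied to the bufId case from the question, the same idea might look like the sketch below. It assumes gawk is available; the helper name bufid_from_match and the sample line are made up for illustration.

```shell
# Requires gawk: the third (array) argument of match() is a GNU extension.
# m[0] holds the whole match and m[1] the first parenthesised group.
# Hypothetical helper name.
bufid_from_match() {
  gawk 'match($0, /bufId="([0-9]*)"/, m) { print m[1] }'
}

printf '%s\n' '<root><a bufId="123456"/></root>' | bufid_from_match
```

Using match as the pattern means lines without the attribute print nothing, and the action prints only the captured digits (123456 for the sample line).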

answered Mar 09 '18 at 01:34

zzxyz

score 1 · Answer 4 · answered Mar 09 '18 at 01:44

Instead of going for an awk solution for this, I would highly recommend using an XML parser:

$ cat file.xml
<elems><elem bufId="123456"/></elems>

$ xmllint --xpath "concat('\"',string(//elem/@bufId),'\"')" file.xml
"123456"

$ xmllint --xpath "string(//elem/@bufId)" file.xml
123456

Choose one or the other depending on whether you want quotes in your output.

Another valid solution would be to use sed (if you really dislike XPath and XML parsers; since there are already many good awk solutions, I will introduce this one as well):

$ sed -n 's/^.*bufId="\([0-9]*\)".*$/\1/gp' file.xml
123456

$ sed -n 's/^.*bufId="\([0-9]*\)".*$/"\1"/gp' file.xml
"123456"

answered Mar 09 '18 at 01:44

Allan

An XML parser to parse XML? Have you gone completely mad!? – zzxyz Mar 09 '18 at 02:04
Thank you for this. I posted my own grep/sed solution above; I was hoping for something simpler. I hadn't thought of an XML parser. I could do it in perl ... but it wouldn't be a one-liner ... – rmhartman Mar 10 '18 at 01:24
