How to get the sub-expression value of a regexp in awk?

Question

I am analyzing logs that contain information like the following:

y1e","email":"","money":"100","coi

I want to fetch the value of money. I used awk like this:

grep pay action.log | awk '/"money":"([0-9]+)"/'

How can I get the value matched by the sub-expression ([0-9]+)?

A sed version would be: `sed -r 's|^.*money":"([0-9]*)".*|\1|'` or if you don't want to print lines that do not contain `money`: `sed -n -r 's|^.*money":"([0-9]*)".*$|\1|p'` — Op De Cirkel, Jun 06 '12 at 11:57
@Op De Cirkel Thank you! It seems 'sed' is more powerful. Why does 'awk' have no such feature? – RoyHu, Jun 08 '12 at 12:17

score 5 · Accepted Answer · answered Jun 07 '12 at 02:22

If you have GNU AWK (gawk):

awk '/pay/ {match($0, /"money":"([0-9]+)"/, a); print substr($0, a[1, "start"], a[1, "length"])}' action.log

If not:

awk '/pay/ {match($0, /"money":"([0-9]+)"/); split(substr($0, RSTART, RLENGTH), a, /[":]/); print a[5]}' action.log

The result of either is 100. And there's no need for grep.
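A variant of the non-gawk command can skip `split()` by offsetting into the matched text. This is only a sketch: it assumes the key is always written exactly as `"money":"` (9 characters) and is followed by digits and a closing quote. The sample line (including its `pay` prefix) and the function name `extract_money` are made up for illustration.

```shell
# Sample line modeled on the question's log excerpt; the "pay" prefix is assumed.
line='pay y1e","email":"","money":"100","coi'

# Hypothetical helper: reads log lines on stdin, prints the money value.
extract_money() {
  awk '/pay/ && match($0, /"money":"[0-9]+"/) {
    # The match is "money":"<digits>". Skip the 9-character prefix "money":"
    # and drop the closing quote, which is 10 characters of overhead in total.
    print substr($0, RSTART + 9, RLENGTH - 10)
  }'
}

result=$(printf '%s\n' "$line" | extract_money)
echo "$result"
```

Because `match()` is part of the pattern here, lines that contain `pay` but no money field print nothing instead of an empty line.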

Dennis Williamson

Thanks. That is pretty close to what I expected, but is there a more clever way? – RoyHu Jun 07 '12 at 04:25
@RoyHu: The 1 in the array index refers to the capture group. I don't know of any other way to do that in awk or gawk. Gawk has a function `gensub()` that can be used for *replacing* the contents of a capture group. You could use it, but the expressions would be more complex for the use in your question. – Dennis Williamson Jun 07 '12 at 10:39
Thanks. I got one using gensub: grep pay action.log | awk -F "\n" 'm=gensub(/.*money":"([0-9]+)".*/, "\\1", "g", $1) {print m}' – RoyHu Jun 07 '12 at 12:15
If you have `gawk` installed, in the first example, the print clause can be simplified to `print a[1];` – Mahn May 31 '15 at 23:42

score 2 · Answer 2 · edited Jun 06 '12 at 18:41

As an alternative, assuming the data format stays the same once the lines are grepped, this extracts the money field without using a regular expression:

awk -v FS=\" '{print $9}' data.txt

assuming data.txt contains

y1e","email":"","money":"100","coin.log

yielding:

100

That is, the field separator is set to " and field 9 is printed.
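To see why the value lands in field 9, it can help to dump every field with its index. The sketch below does that for the sample line from this answer; the variable names are illustrative only.

```shell
# The sample line from data.txt in this answer.
line='y1e","email":"","money":"100","coin.log'

# Print each field with its index, using a double quote as the field separator.
numbered=$(printf '%s\n' "$line" |
  awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')
echo "$numbered"
```

In the listing, field 7 is `money` and field 9 is `100`. The same dump makes it easy to check whether the index still holds if the log format changes.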

answered Jun 06 '12 at 12:12

Levon

Thanks, but the position of the field that contains the "money" info may not be fixed! – RoyHu Jun 07 '12 at 04:23
I thought of one more way: grep pay action.log | awk -F "\n" 'm=gensub(/.*money":"([0-9]+)".*/, "\\1", "g", $1) {print m}' – RoyHu Jun 07 '12 at 04:27

score 0 · Answer 3 · edited May 23 '17 at 12:24

You need to reference group 1 of the regex.

I'm not fluent in awk, but here are some other relevant questions:

awk extract multiple groups from each line

GNU awk: accessing captured groups in replacement text

Hope this helps

answered Jun 06 '12 at 11:52

buckley

Thank you! Inspired by 'gensub', I got: grep pay user_action.log | awk -F "\n" 'm=gensub(/.*money":"([0-9]+)".*/, "\\1", "g", $1) {print m}' – RoyHu Jun 07 '12 at 04:18

score 0 · Answer 4 · answered Jun 06 '12 at 16:03

If the money field can appear at different positions, hard-coding the field number is not a good idea.

You can try something like this:

$ awk -v FS='[,:"]' '{ for (i=1;i<=NF;i++) if($i~/money/) print $(i+3)}' inputfile
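The `i+3` offset comes from the empty fields that the three separator characters `":"` leave between the key and its value. A small sketch on the question's sample line makes them visible. The variable names are illustrative, and it matches the key exactly with `==` rather than with `~ /money/`.

```shell
# Sample line from the question.
line='y1e","email":"","money":"100","coi'

# FS is quoted so the shell cannot treat the brackets as a glob pattern.
around=$(printf '%s\n' "$line" | awk -v FS='[,:"]' '{
  for (i = 1; i <= NF; i++) {
    if ($i == "money") {
      # Show the key and the three fields that follow it.
      printf "[%s][%s][%s][%s]\n", $i, $(i+1), $(i+2), $(i+3)
    }
  }
}')
echo "$around"
```

This prints `[money][][][100]`: two empty fields sit between the key and the value, so the value is three fields after the key.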

jaypal singh

Thanks, that works. But I want to know how awk fetches the group 1 value. – RoyHu Jun 07 '12 at 04:22

score 0 · Answer 5 · answered Jun 07 '12 at 04:29

grep pay action.log | awk -F "\n" 'm=gensub(/.*money":"([0-9]+)".*/, "\\1", "g", $1) {print m}'

RoyHu

You should refactor out the `grep`. Remember that `grep 'foo' file | awk '{ bar }'` is basically always better written as `awk '/foo/ { bar }' file`. – tripleee Aug 24 '15 at 14:32
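One possible way to apply that advice without gawk's `gensub()` is to fold the `pay` filter into the awk pattern and trim the line with two POSIX `sub()` calls. This is a sketch, not a drop-in replacement: the sample line with its `pay` prefix and the function name `money_no_grep` are assumptions made for illustration.

```shell
# Sample line modeled on the question's log excerpt; the "pay" prefix is assumed.
line='pay y1e","email":"","money":"100","coi'

# Hypothetical helper: takes file names as arguments, or reads stdin if none.
money_no_grep() {
  awk '/pay/ && /"money":"[0-9]+"/ {
    sub(/.*"money":"/, "")   # drop everything up to and including the key
    sub(/".*/, "")           # drop the closing quote and the rest of the line
    print
  }' "$@"
}

result=$(printf '%s\n' "$line" | money_no_grep)
echo "$result"
```

On a real log this would be called as `money_no_grep action.log`. Lines without `pay`, or without a numeric money value, are skipped rather than printed unchanged.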
