Bash to extract number followed by a specific string and a :

Question

I have a metrics file with a number of lines like these:

Timing=Time1:5/1,Time2:3/1,Time3:4/1,Time4:4/1
Timing=Time1:1/1,Time2:3/1,Time3:5/1,Time4:9/1
...

The format is as follows: Time1 is the name of the metric, the number after the colon is the number of time units it took, and the number after the slash denotes the kind of time unit (1 is milliseconds).

I am trying to parse out all the instances of the value of Time3 from this list of times.

So in the above example I am trying to parse out

4
5

How would I accomplish that with bash/awk/sed/grep, etc.?

score 2 · Accepted Answer · answered Mar 24 '23 at 02:08

With just `GNU` grep:

$ grep -oP 'Time3:\K\d+(?=/\d)' file
4
5

If you think the look ahead is not necessary, feel free to remove it:

$ grep -oP 'Time3:\K\d+' file

The regular expression matches as follows:

| Node | Explanation |
|---|---|
| `Time3:` | the literal text `Time3:` |
| `\K` | resets the start of the match (what is `K`ept), a shorter alternative to a look-behind assertion; see "look arounds" and "Support of \K in regex" |
| `\d+` | digits (0-9), one or more times (matching as many as possible) |
| `(?=` | look ahead to see if there is: |
| `/` | a literal `/` |
| `\d` | a digit (0-9) |
| `)` | end of look-ahead |
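
Where `grep -P` is not available (it needs a grep built with PCRE support), a rough equivalent can be sketched with `grep -o` and `cut`. This assumes a grep that supports `-o` (GNU and BSD grep both do); the function name `extract_time3` is made up for illustration, and unlike the look-ahead version it does not check that a `/` follows the digits.

```shell
# Hypothetical helper: print the Time3 value from each line read on stdin.
# grep -o prints only the matched "Time3:<digits>" text; cut keeps what follows the colon.
extract_time3() {
    grep -o 'Time3:[0-9][0-9]*' | cut -d: -f2
}

# Sample lines taken from the question.
printf '%s\n' \
    'Timing=Time1:5/1,Time2:3/1,Time3:4/1,Time4:4/1' \
    'Timing=Time1:1/1,Time2:3/1,Time3:5/1,Time4:9/1' |
    extract_time3
# prints:
# 4
# 5
```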

score 2 · Answer 2 · answered Mar 24 '23 at 02:26

Using GNU sed:

sed -rn 's/.*Time3:([0-9]+).*/\1/p' file

Explanation:

-r        --> --regexp-extended (use extended regular expressions)
-n        --> --quiet, --silent: suppress automatic printing of pattern space
.*Time3:  --> match from the start of the line up to Time3:
([0-9]+)  --> capture the number in a group using parentheses: ()
.*        --> match the rest of the characters to the end of the line
\1        --> replace the whole line with the first captured group
p         --> print the line if a substitution was made

Output:

4
5
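
The `-r` spelling is the GNU one; both GNU and BSD sed also accept `-E` for extended regular expressions, so a slightly more portable variant might look like the sketch below. The function name `extract_time3_sed` is hypothetical, and the sample lines are the ones from the question.

```shell
# Hypothetical helper: same substitution as above, but using -E instead of -r.
# Lines without "Time3:<digits>" produce no output because of -n plus the p flag.
extract_time3_sed() {
    sed -nE 's/.*Time3:([0-9]+).*/\1/p'
}

printf '%s\n' \
    'Timing=Time1:5/1,Time2:3/1,Time3:4/1,Time4:4/1' \
    'Timing=Time1:1/1,Time2:3/1,Time3:5/1,Time4:9/1' |
    extract_time3_sed
# prints:
# 4
# 5
```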

User123


`gnu` really needs to align its labeling between `gnu-grep` and `gnu-sed`: `--quiet`/`--silent` in `gnu-grep` means print absolutely nothing to `/dev/stdout`, and is typically used to check the exit status. `gnu-sed`'s `--quiet`/`--silent` only suppresses automatic printing; output is still possible if one executes the `p` command. The `bsd-sed` `man` page doesn't use either of those terms (or their synonyms) when describing `-n`. – RARE Kpop Manifesto Mar 24 '23 at 08:42

score 0 · Answer 3 · answered Mar 24 '23 at 02:07

This did the trick for me

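regex="Time3:([0-9]+)"
# IFS= and -r keep each line intact (no whitespace trimming, no backslash processing)
while IFS= read -r line; do
    if [[ $line =~ $regex ]]; then
        # BASH_REMATCH[1] holds the first capture group: the digits after "Time3:"
        echo "${BASH_REMATCH[1]}"
    fi
done < inputfile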
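
The same loop can be wrapped in a function that takes the metric name as an argument. This is only a sketch: it requires bash (for `[[ ... =~ ... ]]` and `BASH_REMATCH`), the name `extract_metric` is hypothetical, and the metric name is assumed to contain no regex metacharacters.

```shell
# Hypothetical helper: print the value of the named metric for each line on stdin.
# Usage: extract_metric Time3 < inputfile
extract_metric() {
    local regex="$1:([0-9]+)" line
    # The "|| [[ -n $line ]]" part also handles a final line with no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $regex ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
        fi
    done
}

printf '%s\n' \
    'Timing=Time1:5/1,Time2:3/1,Time3:4/1,Time4:4/1' \
    'Timing=Time1:1/1,Time2:3/1,Time3:5/1,Time4:9/1' |
    extract_metric Time4
# prints:
# 4
# 9
```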

Arunav Sanyal


RARE Kpop Manifesto · Answer 4 · 2023-03-24T08:26:25.180

As long as you don't mind duplicate values showing up multiple times, "Time3:" occurs exactly once in every row, and nothing other than TimeN immediately precedes a colon (:), a minimalist ERE regex via awk suffices; neither capturing groups nor lookahead/lookbehind is essential:

     1  Timing=Time1:5/1,Time2:3/1,Time3:4/1,Time4:4/1
     2  Timing=Time1:1/1,Time2:3/1,Time3:5/1,Time4:9/1
     3  Timing=Time1:1/1,Time2:3/1,Time3:5/1,Time4:9/1
     4  Timing=Time1:5/1,Time2:3/1,Time3:4/1,Time4:4/1
     5  Timing=Time1:5/1,Time2:3/1,Time3:4/1,Time4:4/1
     6  Timing=Time1:1/1,Time2:3/1,Time3:5/1,Time4:9/1
     7  Timing=Time1:1/1,Time2:3/1,Time3:5/1,Time4:9/1
     8  Timing=Time1:5/1,Time2:3/1,Time3:4/1,Time4:4/1

```
mawk ++NF FS='^.+3:|/.+$' OFS= 
```

Since OFS is an empty string anyway, doing ++NF doesn't end up padding unwanted trailing spaces.
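
For comparison, here is a sketch that does not depend on those assumptions about the row layout. It uses the POSIX awk `match()` function and skips rows that lack `Time3:`. The function name `extract_time3_awk` is made up for illustration.

```shell
# Hypothetical helper: print the digits that follow "Time3:" on each line of stdin.
extract_time3_awk() {
    awk 'match($0, /Time3:[0-9]+/) {
        # RSTART and RLENGTH describe the matched "Time3:<digits>" text;
        # skip the 6-character "Time3:" prefix and print only the digits.
        print substr($0, RSTART + 6, RLENGTH - 6)
    }'
}

printf '%s\n' \
    'Timing=Time1:5/1,Time2:3/1,Time3:4/1,Time4:4/1' \
    'Timing=Time1:1/1,Time2:3/1,Time3:5/1,Time4:9/1' |
    extract_time3_awk
# prints:
# 4
# 5
```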

If you want to generalize it and simply split up the components, try:

 gawk NF=NF FS='([:=]|/.(,|$))' OFS=' | '

     1  Timing | Time1 | 5 | Time2 | 3 | Time3 | 4 | Time4 | 4 | 
     2  Timing | Time1 | 1 | Time2 | 3 | Time3 | 5 | Time4 | 9 | 
     3  Timing | Time1 | 1 | Time2 | 3 | Time3 | 5 | Time4 | 9 | 
     4  Timing | Time1 | 5 | Time2 | 3 | Time3 | 4 | Time4 | 4 | 
     5  Timing | Time1 | 5 | Time2 | 3 | Time3 | 4 | Time4 | 4 | 
     6  Timing | Time1 | 1 | Time2 | 3 | Time3 | 5 | Time4 | 9 | 
     7  Timing | Time1 | 1 | Time2 | 3 | Time3 | 5 | Time4 | 9 | 
     8  Timing | Time1 | 5 | Time2 | 3 | Time3 | 4 | Time4 | 4 |

Obviously just [[:punct:]] would work too, but you'll get back all the extra 1s as well; if they sometimes show up as values other than 1, that might have some utility.

You can also do it the RS way, but it's verbose and unseemly:

 gawk NF RS='(^)?[^\n]+Time3:|/[^\n]+(\n|$)'
