Bash - numbers of multiple lines matching regex (possible oneliner?)

Question

I'm not very fluent in bash but actively trying to improve, so I'd like to ask some experts here for a little suggestion :)

Let's say I've got the following text file:

Some
spam
about which I don't care.
I want following letters:
X1
X2
X3
I do not want these:
X4
X5
Nor this:
X6
But I'd like these, too:
I want following letters:
X7
And so on...

And I'd like to get the numbers of the lines with these letters, so my desired output should look like:
5 6 7 15

To clarify: I want all lines matching some regex /\s*X./, that occur right after one match with another regex /\sI want following letters:/

Right now I've got a working solution, which I don't really like:

cat data.txt | grep -oPz "\sI want following letters:((\s*X.)*)" | grep -oPz "\s*X." > tmp.txt

for entry in $(cat tmp.txt); do
 grep -n $entry data.txt | cut -d ":" -f1
done

My question is: Is there any smart way, or any tool I don't know of, to do this in one line? (I especially don't like having to use a temp file and a loop here.)

hek2mgl · Accepted Answer · 2018-07-04T18:31:44.493

3

You can use awk:

awk '/I want following/{p=1;next}!/^X/{p=0;next}p{print NR}' file

Explanation in multiline version:

#!/usr/bin/awk -f

/I want following/{
    # Just set a flag and move on with the next line
    p=1
    next
}

!/^X/ {
    # On all other lines that do not start with X,
    # reset the flag and continue with the next line
    p=0
    next
}

p {
    # If the flag p is set it must be a line with X+number.
    # print the line number NR
    print NR
}
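
The question asks for the numbers on a single line (5 6 7 15), while the script above prints one number per line. A minimal sketch of one way to get the single-line form, assuming the sample text from the question is saved as data.txt; the helper name matching_lines is made up for illustration:

```shell
#!/bin/bash
# Recreate the sample input from the question (data.txt is an assumed file name).
cat > data.txt <<'EOF'
Some
spam
about which I don't care.
I want following letters:
X1
X2
X3
I do not want these:
X4
X5
Nor this:
X6
But I'd like these, too:
I want following letters:
X7
And so on...
EOF

# matching_lines is a hypothetical helper name, not part of the answer above.
matching_lines() {
    awk '
        /I want following/ { p = 1; next }                  # header line: set the flag
        !/^X/              { p = 0; next }                  # other non-X line: clear the flag
        p                  { out = out sep NR; sep = " " }  # collect the line number
        END                { if (out != "") print out }     # print everything on one line
    ' "$1"
}

matching_lines data.txt
```

Piping the original one-liner through paste -sd' ' - should give the same result, if you prefer to keep the awk part unchanged.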

Thanks for the solution and, especially, your explanation! I guess I should familiarise myself with awk, it looks like a fun language – Jul 04 '18 at 19:03

@vynaloze It definitely is. It also performs very well, even when you process very large or many files. While awk one-liners might look complicated at first, once you break them down into multiple lines they are pretty simple to learn. – hek2mgl Jul 04 '18 at 21:19

score 1 · Answer 2 · answered Jul 04 '18 at 18:22

The following may help you here.

awk '!/X[0-9]+/{flag=""} /I want following letters:/{flag=1} flag'  Input_file

The above also prints the lines that contain I want following letters: themselves; if you don't want those, use the following.

awk '!/X[0-9]+/{flag=""} /I want following letters:/{flag=1;next} flag' Input_file

To print the line number instead of the line itself, use the following.

awk '!/X[0-9]+/{flag=""} /I want following letters:/{flag=1;next} flag{print FNR}' Input_file
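
This command uses FNR (the line number within the current file), whereas NR counts lines across all input. For a single file the two are identical; they only differ when awk reads several files. A small sketch of the difference, assuming two made-up files a.txt and b.txt and a hypothetical helper named numbered:

```shell
#!/bin/bash
# a.txt and b.txt are made-up sample files for illustration only.
printf '%s\n' 'I want following letters:' 'X1' 'other' > a.txt
printf '%s\n' 'noise' 'I want following letters:' 'X2' 'X3' > b.txt

# numbered is a hypothetical helper: for each wanted line it prints the
# file name, the per-file line number (FNR) and the overall line number (NR).
numbered() {
    awk '!/X[0-9]+/ { flag = "" }
         /I want following letters:/ { flag = 1; next }
         flag { print FILENAME, FNR, NR }' "$@"
}

numbered a.txt b.txt
# a.txt 2 2
# b.txt 3 6
# b.txt 4 7
```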

Idriss Neumann · Answer 3 · 2018-07-04T18:52:36.543

First, let's optimize your current script a little:

#!/bin/bash

FILE="data.txt"

while read -r entry; do
  [[ $entry ]] && grep -n "$entry" "$FILE" | cut -d ":" -f1
done < <(grep -oPz "\sI want following letters:((\s*X.)*)" "$FILE"| grep -oPz "\s*X.")

And here are some comments:

No need to use cat file|grep ... => grep ... file
Do not use the syntax for i in $(command); it is often the cause of bugs, and there is always a smarter solution.
No need to use a tmp file either
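
To illustrate the second point: for i in $(command) splits the output on whitespace, not on lines, so an entry that contains a space becomes two loop items. A minimal sketch, assuming a made-up file lines.txt:

```shell
#!/bin/bash
# lines.txt is a made-up sample file with a space inside its first line.
printf '%s\n' 'X1 first' 'X2' > lines.txt

count_for=0
for entry in $(cat lines.txt); do   # word splitting: "X1 first" becomes two items
    count_for=$((count_for + 1))
done

count_while=0
while IFS= read -r entry; do        # one iteration per line, spaces preserved
    count_while=$((count_while + 1))
done < lines.txt

echo "for: $count_for, while: $count_while"
# for: 3, while: 2
```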

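There are also many shorter solutions. Here's one using awk. Note that it prints the number that follows X, not the line number; use print NR in place of the gsub and print to get line numbers: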

$ awk '{ if($0 ~ "I want following letters:") {s=1} else if(!($0 ~ "^X[0-9]*$")) {s=0}; if (s && $0 ~ "^X[0-9]*$") {gsub("X", ""); print}}' data.txt
1
2
3
7
