Is it possible using awk or sed to get the line number of a line such that it is the first line matching a regex after another line matching another regex?

In other words:

  1. Find line l1 matching regex r1. l1 is the first line matching r1.
  2. Find line l2 below l1. l2 matches regex r2. l2 is the first line matching r2, ignoring lines l1 and above.

Clarification: by "match" I mean a partial match, for the most general solution. A partial match can of course be turned into a full-word match with \<...\> or a full-line match with ^...$.

Example input:

- - '787928'
  - stuff
- - '810790'
  - more stuff
- - '787927'
  - yet more stuff
- - '828055'
  - some more stuff
- - '828472'
  - some other stuff

If r1 is ^-.*787927.* and r2 is ^- I'd expect the output to be 7, i.e. the number of the line that says - - '828055'.

SU3
  • Sorry, this is not the way StackOverflow works. Questions of the form "I want to do X, please give me tips and/or sample code" are considered off-topic. Please visit the [help] and read [ask], and especially read [Why is “Can someone help me?” not an actual question?](http://meta.stackoverflow.com/q/284236) – kvantour Jun 14 '19 at 12:58
  • Don't use the word `pattern` as it's highly ambiguous. Instead use the word `regexp` or `string`, whichever it is you mean, and clarify if you want partial, full-word, or full-line matches or something else. Additionally, whatever it is you're trying to do, post concise, testable sample input and expected output that fully covers all your requirements. – Ed Morton Jun 14 '19 at 15:48

3 Answers

Input example:

world
zekfzlefkzl
fezekzevnkzjnz
hello
zeniznejkglz
world
eznkflznfkel
hello
zenilzligeegz
world

Command:

pat1="hello"; pat2="world"
awk -v pat1="$pat1" -v pat2="$pat2" '$0 ~ pat1 {pat1_match = 1} ($0 ~ pat2) && pat1_match {print NR; exit}' <input>

Output:

6
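
Note that the command above reports the r1 line itself when that line also matches r2, which is what happens with the question's r1 = ^-.*787927.* and r2 = ^- (it would print 5). A minimal sketch of a variant that tests r2 before setting the flag, so that only lines strictly below the first r1 match are considered. The function name first_after and the file name sample.txt are made up for illustration:

```shell
# first_after R1 R2 FILE  (hypothetical helper name)
# Prints the number of the first line matching R2 strictly below the
# first line matching R1.
# Caveat: awk -v interprets backslash escapes in the assigned values,
# so backslashes in the regexes have to be doubled.
first_after() {
    awk -v r1="$1" -v r2="$2" '
        # Only true on lines below the first r1 match.
        found && $0 ~ r2 { print NR; exit }
        # Remember the first r1 match; later r1 matches are ignored.
        !found && $0 ~ r1 { found = 1 }
    ' "$3"
}

# Usage, assuming the question's sample is saved as sample.txt:
#   first_after "^-.*787927" "^-" sample.txt
```

With the question's sample input this should print 7, because the r2 test runs before the flag is set on the r1 line.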
Corentin Limier
  • A tiny bit shorter: `awk '/hello/,/world/ { if ($0 ~ /world/) {print NR; exit } }'` – kvantour Jun 14 '19 at 13:05
  • Thanks! I didn't know about record ranges. For those like me who want to learn more about them: https://www.gnu.org/software/gawk/manual/html_node/Ranges.html. Great solution. – Corentin Limier Jun 14 '19 at 13:16
  • `awk '/hello/,/world/{a=NR;next}a{print a; exit}'` – Corentin Limier Jun 14 '19 at 13:20
  • This will fail if "world" is on the last line. – kvantour Jun 14 '19 at 13:26
  • But to quote [Ed Morton](https://stackoverflow.com/questions/17908555/printing-with-sed-or-awk-a-line-following-a-matching-pattern/17914105#comment92813185_17914105): _Never use range expressions (e.g. `/start/,/end/`) as they make trivial tasks very slightly briefer but then require duplicate conditions or a complete rewrite for the tiniest requirements change._ – kvantour Jun 14 '19 at 13:29
  • Ed Morton sure is wise. Thanks for showing me this trick anyway. – Corentin Limier Jun 14 '19 at 13:31

For an input file that looks like this:

 1  pat2
 2  x
 3  pat1
 4  x
 5  pat2
 6  x
 7  pat1
 8  x
 9  pat2

you could use sed as follows:

$ sed -n '/pat1/,${/pat2/{=;q;};}' infile
5

which works like this:

sed -n '       # suppress output with -n
/pat1/,$ {     # for all lines from the first occurrence of "pat1" on...
    /pat2/ {   # if the line matches "pat2"
        =      # print line number
        q      # quit
    }
}' infile

The above fails if the first line matching pat1 also matches pat2. For example, this input

 1  pat2
 2  x
 3  pat1 pat2
 4  x
 5  pat2
 6  x
 7  pat1
 8  x
 9  pat2

would print 3. With GNU sed, we can use this instead:

$ sed -n '0,/pat1/!{/pat2/{=;q;};}' infile
5

which works like this:

sed -n '     # suppress output
0,/pat1/! {  # for all lines after the first occurrence of "pat1"
    /pat2/ { # if the line matches "pat2"
        =    # print line number
        q    # quit
    }
}' infile

The 0 address is a GNU extension; using 1 instead would break if pat1 was on the first line.
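
To illustrate the difference between the two start addresses, here is a small sketch that puts pat1 on the first line of a throwaway file and runs both variants. It assumes GNU sed (for the 0 address) and a temporary file created with mktemp; the variable names are made up for this example:

```shell
# Assumes GNU sed; the 0,/re/ address form is not available elsewhere.
demo=$(mktemp)
# pat1 is deliberately on line 1.
printf '%s\n' pat1 pat2 x pat1 pat2 > "$demo"

# 0,/pat1/ may end the range on line 1 itself, so the search for pat2
# starts at line 2.
with_zero=$(sed -n '0,/pat1/!{/pat2/{=;q;};}' "$demo")

# 1,/pat1/ only looks for the range end from line 2 onward, so the range
# runs through the second pat1 on line 4 and the pat2 on line 2 is skipped.
with_one=$(sed -n '1,/pat1/!{/pat2/{=;q;};}' "$demo")

echo "0,/pat1/: $with_zero"
echo "1,/pat1/: $with_one"
rm -f "$demo"
```

The first variant should report line 2 and the second line 5, which is the breakage described above.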

Benjamin W.
  • I actually tried something very similar, but didn't know what to put in place of `{=;q;}`. – SU3 Jun 14 '19 at 16:12
  • I just tested this, and it doesn't actually have exactly the right behavior. If `pat2` can match the line that was matched by `pat1`, it will print that line rather than the next match. Notice that in the question I emphasized the word **below**. – SU3 Jun 14 '19 at 16:21
  • @SU3 True – a testable input and output example would have helped make that more clear. – Benjamin W. Jun 14 '19 at 16:25
  • Still, a very useful answer. If you come up with something that handles the exceptional case, please, don't remove the first solution. – SU3 Jun 14 '19 at 16:37
  • @SU3 I've added a solution for that, but it requires GNU sed. – Benjamin W. Jun 14 '19 at 17:45

This might work for you (GNU sed):

sed -n '/^-.*787927.*/{:a;n;/^-/!ba;=;q}' file

On encountering a line that matches ^-.*787927.*, start a loop that replaces the current line with the next one until a line begins with -, at which point print the line number and quit.
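
The same script can be written one command per line, which makes the loop easier to follow. This is only a sketch: the wrapper name line_after_787927 is made up, and although this layout should not depend on GNU-specific one-liner syntax, that has not been checked against other sed implementations here:

```shell
# line_after_787927 FILE  (hypothetical helper name)
# Same commands as the one-liner, with comments on their own lines.
line_after_787927() {
    sed -n '
    /^-.*787927.*/ {
        # loop label
        :a
        # replace the pattern space with the next input line;
        # at end of input sed exits here without printing anything
        n
        # not yet at a line starting with "-": go round again
        /^-/!ba
        # print the current line number and stop
        =
        q
    }' "$1"
}
```

Because the n command moves past the matching line before /^-/ is tested, the line that matched the first regex is never reported itself.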

potong