I am not sure whether this has already been asked, because I am having trouble expressing what I want as a search query.

Say I had a file like the following:

foo1
foo2
foo3
barbarbar
barbarbar
barbarbar
foo4
foo5
barbarbar
barbarbar
barbarbar
foo6
foo7

I want to keep only the "foo#" lines that are followed by something other than another "foo#" line. Each such "foo#" line should be printed together with everything after it, up to the next "foo#". An example output would be below.

foo3
barbarbar
barbarbar
barbarbar

foo5
barbarbar
barbarbar
barbarbar
Inian
  • What have you tried, and where are you stuck? – glenn jackman Jul 20 '17 at 15:39
  • if you want hints.. see https://stackoverflow.com/questions/38972736/how-to-select-lines-between-two-patterns... a way to simplify logic is to reverse the input file line wise (tac), use `barbarbar` as starting pattern and `foo` as ending pattern.. finally reverse the output again.. – Sundeep Jul 20 '17 at 15:40
  • @sundeep, good idea. Even simpler is to use the *absence of foo* as the starting pattern. – glenn jackman Jul 20 '17 at 15:45
  • @Donald, I'm sure this kind of question has been answered many times here, but formulating a decent search query may be tricky. – glenn jackman Jul 20 '17 at 15:46
  • @glennjackman yeah that's a good observation... the sample may be misleading as OP says `barbarbar after it and everthing that until the next "foo#"` – Sundeep Jul 20 '17 at 15:51

2 Answers

It looks like this sed command will work:

sed -n '/^foo[0-9]$/{N;:l;/\nfoo[0-9]$/{D;bl};p;b};p'

I honestly regret trying to do that with sed though, the result is hardly comprehensible.

/^foo[0-9]$/{     # if a line matches fooX
    N                # retrieve another line
    :l               # we'll jump to here later ; label l
    /\nfoo[0-9]$/{   # if the following line matches fooX too
        D                # discard the first line and restart the script with the remaining one
        bl               # never reached, since D has already restarted the cycle
    }                # at this point we have a fooX line followed by a non-fooX line
    p                # print what we've matched
    b                # stop processing this line (jump to the end of the script)
}                 # reached when the first line read doesn't match fooX
p                 # print the line

Edit: now that I've laid it out this way, I notice that the p;b part can be removed, since the script behaves the same when it falls through to the last p. If I were the poor soul having to maintain this, I think I'd rather have it there though.
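The simplification mentioned in the edit can be tried on the sample input with the sketch below. It is not the command as originally posted: it drops `p;b`, and also `bl`, on the assumption that `D` restarts the script when the pattern space still holds a newline, so the branch would never run. The function name `strip_lonely_foo` is hypothetical, and the pattern still assumes `foo` followed by a single digit.

```shell
#!/bin/sh
# strip_lonely_foo (hypothetical name): print each fooN line that is followed
# by a non-foo line, plus every non-foo line; drop all other fooN lines.
strip_lonely_foo() {
    # N appends the next input line to a fooN line; if that line is also
    # fooN, D deletes the first one and restarts the script on the second.
    # Anything that survives reaches the final p.
    sed -n '
/^foo[0-9]$/{
    N
    /\nfoo[0-9]$/D
}
p
' "$@"
}

# Sample input from the question, fed on standard input.
sed_result=$(printf '%s\n' foo1 foo2 foo3 barbarbar barbarbar barbarbar \
                           foo4 foo5 barbarbar barbarbar barbarbar foo6 foo7 |
             strip_lonely_foo)
printf '%s\n' "$sed_result"
```

Unlike the example output in the question, this prints no blank line between the two groups.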

Aaron

I don't normally like to post answers to questions that "haven't tried", but I've got some time on my hands. I like to use awk for stuff like this:

The "state machine" approach:

awk '
    /foo/ && afterfoo != "" { print currentfoo; print afterfoo; afterfoo = "" }
    /foo/  { currentfoo = $0 }
    !/foo/ { afterfoo = afterfoo $0 "\n" }
    END    { if (afterfoo != "") {print currentfoo; print afterfoo} }
' file
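A variant of the same state-machine idea can be run directly on the sample input. This sketch differs from the script above: it holds only the most recent foo line and prints it when the first non-foo line arrives, so the non-foo lines are never buffered and no blank separator is printed. The names `keep_followed_foo`, `held` and `pending` are hypothetical, and the pattern assumes foo lines consist of `foo` followed by digits.

```shell
#!/bin/sh
# keep_followed_foo (hypothetical name): reads standard input and prints
# only the foo lines that are directly followed by non-foo lines, together
# with those non-foo lines.
keep_followed_foo() {
    awk '
        /^foo[0-9]+$/ { held = $0; pending = 1; next }  # remember the latest foo line
        pending       { print held; pending = 0 }       # first non-foo line: emit the held foo
                      { print }                         # every non-foo line is printed
    '
}

# Sample input from the question.
awk_result=$(printf '%s\n' foo1 foo2 foo3 barbarbar barbarbar barbarbar \
                           foo4 foo5 barbarbar barbarbar barbarbar foo6 foo7 |
             keep_followed_foo)
printf '%s\n' "$awk_result"
```

Because a foo line is only printed once something follows it, trailing foo lines at the end of the file are dropped without needing an END block.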

The "reverse-process-reverse" approach:

tac file | awk '
    BEGIN  { seenfoo = 1 }         # skip foo lines at the end of the file, which have nothing after them
    !/foo/ { print; seenfoo = 0 }
    /foo/ && !seenfoo++
' | tac
glenn jackman
  • I came up with `awk '/foo/{s=$0;c=1;next} --c==0{print s} 1'` and `tac file | sed -n '/barbarbar/,/foo/p' | tac` – Sundeep Jul 20 '17 at 16:03