How to print lines between two occurrences of the same character?

Question

I have very large text files of the form below:

>randomheader1 some info flag1
data
moredata
someextradata
>randomheader2 some info flag2
littledata
somedata
>randomheader3 some info flag1
one
two
three
four
>randomheader4 some info flag3
....

I want to write every header line containing flag1, together with the lines that follow it up to the next header, to another file, such as:

>randomheader1 some info flag1
data
moredata
someextradata
>randomheader3 some info flag1
one
two
three
four

I've been reading around for a solution and checked this answer; however, since the delimiters I need to match are the same character (namely >), it didn't work. I'm looking for a solution in bash.

score 0 · Accepted Answer · answered Jun 25 '20 at 06:55

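Using awk: on every header line (one starting with >), set a flag if the line contains flag1 and clear it otherwise, then print each line while the flag is set.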

awk '{if($0~/^>/){ if($0~/flag1/) {flag="Y"} else {flag=""}} }flag '

Demo:

$ cat temp.txt
>randomheader1 some info flag1
data
moredata
someextradata
>randomheader2 some info flag2
littledata
somedata
>randomheader3 some info flag1
one
two
three
four
>randomheader4 some info flag3
$ awk '{if($0~/^>/){ if($0~/flag1/){flag="Y"} else {flag="" } }}flag ' temp.txt
>randomheader1 some info flag1
data
moredata
someextradata
>randomheader3 some info flag1
one
two
three
four
$
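
The same logic can also be written more compactly, because awk evaluates a bare regex in an expression as 1 or 0 depending on whether the current line matches. This is only a sketch: the output name flag1.txt is a placeholder of mine, and the sample file is recreated so the snippet runs on its own.

```shell
# Recreate the sample input from the demo above.
cat > temp.txt <<'EOF'
>randomheader1 some info flag1
data
moredata
someextradata
>randomheader2 some info flag2
littledata
somedata
>randomheader3 some info flag1
one
two
three
four
>randomheader4 some info flag3
EOF

# On each header line, set keep to 1 if the header contains flag1, else 0.
# The bare pattern "keep" then prints every line while keep is non-zero.
# flag1.txt is a placeholder output name.
awk '/^>/ { keep = /flag1/ } keep' temp.txt > flag1.txt
```

Like the original command, this matches flag1 anywhere in the header, so a header containing, say, flag10 would be kept too.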

M. Nejat Aydin · Answer 2 · 2020-06-25T19:09:33.770


Assuming the data file doesn't contain any null characters ('\0'), a solution in pure bash could be:

$ cat filter

#!/bin/bash

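in_flag=    # non-empty while inside a flag1 record
while IFS= read -r line; do
    case $line in
        \>*\ flag1) in_flag=t ;;    # header ending in " flag1": start printing
        \>*) in_flag= ;;            # any other header: stop printing
    esac
    # printf, unlike echo, prints lines such as "-n" verbatim
    [[ -n $in_flag ]] && printf '%s\n' "$line"
done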

Run it as

./filter < datafile > outfile
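
If the flag needs to vary, the same loop could be wrapped in a function that takes the flag as its argument. This is only a sketch: filter_flag is a hypothetical name, and it is written with plain POSIX sh constructs so it also runs under bash.

```shell
# filter_flag is a hypothetical helper; the flag to keep is its first argument.
# The body runs in a subshell "( ... )" so its variables stay local.
filter_flag() (
    flag=$1
    in_flag=
    # "|| [ -n "$line" ]" also processes a final line lacking a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            \>*" $flag") in_flag=t ;;   # header ending in " <flag>": start printing
            \>*) in_flag= ;;            # any other header: stop printing
        esac
        if [ -n "$in_flag" ]; then
            printf '%s\n' "$line"
        fi
    done
)
```

It would be called as filter_flag flag1 < datafile > outfile. As in the script above, only headers that end in a space followed by the flag are matched.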

edited Jun 25 '20 at 19:09

answered Jun 25 '20 at 18:52

