Bash, grep between two lines with specified string

Question

Example:

a43
test1
abc
cvb
bnm
test2
kfo

I need all lines between test1 and test2. Plain grep does not work in this case. Do you have any suggestions?

This answer might also be applicable: https://stackoverflow.com/a/48022994/2026975 — imriss, Dec 29 '17 at 16:55

Jotne · Accepted Answer · 2019-06-27T10:21:12.587

71

Print from test1 to test2 (Trigger lines included)

awk '/test1/{f=1} /test2/{f=0;print} f'
awk '/test1/{f=1} f; /test2/{f=0}' 
awk '/test1/,/test2/'

test1
abc
cvb
bnm
test2

Print the lines between test1 and test2 (trigger lines excluded)

awk '/test1/{f=1;next} /test2/{f=0} f' 
awk '/test2/{f=0} f; /test1/{f=1}'

abc
cvb
bnm
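
Since the one-liners above are terse, a commented sketch of the flag idiom may help. In awk, a pattern with no action prints the current line when the pattern is true, so a bare `f` prints while the flag is set. Where `f` sits relative to the two rules decides which trigger lines get printed. The function names and the temporary sample file are illustrative assumptions, not part of the answer.

```shell
#!/usr/bin/env bash
# Sample input from the question, written to a temporary file.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' a43 test1 abc cvb bnm test2 kfo > "$sample"

# Both triggers: set the flag, print, then clear the flag.
with_both()  { awk '/test1/{f=1} f; /test2/{f=0}' "$1"; }

# Neither trigger: clear the flag before printing, set it after printing.
with_none()  { awk '/test2/{f=0} f; /test1/{f=1}' "$1"; }

# Start trigger only: set and clear the flag before printing.
with_start() { awk '/test1/{f=1} /test2/{f=0} f' "$1"; }

# End trigger only: print first, then set and clear the flag.
with_end()   { awk 'f; /test1/{f=1} /test2/{f=0}' "$1"; }

with_start "$sample"   # prints test1, abc, cvb, bnm (one per line)
```

The last two variants cover the case of keeping one trigger line but not the other.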

edited Jun 27 '19 at 10:21

answered Mar 06 '14 at 10:50

Jotne

Hello, do you know how I can use this same script in a shell script? – Nov 03 '15 at 15:05
If my input is in a string, how would I use this to process that string? – Mr. Developerdude Mar 30 '16 at 11:26
@LennartRolland See my post here about how to get data into awk from variables. http://stackoverflow.com/questions/19075671/how-to-use-shell-variables-in-awk-script/19075707#19075707 – Jotne Apr 28 '16 at 06:18
How would you modify this to include one trigger line, but not the other? i.e., include `test1`, but not `test2` – Eddie Feb 23 '21 at 19:44
@Eddie This should print all from "test1" to end of file `awk '/test1/{f=1} f'` if you like to not print the trigger, then swap it around. `awk 'f; /test1/{f=1}'` – Jotne Mar 05 '21 at 07:22
What if I don't need a precise match? For example, I need to match lines from one word till the beginning of the square bracket `[` which also contains a word, that is not yet known. – t7e Jun 04 '22 at 22:53
@t7e Post a new question with an example data. – Jotne Jun 22 '22 at 10:46
Would be better if you could explain what the commands here mean – polynomial_donut Feb 01 '23 at 18:59
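
For the questions above about shell variables and string input, one possible sketch passes the two patterns with `awk -v` and reads the text from standard input. The function name `between` and the variable names `start_re` and `end_re` are hypothetical. Both arguments are treated as awk regular expressions, so regex metacharacters in them are not taken literally.

```shell
#!/usr/bin/env bash
# between START END: print the lines of stdin that lie between each line
# matching START and the following line matching END (triggers excluded).
between() {
  awk -v start_re="$1" -v end_re="$2" '
    $0 ~ end_re   { f = 0 }   # clear the flag before the end trigger prints
    f                         # print while the flag is set
    $0 ~ start_re { f = 1 }   # set the flag after the start trigger line
  '
}

# Text held in a shell variable rather than in a file.
text='a43
test1
abc
cvb
bnm
test2
kfo'

printf '%s\n' "$text" | between test1 test2   # prints abc, cvb, bnm
```

Reading from stdin means the same function also works on a file, for example `between test1 test2 < file.txt`.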

devnull · Answer 2 · 2014-03-06T10:19:58.590

58

You could use sed:

sed -n '/test1/,/test2/p' filename

In order to exclude the lines containing test1 and test2, say:

sed -n '/test1/,/test2/{/test1/b;/test2/b;p}' filename
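
To unpack the commands: `-n` turns off automatic printing, the range address `/test1/,/test2/` selects every line from a match of the first pattern through the next match of the second, and `b` with no label jumps to the end of the script, which skips the `p`. The one-line form relies on sed accepting `;` directly after `b`, which GNU sed does. A sketch that passes each command as its own `-e` fragment avoids that dependency and should also suit BSD sed, though that part is an assumption. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# sed_between FILE: print the lines between test1 and test2, triggers excluded.
sed_between() {
  # Open the block for the range, skip either trigger line with b,
  # print everything else in the range, then close the block.
  sed -n \
    -e '/test1/,/test2/{' \
    -e '/test1/b' \
    -e '/test2/b' \
    -e 'p' \
    -e '}' "$1"
}

# Sample input from the question.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' a43 test1 abc cvb bnm test2 kfo > "$sample"

sed_between "$sample"   # prints abc, cvb, bnm
```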

edited Mar 06 '14 at 10:19

answered Mar 06 '14 at 10:13

devnull

Or, if you are OK with shell expansion in your sed string, you can do `start="test1"; end="test2"; sed -n "/$start/,/$end/{/$start/b;/$end/b;p}" filename`. That way you only have to type each search pattern once. – cp.engr Feb 03 '16 at 22:36
To exclude the lines it can be shortened to just `sed -n '/test1/,/test2/{//b;p}'` – 123 Oct 12 '17 at 10:16
What does `//b` mean? I just know `b` means branch, but I don't know about `//` – wisbucky Sep 13 '19 at 02:18
Is it possible to print few lines before 'test1' string and few lines after 'test2' string? What modifications are required? – Rohan Ghige Oct 30 '19 at 06:18
How to print filename also along with the expected output? Can this take multiple file inputs? e.g. sed -n '/test1/,/test2/p' filename1 filename2 – Rohan Ghige Oct 30 '19 at 06:29
Would be better with semantics explained – polynomial_donut Feb 01 '23 at 18:59
Does not work on macos :( – blackjacx Aug 09 '23 at 20:50
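
On the question about several input files: by default sed reads all its file arguments as one continuous stream, so a range left open at the end of one file carries into the next. A sketch in awk (swapped in for sed here because it exposes `FILENAME` and `FNR`) resets the flag at the start of each file and prefixes every printed line with the file name. The function name and the sample files are illustrative assumptions.

```shell
#!/usr/bin/env bash
# between_files FILE...: print test1..test2 blocks, prefixed with the file name.
between_files() {
  awk '
    FNR == 1 { f = 0 }                    # new file: drop any range left open
    /test1/  { f = 1 }                    # start trigger sets the flag
    f        { print FILENAME ": " $0 }   # print while the flag is set
    /test2/  { f = 0 }                    # end trigger clears the flag
  ' "$@"
}

# Two sample files; the first one never closes its range.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' test1 x            > "$dir/a.txt"
printf '%s\n' y test1 z test2 w  > "$dir/b.txt"

between_files "$dir/a.txt" "$dir/b.txt"
```

Without the `FNR == 1` reset, the line `y` from the second file would be printed as part of the range opened in the first.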

score 15 · Answer 3 · answered Mar 06 '14 at 10:18

If you can only use grep:

grep -A100000 test1 file.txt | grep -B100000 test2 > new.txt

grep -A and then a number gets the lines after the matching string, and grep -B gets the lines before the matching string. The number, 100000 in this case, has to be large enough to include all lines before and after.

If you don't want to include test1 and test2, then you can remove them afterwards by grep -v, which prints everything except the matching line(s):

egrep -v "test1|test2" new.txt > newer.txt

or everything in one line:

grep -A100000 test1 file.txt | grep -B100000 test2 | egrep -v "test1|test2" > new.txt

philshem

I didn't know about `egrep`, interesting, thanks. Details here. http://unix.stackexchange.com/questions/17949/what-is-the-difference-between-grep-egrep-and-fgrep – cp.engr Dec 03 '15 at 18:47
just want to point out the obvious that this might fail if test1/test2 pairs occurs more than once in the input. – Mr. Developerdude Sep 20 '16 at 21:22
@LennartRolland in which case, it takes the **first** occurrence of test1, and the **first** occurrence of test2 after that, which happened to be exactly what I wanted. :) – Aidin Apr 19 '21 at 22:20
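
When the markers occur more than once, it helps to check which span a command actually returns. In the sketch below, the grep pipeline from the answer keeps everything from the first test1 through the last test2, because `-B` reaches back from every test2 match. An awk variant that exits at the first test2 after a test1 returns only the first pair. The function names and sample data are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sample input with two test1/test2 pairs.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' a test1 b test2 c test1 d test2 e > "$sample"

# grep pipeline from the answer: first test1 through the last test2.
grep_span()  { grep -A100000 test1 "$1" | grep -B100000 test2; }

# awk sketch: print from the first test1 and stop at the next test2.
first_pair() { awk '/test1/{f=1} f; f && /test2/{exit}' "$1"; }

grep_span "$sample"    # prints test1, b, test2, c, test1, d, test2
first_pair "$sample"   # prints test1, b, test2
```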

score 8 · Answer 4 · answered Jan 21 '15 at 05:39

Yep, normal grep won't do this, but grep with the -P option will.

$ grep -ozP '(?s)test1\n\K.*?(?=\ntest2)' file
abc
cvb
bnm

\K drops everything matched so far from the printed result, and the positive lookahead (?=\ntest2) asserts that the match must be followed by a newline character and then the string test2.
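
The flags are worth spelling out. In GNU grep, `-P` selects Perl-compatible regular expressions, `-o` prints only the matched part, and `-z` treats the input as NUL-separated, so a text file is read as a single record and `\n` can be matched inside the pattern. Inside the pattern, `(?s)` lets `.` match newlines and `.*?` is non-greedy. Because `-z` also ends each output record with a NUL byte, the sketch below pipes the result through `tr`. It assumes a GNU grep built with PCRE support, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# grep_between FILE: print the text between a test1 line and the next test2 line.
# Assumes GNU grep with -P (PCRE) support; BSD/macOS grep does not provide -P.
grep_between() {
  # tr turns the NUL that -z appends to each match into a newline.
  grep -ozP '(?s)test1\n\K.*?(?=\ntest2)' "$1" | tr '\0' '\n'
}

# Sample input from the question.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' a43 test1 abc cvb bnm test2 kfo > "$sample"

grep_between "$sample"   # prints abc, cvb, bnm
```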

Avinash Raj

Which grep flavor is this? I'm on mac os and there is no -P. What meaning does -P have in the grep context above? It would also help to explain -oz if they are part of the solution. In mac os grep -o is --only-matching and -z is --decompress. The former might be relevant but the latter doesn't seem relevant. – user107172 Aug 23 '17 at 01:19
-P stands for Perl regex, i.e. it lets us use Perl regex syntax in grep. Unfortunately grep on Mac doesn't support this option. – Avinash Raj Aug 23 '17 at 02:50

score 1 · Answer 5 · edited Jun 20 '20 at 09:12

The following script wraps up this process. More details in this similar StackOverflow post

get_text.sh

#!/usr/bin/env bash

# Print the usage text, which this script extracts from its own source.
function show_help()
{
  HELP=$(doMain "$0" HELP)
  echo "$HELP"
  exit
}

function doMain()
{
  # "help", or a missing FILENAME or tag argument, prints the usage text.
  if [ "$1" == "help" ] || [ -z "$1" ] || [ -z "$2" ]
  then
    show_help
  fi

  FILENAME=$1
  if [ ! -f "$FILENAME" ]; then
      echo "File not found: $FILENAME"
      exit 1
  fi

  # With one tag argument, treat it as a prefix for the two marker names.
  if [ -z "$3" ]
  then
    START_TAG=${2}_START
    END_TAG=${2}_END
  else
    START_TAG=$2
    END_TAG=$3
  fi

  # Pass the tags with -v instead of building a command string for eval.
  awk -v s="$START_TAG" -v e="$END_TAG" '$0 ~ s {f=1; next} $0 ~ e {f=0} f' "$FILENAME"
}

# Never called: the body only holds the text that show_help extracts.
function help_txt()
{
HELP_START
  get_text.sh: extracts lines in a file between two tags

  usage: FILENAME {TAG_PREFIX|START_TAG} {END_TAG}

  examples:
    get_text.sh 1.txt AA     => extracts lines in file 1.txt between AA_START and AA_END
    get_text.sh 1.txt AA BB  => extracts lines in file 1.txt between AA and BB
HELP_END
}

doMain "$@"

I love this answer, small and strong! – Ninja Mar 10 '20 at 12:34
It just encouraged me to trust the `awk` answer! – Aidin Apr 19 '21 at 22:30

score 1 · Answer 6 · answered Jan 06 '20 at 22:57

To make it deterministic without guessing at the file size, use the line count from wc -l as the context length.

grep -A$(wc -l test.txt|cut -d" " -f1) test1 test.txt | grep -B$(wc -l test.txt|cut -d" " -f1) test2

To make it easier to read, assign the count to a variable first.

fsize=$(wc -l test.txt|cut -d" " -f1); grep -A$fsize test1 test.txt | grep -B$fsize test2

score 0 · Answer 7 · answered Mar 30 '17 at 08:48

You can do something like this too. Let's say you have this file test.txt with the content:

a43
test1
abc
cvb
bnm
test2
kfo

You can do

cat test.txt | grep -A10 test1 | grep -B10 test2

where -A<n> gives you the n lines after the match and -B<n> gives you the n lines before the match. You just have to make sure that n is larger than the number of lines expected between test1 and test2, or make it large enough to reach EOF.

Result:

test1
abc
cvb
bnm
test2

pratpor

this is cool if you know number of lines, but is not so in real world when you need to get all lines between two markers. For example a list of files are packed into a tar, but you do not know which ones and you want full list. Then your solution will not work. Jotne's solution will work perfectly. – aprodan Apr 11 '17 at 01:48

score 0 · Answer 8 · answered Apr 18 '17 at 18:38

The answer by PratPor above:

cat test.txt | grep -A10 test1 | grep -B10 test2

is cool, but if you don't know the file length:

cat test.txt | grep -A1000 test1 | grep -B1000 test2

Not deterministic, but not too bad. Does anyone have a better (more deterministic) approach?
