I have a SAS log file and I want to list only those lines that are between two words: data and run.
The file can contain many such words on many lines, for example:
MPRINT: data xxxxx;
yyyyy
xxxxxx
MPRINT: run;
fffff
yyyyy
data fff;
fffff
run;
I would like to have lines 1-4 and 8-10.
I tried something like
egrep -iz file -e '\sdata\s+\S*\s+(.|\s)*\srun\s'
but this expression lists everything between the first data and the last run (the (.|\s) part is there to match newline characters).
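The greedy (.|\s)* is what stretches the match from the first data to the last run. One way around it is to skip the multi-line regex and track state line by line. Below is a minimal awk sketch of that idea, not a tested solution for real SAS logs. It assumes that data and run count only as whole words, matched case-insensitively, and that a block ends at the first run after a data. The function name print_data_blocks is made up for illustration.

```shell
# print_data_blocks FILE (hypothetical helper name)
# Prints every block that starts on a line containing the word "data"
# and ends on the next line containing the word "run".
print_data_blocks() {
  awk '
    # Lower-case copy of the line so matching is case-insensitive.
    { line = tolower($0) }

    # Open a block when "data" appears as a whole word anywhere on the line.
    !inblock && line ~ /(^|[^a-z0-9_])data([^a-z0-9_]|$)/ { inblock = 1 }

    # Print every line while inside a block.
    inblock { print }

    # Close the block on the first line with "run" as a whole word.
    inblock && line ~ /(^|[^a-z0-9_])run([^a-z0-9_]|$)/ { inblock = 0 }
  ' "$1"
}
```

Called as print_data_blocks file, this prints each data ... run block and skips the lines in between. Because a block closes at the first run it meets, repeated keywords do not merge into one large match.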
I may also want to require additional words in the pattern between data and run, like:
MPRINT: data xxx;
fffff
NOTE: ffdd
set fff;
xxxxxx
MPRINT: run;
data fff;
yyyyyy
run;
In some cases I would like to list only the lines between data and run when some line in that block contains the word set.
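The "only blocks that contain set" requirement can be sketched by buffering each block and printing it only if the required word was seen before the closing run. This is an illustrative awk sketch under the same assumptions as the question (whole-word, case-insensitive keywords, no nested data). The function name print_data_blocks_with and its parameter order are hypothetical.

```shell
# print_data_blocks_with WORD FILE (hypothetical helper name)
# Prints only those data ... run blocks in which WORD appears as a whole word.
print_data_blocks_with() {
  awk -v word="$1" '
    # Build a whole-word, lower-case pattern for the required word.
    BEGIN { want = "(^|[^a-z0-9_])" tolower(word) "([^a-z0-9_]|$)" }

    { line = tolower($0) }

    # Start a new block: reset the buffer and the "word seen" flag.
    !inblock && line ~ /(^|[^a-z0-9_])data([^a-z0-9_]|$)/ {
      inblock = 1; buf = ""; found = 0
    }

    # Collect lines instead of printing them straight away.
    inblock {
      buf = buf $0 "\n"
      if (line ~ want) found = 1
    }

    # At the closing "run", emit the buffer only if the word was seen.
    inblock && line ~ /(^|[^a-z0-9_])run([^a-z0-9_]|$)/ {
      if (found) printf "%s", buf
      inblock = 0
    }
  ' "$2"
}
```

For the example above, print_data_blocks_with set file would keep the first block (it has a set line) and drop the second one.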
I know there are many similar threads, but I didn't find any where the keywords can repeat multiple times.
I'm not familiar with awk or sed, but if they can help I can use them too.
[Edit]
Note that data and run are not necessarily at the beginning of the line (I updated the example). Also, there can't be any other data between data and run.
[Edit2]
As Tom noted, every line that I was looking for started with MPRINT(...): , so I filtered for those lines.
Anubhava's answer helped me the most with my final solution, so I marked it as the answer.
The final expression looked like this:
grep -o path -e 'MPRINT.*' | cut -f '2-' -d ' '|
grep -iozP '(?ms) data [^\(;\s]+.*?(set|infile).*?run[^\n]*\n'