Split file by extracting lines between two keywords

Question

I have a file with the following lines:

string
string
string
MODEL 1
.
.
.
TER
string 
string
string
MODEL 2
.
.
.
TER

where there are 5000 such MODELs. I want to split this file such that each section beginning MODEL X and ending TER (shown with dots) is saved to its own file, and everything else is discarded. How can I do this? Possibly with awk or split?

I have checked a couple of other similar questions, but failed to apply the answers to my case.

Also note that I use Mac OS X.

See http://stackoverflow.com/q/23934486/258523 for a discussion of ways to do this in `awk`. — Etan Reisner, Aug 04 '15 at 16:54

anubhava · Accepted Answer · 2015-08-04T17:31:29.890

You can do this with awk:

awk '/^MODEL/{file="model" $2} file{print > file} /^TER/{close(file); file=""}' file

How it works:

/^MODEL/               # match lines starting with MODEL
file="model" $2        # set variable file to "model" followed by the model number from column 2
file{...}              # execute only if the file variable is set (non-empty)
{print>file}           # print each record to file
/^TER/                 # match lines starting with TER
{close(file); file=""} # close file and reset file to ""

Then verify as:

cat model1
MODEL 1
.
.
.
TER

cat model2
MODEL 2
.
.
.
TER
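
To try the command end to end, here is a minimal sketch that assumes a POSIX shell and generated input rather than real data. The names split_demo and sample.txt are hypothetical, as is the choice of 50 sections. The awk program is the same as above, only laid out one pattern-action pair per line. The close(file) call is what keeps the number of simultaneously open files at one, which matters once the input has thousands of MODEL sections.

```shell
# Work in a scratch directory (hypothetical name) so the output is easy to remove.
mkdir -p split_demo
cd split_demo || exit 1

# Generate a sample input: 50 MODEL/TER sections, each preceded by junk lines.
i=1
while [ "$i" -le 50 ]; do
    printf 'string\nstring\nMODEL %d\n.\n.\nTER\n' "$i"
    i=$((i + 1))
done > sample.txt

# Same program as the one-liner, one pattern-action pair per line.
awk '
    /^MODEL/ { file = "model" $2 }        # section start: choose the output name
    file     { print > file }             # inside a section: copy the line
    /^TER/   { close(file); file = "" }   # section end: release the file handle
' sample.txt

# Count the output files and show the first one.
count=$(ls model* | wc -l | tr -d ' ')
echo "$count files written"
cat model1
```

The junk lines between TER and the next MODEL are dropped because file is empty at that point, so the middle rule does not fire.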

edited Aug 04 '15 at 17:31

answered Aug 04 '15 at 17:05


It works until the 18th file, at which point it terminates with the error: `awk: model18 makes too many open files`. It also prints the `string` lines after `TER`. – sodiumnitrate Aug 04 '15 at 17:11
Thanks for the answer, it works perfectly now. Could you please explain the command a little for me to understand how it works? – sodiumnitrate Aug 04 '15 at 17:20
I've added some description in my answer. – anubhava Aug 04 '15 at 17:31

theoden8 · Answer 2 · 2016-04-17T14:08:37.607

This works even with dash:

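go=false text= model_ID=
# -r keeps backslashes in the input intact
while IFS= read -r line; do
    case $line in
        MODEL*)
            # strip the "MODEL " prefix to get the section number
            model_ID=${line#MODEL }
            go=true
            ;;
        TER*)
            # write the buffered lines (the MODEL and TER lines themselves are not saved)
            $go && printf '%s' "$text" > "MODEL_$model_ID"
            text= model_ID= go=false
            ;;
        *)
            # buffer the line with a real newline, so % and \ in the data are harmless
            $go && text="$text$line
"
            ;;
    esac
done < file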
