How to extract the lines between patterns?

Question

I have a file in a format like this:

[PATTERN]
line1
line2
line3
.
.
.
line
[PATTERN]
line1
line2
line3
.
.
.
line
[PATTERN]
line1
line2
line3
.
.
.
line

I want to extract blocks like the following from the file above:

[PATTERN]
line1
line2
line3
.
.
.
line

Note: The number of lines between two [PATTERN] lines may vary, so I can't rely on a fixed line count.

Basically, I want to store each pattern and the lines following it in a database, so I will have to iterate over all such blocks in my file.

How can I do this with shell scripting?

Please, ask your query.. I know it seems ambiguous but, its hard to explain. — Yugal Jindle, Aug 18 '11 at 08:53
Duplicate of [How to extract from a file text between tokens using bash scripts](http://stackoverflow.com/questions/4860228/how-to-extract-from-a-file-text-between-tokens-using-bash-scripts), or [Extract text from between 2 tokens in a text file using bash](http://stackoverflow.com/questions/4857424/extract-text-from-between-2-tokens-in-a-text-file-using-bash) perhaps? — aioobe, Aug 18 '11 at 08:53

blueFast · Accepted Answer · 2011-08-18T10:35:58.863

This assumes you are using bash as your shell. For other shells, the exact syntax may differ.

Assuming your data is in a file named data:

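i=0
# Read data line by line; IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line ; do
  if [ "$line" = "[PATTERN]" ] ; then
    # Separator found: start the next output file (emptying any stale copy)
    i=$((i + 1)) ; : > "file.$i" ; continue
  fi
  # Lines before the first separator, if any, end up in file.0
  printf '%s\n' "$line" >> "file.$i"
done < data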

Replace [PATTERN] with your actual separator pattern.

This will create the files file.1, file.2, etc.

Edit: in response to the request for an awk solution:

awk '/^\[PATTERN\]$/{close("file"f);f++;next}{print $0 > ("file"f)}' data

The idea is to start a new output file each time [PATTERN] is found (the next command skips the separator line itself) and to write all following lines to that file, producing file1, file2, etc. If you need to include [PATTERN] in the generated files, delete the next command.

Notice the escaping of [ and ], which have special meaning in regular expressions. If your pattern does not contain them, you do not need the escaping. The ^ and $ anchors are advisable, since they tie the pattern to the beginning and end of the line, which is usually what you want.
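If each block should keep its [PATTERN] header line, as in the question, one option is to compare the whole line as a literal string, which avoids regex escaping altogether. The following is only a sketch: the sample input, the file name data, and the output prefix block. are made up for illustration.

```shell
# Work in a scratch directory so nothing existing gets overwritten
cd "$(mktemp -d)" || exit 1

# Sample input (illustrative); the first line precedes any header
printf '%s\n' 'stray' '[PATTERN]' 'line1' 'line2' '[PATTERN]' 'line3' > data

# Header line: close the previous output file, then pick the next file name.
# The second rule runs for every line once f is set, so the header itself is
# written too, while lines before the first header are skipped.
awk '$0 == "[PATTERN]" { if (f) close(f); f = "block." (++n) }
     f                 { print > f }' data

cat block.1
```

With this sample, block.1 holds the first header plus line1 and line2, and block.2 holds the second header plus line3.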

Solution is right.. but can we do that directly with something like sed or awk or grep or something ? — Yugal Jindle, Aug 18 '11 at 09:48

score 0 · Answer 2 · answered Aug 18 '11 at 09:45

This can surely be improved, but if you want to store the lines in an array, here is something I did in the past:

#!/bin/bash
file=$1
gp_cnt=-1   # index of the current group; stays -1 until the first pattern is seen
i=-1

while IFS= read -r line
do
  # Match pattern
  if [[ "$line" == "[PATTERN]" ]]; then
    gp_cnt=$((gp_cnt + 1))
    # If this is not the first match, process the previous group
    if [[ $gp_cnt -gt 0 ]]; then
      # Process the group
      echo "Processing group #$((gp_cnt - 1))"
      echo "${parsed[*]}"
    fi
    # Start new group
    echo "Pattern #$gp_cnt caught"
    i=0
    unset parsed
    parsed[$i]="$line"

  # Other lines (lines before the first pattern are not processed)
  elif [[ $gp_cnt != -1 ]]; then
    i=$((i + 1))
    parsed[$i]="$line"
  fi
done < "$file"

# Process last group (skipped if the file contained no pattern at all)
if [[ $gp_cnt -ge 0 ]]; then
  echo "Processing group #$gp_cnt"
  echo "${parsed[*]}"
fi

I don't like that the last group has to be processed outside the loop...
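One way to make that less awkward is to move the per-group work into a function, so the loop body and the final flush share a single call. The trailing call is still needed, but it shrinks to one statement. This is only a sketch under assumptions: process_group and parse_blocks are hypothetical names, and the echo stands in for the real work (for example a database insert).

```shell
#!/bin/bash
# Hypothetical stand-in for the real per-block work.
# "$@" holds the header line followed by the remaining lines of one block.
process_group() {
  echo "group of $# line(s): $*"
}

# Reads blocks from standard input and hands each one to process_group.
parse_blocks() {
  local line
  local -a parsed=()
  while IFS= read -r line; do
    if [[ $line == "[PATTERN]" ]]; then
      # A new header ends the previous block, if there was one
      if (( ${#parsed[@]} > 0 )); then
        process_group "${parsed[@]}"
      fi
      parsed=("$line")
    elif (( ${#parsed[@]} > 0 )); then
      # Lines before the first header are ignored
      parsed+=("$line")
    fi
  done
  # Flush the final block with the same call used inside the loop
  if (( ${#parsed[@]} > 0 )); then
    process_group "${parsed[@]}"
  fi
}

# Illustrative input: one stray line, then two blocks
printf '%s\n' 'stray' '[PATTERN]' 'a' 'b' '[PATTERN]' 'c' | parse_blocks
```

For a real file, the last line would become something like parse_blocks < "$1".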
