
I have a file with a format like this:

[PATTERN]
line1
line2
line3
.
.
.
line
[PATTERN]
line1
line2
line3
.
.
.
line
[PATTERN]
line1
line2
line3
.
.
.
line

I want to extract each block of the following form from the file above:

[PATTERN]
line1
line2
line3
.
.
.
line

Note: The number of lines between two [PATTERN] lines may vary, so I can't rely on a fixed line count.

Basically, I want to store each pattern and the lines following it in a database, so I will have to iterate over all such blocks in my file.

How can I do this with shell scripting?

Yugal Jindle
  • this is really ambiguous. Which one do you want to extract? – Karoly Horvath Aug 18 '11 at 08:52
  • Please ask your question. I know it seems ambiguous, but it's hard to explain. – Yugal Jindle Aug 18 '11 at 08:53
  • Duplicate of [How to extract from a file text between tokens using bash scripts](http://stackoverflow.com/questions/4860228/how-to-extract-from-a-file-text-between-tokens-using-bash-scripts), or [Extract text from between 2 tokens in a text file using bash](http://stackoverflow.com/questions/4857424/extract-text-from-between-2-tokens-in-a-text-file-using-bash) perhaps? – aioobe Aug 18 '11 at 08:53

2 Answers

1

This assumes you are using bash as your shell. For other shells, the actual solution can be different.

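Assuming your data is in a file named data:

i=0
while IFS= read -r line ; do
  # A separator line starts the next output file and is not copied itself
  if [ "$line" = "[PATTERN]" ] ; then
    i=$((i + 1)) ; touch "file.$i" ; continue
  fi
  # Lines before the first separator, if any, end up in file.0
  echo "$line" >> "file.$i"
done < data

Replace [PATTERN] with your actual separator line.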

This will create files file.1, file.2, etc.
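The question asks for blocks that still contain their [PATTERN] line, while the loop above drops it. Below is a minimal sketch of a variant that keeps the header in each output file and skips any lines that appear before the first header. The function name split_blocks and its two arguments are hypothetical, chosen only for illustration.

```shell
# split_blocks INPUT PREFIX
# Hypothetical helper: writes each block, header line included,
# to PREFIX.1, PREFIX.2, ... in the order the blocks appear.
split_blocks() {
  input=$1
  prefix=$2
  i=0
  while IFS= read -r line; do
    if [ "$line" = "[PATTERN]" ]; then
      i=$((i + 1))
      : > "$prefix.$i"            # create (or truncate) the next output file
    fi
    # Lines seen before the first [PATTERN] are skipped
    if [ "$i" -gt 0 ]; then
      printf '%s\n' "$line" >> "$prefix.$i"
    fi
  done < "$input"
}
```

Calling split_blocks data file would then write file.1, file.2, etc., each starting with its [PATTERN] line.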

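Edit: in response to a request for an awk solution:

awk '/^\[PATTERN\]$/{close("file" f); f++; next} {print $0 > ("file" f)}' data

The idea is to open a new file each time [PATTERN] is found (skipping that line with the next command) and to write all subsequent lines to that file. If you need to include [PATTERN] in your generated files, delete the next command. Note that this names the files file1, file2, etc., without the dot. The parentheses around the output file name avoid an ambiguity that some awk implementations have when a concatenation follows >.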

Notice the escaping of the [ and ], which have special meaning for regular expressions. If your pattern does not contain those, you do not need the escaping. The ^ and $ are advisable, since they tie your pattern to the beginning and end of line, which you will usually need.
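If the separator is a fixed string, one way to sidestep the escaping altogether is to compare the whole line literally instead of matching a regular expression. The following is only a sketch under that assumption. The function name split_blocks_awk and the awk variables prefix, out and n are made up for this example. Unlike the one-liner above, it keeps the [PATTERN] line in each file and ignores lines before the first header.

```shell
# split_blocks_awk INPUT PREFIX
# Hypothetical helper: writes each block to PREFIX.1, PREFIX.2, ...
split_blocks_awk() {
  awk -v prefix="$2" '
    $0 == "[PATTERN]" {            # literal string comparison, nothing to escape
      if (out != "") close(out)    # finish the previous block file
      n++
      out = prefix "." n
    }
    out != "" { print > out }      # header included; leading lines skipped
  ' "$1"
}
```

A call such as split_blocks_awk data file would produce file.1, file.2, etc. Closing each file before opening the next keeps the number of simultaneously open files at one, which matters for inputs with many blocks.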

blueFast
0

This can surely be improved, but if you want to store the lines in an array, here is something I did in the past:

#!/bin/bash
file=$1
gp_cnt=-1
parsed=()

# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line
do
  # Match pattern
  if [[ "$line" == "[PATTERN]" ]]; then
    gp_cnt=$((gp_cnt + 1))
    # If this is not the first match, process the previous group
    if [[ $gp_cnt -gt 0 ]]; then
      # Process the group
      echo "Processing group #$((gp_cnt - 1))"
      echo "${parsed[*]}"
    fi
    # Start new group
    echo "Pattern #$gp_cnt caught"
    parsed=("$line")

  # Other lines (lines before the first pattern are not processed)
  elif [[ $gp_cnt -ne -1 ]]; then
    parsed+=("$line")
  fi
done < "$file"

# Process last group (only if at least one pattern was found)
if [[ $gp_cnt -ge 0 ]]; then
  echo "Processing group #$gp_cnt"
  echo "${parsed[*]}"
fi

I don't like having to process the last group outside the loop...
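One possible way around that duplication is to move the per-line logic into a function and feed it a sentinel [PATTERN] line once the input is exhausted, so the last block is flushed by the same code path as every other block. This is a sketch, not a drop-in replacement: the names process_group, handle_line and each_block are hypothetical, process_group merely prints where a real script would do its work (for example a database insert), and the lines are joined into a plain string rather than kept in an array.

```shell
# process_group N TEXT: hypothetical placeholder for the real processing
process_group() {
  printf 'group %s: %s\n' "$1" "$2"
}

# handle_line LINE: a header flushes the block collected so far
# and starts a new one; any other line is appended to the current block.
handle_line() {
  if [ "$1" = "[PATTERN]" ]; then
    if [ "$n" -gt 0 ]; then
      process_group "$n" "$block"
    fi
    n=$((n + 1))
    block=$1
  elif [ "$n" -gt 0 ]; then
    block="$block $1"        # space-joined, lines before the first header are skipped
  fi
}

# each_block FILE: feed every line to handle_line, then one sentinel header
each_block() {
  n=0
  block=
  # "|| [ -n ... ]" also picks up a final line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    handle_line "$line"
  done < "$1"
  handle_line "[PATTERN]"    # sentinel: flushes the last real block
}
```

Running each_block data calls process_group once per block, numbered from 1, and does nothing if the file contains no [PATTERN] line at all.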

Plouff