2

I have a file that I need to split into multiple files, using separate start and end delimiters.

For example, if I have the following file:

abcdef
START
ghijklm
nopqrst
END
uvwxyz
START
abcdef
ghijklm
nopqrs
END
START
tuvwxyz
END

I need 3 separate files:

file1

START
ghijklm
nopqrst
END

file2

START
abcdef
ghijklm
nopqrs
END

file3

START
tuvwxyz
END

I found this link, which showed how to do it with a starting delimiter, but I also need an ending delimiter. I have tried using some regex in the awk command, but I am not getting the result that I want. I don't quite understand how to make awk 'lazy' or 'non-greedy' so that it pulls the file apart correctly.

I really like the awk solution, and something similar would be fantastic. I am reposting the solution here so you don't have to click through:

awk '/DELIMITER_HERE/{n++}{print >"out" n ".txt" }' input_file.txt

Any help is appreciated.

jasonmclose

3 Answers

4

You can use this awk command:

awk '/^START/{n++;w=1} n&&w{print >"out" n ".txt"} /^END/{w=0}' input_file.txt
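
To see how the two flags interact, here is a small sketch that runs this approach against the sample input from the question. The only change is an assumption on my part: the output file name is wrapped in parentheses, because the GNU awk manual notes that an unparenthesized concatenation after `>` is ambiguous and not handled the same way by every awk. Run it in an empty scratch directory, since it writes its files into the current directory.

```shell
# Recreate the sample input from the question.
printf '%s\n' abcdef START ghijklm nopqrst END uvwxyz \
    START abcdef ghijklm nopqrs END \
    START tuvwxyz END > input_file.txt

# n counts the blocks seen so far; w is 1 only while inside a START..END block.
# The print rule runs before the END rule resets w, so the END line is still written.
awk '/^START/{n++;w=1} n&&w{print > ("out" n ".txt")} /^END/{w=0}' input_file.txt

# List the files that were produced.
ls out*.txt
```

Lines outside any block (`abcdef` and `uvwxyz` in the sample) are never printed, because `w` is 0 when they are read.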
anubhava
  • I like this solution the best. It worked perfectly for me, and I only had to type the delimiters once in the command (my real delimiters are much, much longer than the START and END used in the example). Thank you. – jasonmclose Jan 27 '14 at 18:39
  • using n&&w is an amazing trick and it did all the job. Nice One. – dev Aug 28 '18 at 20:08
4
awk '
    /START/ {p = 1; n++; file = "file" n}
    p { print > file }
    /END/ {p = 0}
' filename
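
If the input contains a very large number of blocks, some awk implementations can run into a limit on simultaneously open output files, since every file written with `print >` stays open until awk exits. A hedged variant of the command above closes each file once its END line has been written. The sample data and the name `input_file.txt` are assumptions taken from the question, not part of this answer.

```shell
# Recreate the sample input from the question.
printf '%s\n' abcdef START ghijklm nopqrst END uvwxyz \
    START abcdef ghijklm nopqrs END \
    START tuvwxyz END > input_file.txt

awk '
    /START/ {p = 1; n++; file = "file" n}     # new block: pick the next numbered file
    p       {print > file}                    # copy lines while inside a block
    /END/   {if (p) close(file); p = 0}       # block done: release the file handle
' input_file.txt
```

Closing is safe here because each block gets a fresh file name, so no file is ever reopened (and truncated) by a later `print >`.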
glenn jackman
1

Here's another example using range notation:

awk '/START/,/END/ {if(/START/) n++; print > "out" n ".txt"}' data

Or an equivalent that uses the ternary operator instead of an if statement:

awk '/START/,/END/ {print > "out" (/START/ ? ++n : n) ".txt"}' data

After Ed Morton's comments, here's a version that doesn't repeat the /START/ regex, because I just wanted to see if it would work:

awk '/START/ && ++n,/END/ {print > "out" n ".txt" }' data

The other answers are definitely better if your range is or will ever be non-inclusive of the ends.
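
To illustrate that point, here is a minimal sketch of a non-inclusive split using the flag-based approach from the other answers, with the rules reordered so that neither delimiter line is printed. The `inner` output prefix is a hypothetical name chosen for this example, and the input is the sample from the question.

```shell
# Recreate the sample input from the question.
printf '%s\n' abcdef START ghijklm nopqrst END uvwxyz \
    START abcdef ghijklm nopqrs END \
    START tuvwxyz END > data

awk '
    /END/   {w = 0}                          # clear the flag before the END line can print
    w       {print > ("inner" n ".txt")}     # only lines strictly between the delimiters
    /START/ {n++; w = 1}                     # set the flag after the START line was skipped
' data
```

With range notation the same change would need extra conditions to filter out the START and END lines again, whereas here only the order of the three rules changes.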

n0741337
  • Never use range notation: it makes the trivial stuff slightly briefer, but then requires a complete rewrite and/or duplication of conditions (as in this case) when things get even slightly more complicated. – Ed Morton Jan 27 '14 at 17:59