I would like to split the following file based on the pattern ABC:

ABC
4
5
6
ABC
1
2
3
ABC
1
2
3
4
ABC
8
2
3

to get file1:

ABC
4
5
6

file2:

ABC
1
2
3

etc.

According to man csplit, the usage is: csplit my_file /regex/ {num}.

I can split this file using: csplit my_file '/^ABC$/' {2} but this requires me to put in a number for {num}. When I try to match with {*}, which is supposed to repeat the pattern as many times as possible, I get the error:

csplit: *}: bad repetition count

I am using zsh.

zr0gravity7
vbfh
  • See the compatibility discussion here: https://stackoverflow.com/a/4323899/12109043 for workarounds. – zr0gravity7 Aug 17 '21 at 19:47
  • @vbfh: My `csplit` does not even have a repetition count parameter. What platform are you on? BTW, what would happen if you just entered a ridiculously high rep count (9999 for instance)? – user1934428 Aug 18 '21 at 07:36

1 Answer

To split a file on a pattern like this, I would turn to awk:

awk 'BEGIN { i=0; }
     /^ABC/ { ++i; }
     # parentheses keep the file name unambiguous across awk implementations
     { print >> ("file" i) }' < input

This reads lines from the file named input. Before reading any lines, the BEGIN section explicitly initializes an "i" variable to zero. An uninitialized awk variable counts as zero in arithmetic but as an empty string when concatenated, so being explicit ensures that any lines before the first "ABC" would land in file0. The "i" variable is our index to the serial filenames.
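To see what the explicit initialization changes, here is a minimal sketch that builds the file name with and without it. The shell variable names unset_name and init_name are only illustrative and not part of the answer's command.

```shell
# Without initialization, i concatenates as an empty string.
unset_name=$(awk 'BEGIN { print "file" i }')

# With i = 0, the generated name carries the digit.
init_name=$(awk 'BEGIN { i = 0; print "file" i }')

printf '%s\n%s\n' "$unset_name" "$init_name"
```

The first name comes out as file and the second as file0, which only matters for input that has lines before the first "ABC".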

Subsequently, each line that starts with "ABC" will increment this "i" variable.

Every line in the file will then be printed (in append mode) to the file name that's generated from the text "file" and the current value of the "i" variable. Because the output is appended, remove any output files left over from an earlier run first.
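As a sketch of a possible variant, the version below closes each output file before moving on to the next one, which should help with inputs that have many sections, since some awk implementations limit the number of files open at once. It assumes the sample data from the question saved as my_file, matches "ABC" as a whole line as the question's csplit pattern does, and works in a scratch directory. The variable name out is an illustrative choice.

```shell
# Work in a fresh scratch directory so appended output starts clean.
cd "$(mktemp -d)"

# Recreate the sample input from the question.
printf '%s\n' ABC 4 5 6 ABC 1 2 3 ABC 1 2 3 4 ABC 8 2 3 > my_file

# On each ABC line, close the current output file and switch to the next name.
awk 'BEGIN   { i = 0; out = "file" i }
     /^ABC$/ { close(out); out = "file" (++i) }
             { print >> out }' my_file

# Show what was written to each piece.
head file*
```

Because the sample input starts with "ABC", nothing is ever written to file0 and the pieces are file1 through file4.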

Jeff Schaller