Split file after n number of non-consecutive empty lines

Question

I am trying to split a big text file after every n empty lines. The text file contains exactly one empty line as the data separator, like below:

Lorem ipsum
Lorem ipsum
Lorem ipsum

Lorem ipsum
Lorem ipsum

Lorem ipsum

Lorem ipsum
Lorem ipsum

Lorem
Lorem

...

I have tried to use csplit:

csplit data.txt /^$/ {3}

My expectation is that after 3 empty lines (not consecutive, but after the cursor has processed 3 empty lines) it splits the file, and continues to do so. But it actually splits the file at each empty line.

My expected files:

xx00

Lorem ipsum
Lorem ipsum
Lorem ipsum

Lorem ipsum
Lorem ipsum

Lorem ipsum

xx01

Lorem ipsum
Lorem ipsum

Lorem
Lorem

Any suggestions?

The problem you are having is that a regex applies to a single LINE of data, not to multiple lines, so the repetition `{3}` doesn't do what you want it to do. Another option is `awk` (or a bash script; awk will be faster). In either case you can use internal variables to keep count of the empty lines encountered. — David C. Rankin, Jun 08 '22 at 07:15
_not consecutive, but after cursor processes 3 empty lines_ But is it possible that there are consecutive empty lines? — James Brown, Jun 08 '22 at 07:19
Also, the output you show is inconsistent with a split at the 3rd newline. In that case `xx00` should not have the final 2 lines you show. `xx00` shows splitting the line on the 4th newline, which would remove the first two lines in `xx01`. — David C. Rankin, Jun 08 '22 at 07:21

Renaud Pacalet · Accepted Answer · 2022-06-08T07:28:55.053

2

With awk (tested with GNU and BSD awk):

awk -v max=3 '{print > sprintf("xx%02d", int(n/max))} /^$/ {n += 1}' file
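
To see what the counter does, here is a minimal sketch that runs the same logic on sample data shaped like the question's. The directory name `split-demo` and the file name `data.txt` are assumptions made for the demo. The redirection target is parenthesized as a precaution, since some awk implementations parse unparenthesized expressions after `>` differently.

```shell
# Scratch directory (hypothetical name) so the xxNN files stay out of the way.
mkdir -p split-demo

# Sample input modelled on the question: blocks separated by single empty lines.
printf '%s\n' \
  'Lorem ipsum' 'Lorem ipsum' 'Lorem ipsum' '' \
  'Lorem ipsum' 'Lorem ipsum' '' \
  'Lorem ipsum' '' \
  'Lorem ipsum' 'Lorem ipsum' '' \
  'Lorem' 'Lorem' > split-demo/data.txt

# Same logic as the one-liner, run in a subshell so the caller's directory is unchanged.
(
  cd split-demo || exit 1
  awk -v max=3 '
    { print > (sprintf("xx%02d", int(n / max))) }  # n = empty lines seen so far
    /^$/ { n += 1 }                                # count the empty line after printing it
  ' data.txt
  wc -l xx00 xx01
)
```

Because the line is printed before the counter is incremented, each empty line stays at the end of the block it terminates. On this input xx00 should hold 9 lines (three blocks, each followed by its empty line) and xx01 the remaining 5.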

edited Jun 08 '22 at 07:28

answered Jun 08 '22 at 07:15

score 2 · Answer 2 · answered Jun 08 '22 at 08:21

This awk should also work with an empty RS:

awk -v n=3 -v RS= '{ORS=RT; print > sprintf("xx%02d", int((NR-1)/n))}' file
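
Note that `RT` is a gawk extension, so the command above needs GNU awk. For other awks, a rough equivalent is to keep paragraph mode (`RS = ""`) and hard-code the output separator. This is a sketch under stated assumptions, not a drop-in replacement: runs of several empty lines collapse to one, and the last block always gets a trailing empty line. The directory name `split-demo-rs` and the sample data are made up for the demo.

```shell
# Scratch directory (hypothetical name) with five blocks separated by single empty lines.
mkdir -p split-demo-rs
printf '%s\n' 'a1' 'a2' '' 'b1' '' 'c1' '' 'd1' 'd2' '' 'e1' > split-demo-rs/data.txt

(
  cd split-demo-rs || exit 1
  awk -v n=3 '
    BEGIN { RS = ""; ORS = "\n\n" }   # read block by block, write one empty line after each block
    { print > (sprintf("xx%02d", int((NR - 1) / n))) }   # NR counts blocks, n blocks per file
  ' data.txt
)
```

Here `NR` is the block number, so the first three blocks go to xx00 and the remaining two to xx01.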

answered Jun 08 '22 at 08:21

anubhava

score 0 · Answer 3 · answered Jun 08 '22 at 08:28

awk is good for this.

Split every n empty lines, naming files with:

No leading zeroes:

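awk -v n=3 '
BEGIN {f = 0}                  # without this the first file would be named "xx", not "xx0"
$0 == "" {++c}                 # count empty lines
c <= n {print > ("xx" f)}      # parentheses keep the file name expression unambiguous
c == n {c = 0; ++f}' file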

With a minimum width (zero-padded):

awk -v n=3 -v width=2 '
BEGIN {f = sprintf("%0*d", width, 0)}   # pad the first suffix too, e.g. "00"
$0 == "" {++c}
c <= n {print > ("xx" f)}
c == n {c = 0; f = sprintf("%0*d", width, f + 1)}' file

To remove the trailing empty line in each file, just change c <= n to c < n.
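
As a sketch of that `c < n` variant, the version below also calls `close()` on each finished file. That is an addition, not part of the answer: it can matter when the input produces many output files, because some awk implementations limit how many files may be open at once. The directory name `split-demo-close`, the variable `out`, and the sample data are assumptions made for the demo.

```shell
# Scratch directory (hypothetical name) with five one-line blocks.
mkdir -p split-demo-close
printf '%s\n' 'a1' '' 'b1' '' 'c1' '' 'd1' '' 'e1' > split-demo-close/data.txt

(
  cd split-demo-close || exit 1
  awk -v n=3 '
    BEGIN    { f = 0; out = "xx" f }
    $0 == "" { ++c }                 # count empty lines
    c < n    { print > out }         # the n-th empty line is not written anywhere
    c == n   { close(out); c = 0; out = "xx" (++f) }   # finish this file, move to the next
  ' data.txt
)
```

With n=3 the third empty line is dropped, so xx0 ends on its last data line and xx1 starts directly with the next block.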

RARE Kpop Manifesto · Answer 4 · 2022-06-08T13:33:07.250

removed './xx00'
removed './xx01'
removed './awkprof.out'

    {m,g}awk '{
        print >> sprintf("xx%0*.f%.*s", __-(_~_),
                 int(_/__),_<_,_+=!NF) }' FS='^$' __=3

-rw-r--r--  1 501  75 Jun  8 09:19:10 2022 xx00
-rw-r--r--  1 501  37 Jun  8 09:19:10 2022 xx01


../../Desktop/testdiremptylines/

     1  Lorem ipsum
     2  Lorem ipsum
     3  Lorem ipsum
     4  
     5  Lorem ipsum
     6  Lorem ipsum
     7  
     8  Lorem ipsum
     9  

 xx00

     1  Lorem ipsum
     2  Lorem ipsum
     3  
     4  Lorem
     5  Lorem

 xx01
