
I have a 457 MB file and am trying to split it into much smaller files. Here's what currently works:

csplit -z Scan.nessus /\<ReportHost/ '{*}'

However, this creates about 61.5k files, since I have a ton of these entries in the 457 MB file. Ultimately, I'd like to break this down into one file per 50 entries rather than one per entry.
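Since csplit makes one file per match, the file count equals the number of opening tags, which can be confirmed up front without splitting at all (a quick sanity check assuming GNU grep; grep -c counts matching lines, so this relies on at most one opening tag per line):

grep -c "<ReportHost" Scan.nessus    # ~61500 here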

Is there a way to modify this to accomplish that? I tried doing this in Ruby, but parsing the file with Nokogiri seems to max out the VM's memory.

LewlSauce

1 Answer


How about a Perl solution?
Even if you are not familiar with Perl syntax, it should not be difficult to customize: just modify the parameters defined as my $pattern = ..., etc.

#!/bin/bash

perl -e '
    use strict; use warnings;

    my $pattern = "<ReportHost";        # the pattern to split on
    my $prefix  = "xx";                 # prefix of the output files
    my $n       = 50;                   # number of entries per file

    my $filename = $prefix . "000";     # first output file: xx000
    my $count = 0;
    my $fh;                             # current output filehandle

    while (<>) {
        if (/$pattern/o) {              # a new entry starts on this line
            if ($count % $n == 0) {     # every $n entries, switch to a new file
                open($fh, ">", $filename) or die "$filename: $!";
                $filename++;            # string auto-increment: xx000 -> xx001
            }
            $count++;
        }
        print $fh $_ if $fh;            # skip anything before the first match
    }
' Scan.nessus                           # input filename
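Note that the script relies on Perl's magic string auto-increment: because $filename is letters followed by digits, $filename++ steps xx000 to xx001 and carries into the letters (xx999 becomes xy000), so ~61.5k entries at 50 apiece should yield roughly 1,230 files, xx000 through about xy229. To spot-check the split afterwards (a sketch assuming GNU grep; the x[xy]* glob is there to cover names past xx999):

grep -c "<ReportHost" x[xy]* | tail    # every file should show 50, except possibly the last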
tshiono