
I have a file which is over 400 MB.

It is a timetable database which is only distributed in this way.

In this text file there is a string which marks the start of a data record.

This string always begins with "BSN"; likewise, there is a string that marks the end of the data record, which always starts with "LT".

What I'm trying to fathom is how to chop the data file into chunks containing 1000 data records each. Then, when that cycle is complete, I can import those files sequentially.

The created files must be numbered sequentially in a new folder...

[edit] the record set varies greatly in length [/edit]

Below is a sample of one of the groups:

BSNC031551112111206240000001   << DATA RECORD START >> 
BX         EMYEM129000                                                           
LOSHEFFLD 2235 2235                                                
LIDORESNJ                                              
LISPDN                                       
LTDRBY    2326 23266           << DATA RECORD END >>                                        
BSNC033501112111205130000001   << NEXT RECORD >>
BX         EMYEM118600    

*The << >> tags are added for your understanding; they do not exist in the file.

I currently read in the file using the PHP fopen / fgets method.
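Roughly, that read loop looks like this (a minimal sketch, with a placeholder file name):

$fp = fopen("timetable.dat", "r");   // placeholder file name
while (($line = fgets($fp)) !== false) {
    // ... one line of the timetable file at a time, so the whole
    // 400 MB file is never held in memory ...
}
fclose($fp);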


2 Answers


Something like this should work for you:

$fp = fopen($bigfile, "r");

$file_num = 1;
$prefix = "FILE_";
$suffix = ".DAT";
$buff = "";
$recNo = 0;

while (($rec = fgets($fp)) !== false) {
    // a new data record starts on every line beginning with "BSN"
    if (substr($rec, 0, 3) == 'BSN') {
        $recNo++;
        // once 1000 records are buffered, flush them before this
        // record starts the next file
        if ($recNo > 1000) {
            // write out the current chunk
            file_put_contents($prefix.$file_num.$suffix, $buff);
            // clear the buffer and move on to the next file
            $buff = "";
            $file_num++;
            // the current BSN line becomes record 1 of the new file
            $recNo = 1;
        }
    }
    // add the line to the buffer
    $buff .= $rec;
}
fclose($fp);

// flush the remainder
if ($buff) file_put_contents($prefix.$file_num.$suffix, $buff);
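If the chunks should end up in a new folder, $prefix can simply include a directory path created beforehand with mkdir(). The numbered files can then be picked up in order for the import step; a minimal sketch, where importChunk() is a placeholder for whatever does the actual database work:

// walk the generated chunks in numeric order and import each one
$file_num = 1;
while (file_exists($prefix.$file_num.$suffix)) {
    importChunk($prefix.$file_num.$suffix);   // importChunk() is hypothetical
    $file_num++;
}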

If you have a fixed, predefined data structure, you can use the Unix split command:

 split -l 6000 your_big_file.txt data_

This command divides the big file into smaller files of 6000 lines each (1000 data records, assuming each record is exactly 6 lines).

Or, if the data structure is nonuniform, you can use a Perl one-liner:

perl -n -e '/^BSNC/ and open FH, ">output_".$n++; print FH;' your_big_file

Perl can parse large files line by line instead of slurping the whole file into memory.

A new file will be created for each data record. Don't worry: the ext4 file system has a theoretical limit of 4 billion files per directory.

After this, it's possible to import all of the data into the database using a PHP script.
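A minimal sketch of that import, assuming the output_N files produced by the one-liner above, a PDO connection, and a schedule table with a single raw_record column (the DSN, credentials, and table are placeholders):

// import each output_N file produced above, in numeric order
$pdo = new PDO('mysql:host=localhost;dbname=timetable', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO schedule (raw_record) VALUES (:rec)');

$files = glob('output_*');
// glob() returns names in lexicographic order, so re-sort numerically by suffix
usort($files, function ($a, $b) {
    return (int) substr($a, 7) - (int) substr($b, 7);
});

foreach ($files as $file) {
    $stmt->execute([':rec' => file_get_contents($file)]);
}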
