
I'm a beginner-level developer learning PHP. The task I have is to upload a 6 GB CSV file containing data into the database. I need to read the file through a controller.php file, split that huge CSV file into 10,000-row output CSV files, and write the data into those output files. I have been on this task for a week already and haven't figured it out yet. Would you please help me solve this issue?

<?php

namespace App\Http\Controllers;
use Illuminate\Queue\SerializesModels;

use App\User;
use DateTime;
use Illuminate\Http\Request;
use Storage;
use Validator;
use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;
use Queue;
use App\model;


class Name extends Controller
{


    public function Post(Request $request)
    {
        if ($request->hasfile('upload')) {
            ini_set('auto_detect_line_endings', TRUE);
            $main_input = $request->file('upload');
            $main_output = 'output';
            $filesize = 10000;
            $input = fopen($main_input, 'r');
            $rowcount = 0;
            $filecount = 1;
            $output = '';

            // echo "here1";
            while (!feof($input)) {
                if (($rowcount % $filesize) == 0) {
                    if ($rowcount > 0) {
                        fclose($output);
                    }
                    $output = fopen(storage_path() . "/tmp/" . $main_output . $filecount++ . '.csv', 'w');
                }
                $data = fgetcsv($input);
                print_r($data);

                if ($data) {
                    fputcsv($output, $data);
                }

                $rowcount++;
            }
            fclose($output);
        }
    }
}  
Hemanth
  • Will your upload files always be 6 GB large? Or is this a stand-alone/unique case to initialize the database? – Kenny Horna Oct 09 '18 at 14:05
  • Not all the files are this big. In this task, yes, the file is a 6 GB CSV file, and yes, it's a unique case. – Hemanth Oct 09 '18 at 14:08
  • So maybe it would be better to upload the CSV file directly to your database (if you are using MySQL you can do it visually with PHPMyAdmin) and then structure your project, creating Models for your tables and so on. – Kenny Horna Oct 09 '18 at 14:19
  • If it's about only one file, couldn't you do this manually? – B001ᛦ Oct 09 '18 at 14:20
  • I didn't get you @HCK. Would you explain a bit more? Also, the code that I posted above doesn't work. Could you check where I'm making a mistake? – Hemanth Oct 09 '18 at 14:23
  • @B001ᛦ No, my friend. – Hemanth Oct 09 '18 at 14:26
  • @Hemanth as I said, this will help you in case you use this client to manage your databases in a more visual way (PHPMyAdmin) here is the link to how to upload your file: [How to import or export csv to the database](https://docs.phpmyadmin.net/es/latest/import_export.html) – Kenny Horna Oct 09 '18 at 14:27
  • @HCK, yeah, it helps, but the thing is I have to code it to split into chunks. – Hemanth Oct 09 '18 at 14:36
  • Do you get any errors? Have you tried it already with a much smaller file? – Martin Cup Oct 09 '18 at 14:53
  • @MartinCup, yes Martin, I have tried the code with a smaller test file. The output file is being generated with headers and data in a single row. The data isn't being sorted under the respective headers; it is placed alongside the headers in the same row 1. – Hemanth Oct 09 '18 at 14:56
  • [Split large excel CSV file to multiple files on PHP or Javascript](https://stackoverflow.com/questions/16732590/split-large-excel-csv-file-to-multiple-files-on-php-or-javascript) My task is similar to this one. – Hemanth Oct 09 '18 at 15:05
  • Why are you opening the file with 'php://output'? What happens, when you take away this parameter? – Martin Cup Oct 09 '18 at 15:26
  • It's a write-only stream that lets you write to the output buffer mechanism in the same way as print and echo. Excluding that also gives me the same error. – Hemanth Oct 09 '18 at 15:30

2 Answers


Maybe it's because you are creating a new $output file handler for each iteration.

I've made some adjustments, so that we only open a file when $rowcount == 0 and close it once $filesize rows have been written. Also, $rowcount has to be reset to 0 each time we close the file.

public function Post(Request $request)
{
    if ($request->hasfile('upload')) {
        ini_set('auto_detect_line_endings', TRUE);
        $main_input = $request->file('upload');
        $main_output = 'output';
        $filesize = 10000;
        $input = fopen($main_input, 'r');
        $rowcount = 0;
        $filecount = 1;
        $output = '';

        while (!feof($input)) {
            // open a new chunk file whenever the row counter is back at 0
            if ($rowcount == 0) {
                $output = fopen(storage_path() . "/tmp/" . $main_output . $filecount++ . '.csv', 'w');
            }
            // after $filesize rows, close the chunk, reset the counter and
            // let the next iteration open the next file
            if (($rowcount % $filesize) == 0) {
                if ($rowcount > 0) {
                    fclose($output);
                    $rowcount = 0;
                    continue;
                }
            }
            $data = fgetcsv($input);
            // print_r($data); // debug output, remove for large files

            if ($data) {
                fputcsv($output, $data);
            }

            $rowcount++;
        }
        fclose($output);
    }
}
xtothea
  • thank you so much, this worked out perfectly for me. – Hemanth Oct 09 '18 at 17:00
  • Hey, xtothea. The code is working fine for sample data files, but at the same time I'm getting a "413 Request Entity Too Large" error while uploading the big file, i.e. the 6 GB CSV file. Could you help me out? – Hemanth Oct 09 '18 at 17:33
  • Glad I could help. The 413 error is another topic, probably a webserver limitation. Check out https://stackoverflow.com/questions/24306335/413-request-entity-too-large-file-upload-issue (a sketch of the relevant settings follows these comments). – xtothea Oct 09 '18 at 18:39
  • Hey xtothea, I made changes to the php.ini file, changed the memory_limit, the upload file size, and other required settings, and restarted the web server. I was able to generate just one output file with 10,000 rows, and after that it returns the same 413 Request Entity Too Large error. Could you point out where I'm going wrong? – Hemanth Oct 10 '18 at 16:00
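For reference, the 413 discussed above is usually enforced in two layers: the web server and PHP itself, and both have to allow the request size. A sketch of the relevant settings, assuming an nginx + PHP-FPM stack (directive names differ on Apache, and the sizes here are only examples; pick values larger than your biggest upload):

# nginx.conf, inside the server (or location) block:
client_max_body_size 8g;

; php.ini: PHP's own upload limits; post_max_size must be at least
; as large as upload_max_filesize
upload_max_filesize = 8G
post_max_size = 8G

After changing these, reload nginx and restart PHP-FPM.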

Here is a working example of splitting a CSV file by the number of lines (defined by $numberOfLines). Just set your path in $filePath and run the script from a shell, for example:

php -f convert.php

script code: convert.php

<?php

$filePath = 'data.csv';
$numberOfLines = 10000;

$file = new SplFileObject($filePath);

//get header of the csv
$header = $file->fgets();

$outputBuffer = '';
$outputFileNamePrefix = 'datasplit-';

$readLinesCount = 0;      // data lines in the current chunk (header not counted)
$readLinesTotalCount = 0; // data lines read from the input so far
$suffix = 0;

$outputBuffer .= $header;

while ($currentLine = $file->fgets()) {
    $outputBuffer .= $currentLine;
    $readLinesCount++;
    $readLinesTotalCount++;

    if ($readLinesCount >= $numberOfLines) {
        $outputFilename = $outputFileNamePrefix . $suffix . '.csv';
        file_put_contents($outputFilename, $outputBuffer);
        echo 'Wrote '  . $readLinesCount . ' lines to: ' . $outputFilename . PHP_EOL;    

        $outputBuffer = $header;
        $readLinesCount = 0;
        $suffix++;
    }
}

// write the remainder of the output buffer if it is not empty
if ($outputBuffer !== $header) {
    $outputFilename = $outputFileNamePrefix . $suffix . '.csv';
    file_put_contents($outputFilename, $outputBuffer);
    echo 'Wrote (last) ' . $readLinesCount . ' lines to: ' . $outputFilename . PHP_EOL;

    $outputBuffer = '';
    $readLinesCount = 0;

}

You will not be able to convert that amount of data in one PHP execution if it is run from the web, because of the maximum execution time of PHP scripts, which is usually between 30 and 60 seconds. There is a reason for that limit: don't even try to extend it to some huge number. If you want your script to run for hours, you need to call it from the command line, but you can also call it in a similar way from another script (for example, from the controller you have). You do that this way:

exec('php -f convert.php');

and that's it.
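One caveat with exec(): by default the controller will block until convert.php finishes, which for an hours-long run means the HTTP request times out anyway. On Linux, assuming shell access, a common trick is to discard the output and background the process so the call returns immediately (a sketch, not specific to Laravel):

// the redirection and the trailing & detach convert.php from the request,
// so exec() returns right away while the script keeps running
exec('php -f convert.php > /dev/null 2>&1 &');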

The controller you have will not be able to tell whether all the data was converted, because it will be terminated long before that happens. What you can do is add code to convert.php that updates some field in the database, so another controller in your application can read that field and show the user the progress of the running convert.php.
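A minimal sketch of that idea, using plain PDO since convert.php runs outside the framework. The conversions table, its columns, the row id, and the credentials are all assumptions here; update as often as suits you, for example once per written chunk:

// inside convert.php: report progress so a controller can read it back
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret'); // assumed credentials
$conversionId = 1; // assumed row, created by the controller before starting the script

// ...then inside the split loop, each time a chunk file is written:
$stmt = $pdo->prepare('UPDATE conversions SET progress = :lines WHERE id = :id');
$stmt->execute([':lines' => $readLinesTotalCount, ':id' => $conversionId]);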

The other approach is to create a job (or jobs) that you can put in a queue, to be run by a job-manager process with workers that take care of the conversion, but I think that would be overkill for your need.
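For completeness, a rough sketch of what such a queued job could look like in Laravel. The class name SplitCsvJob and the splitCsv() helper are hypothetical; the use statements and the dispatch call are standard Laravel queue boilerplate:

<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class SplitCsvJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    private $path;

    public function __construct($path)
    {
        $this->path = $path; // path to the uploaded CSV on disk
    }

    public function handle()
    {
        // same splitting logic as convert.php; splitCsv() is a
        // hypothetical helper wrapping that code
        splitCsv($this->path, 10000);
    }
}

The controller then only does SplitCsvJob::dispatch($path); and a worker started with php artisan queue:work picks the job up in the background.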

Keep in mind that if you split data in one place and join it in another, something can go wrong in between. The method that assures you the data was split, transferred, and joined successfully is to calculate a hash (e.g. SHA-1) of the whole 6 GB file before splitting, send that hash to the destination where the small parts are to be combined, combine them into one 6 GB file, calculate the hash of that file, and compare it with the one that was sent. Also keep in mind that after splitting, each small part gets its own header row so that it is a valid, easy-to-import CSV file, whereas the original file has only one header row.
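PHP can do this without loading the whole file into memory: sha1_file() streams the file, so it works even for 6 GB. A sketch of the check described above (the file names are assumptions; in practice the two hashes would be computed on different machines):

<?php

// on the source machine, before splitting
$hashBefore = sha1_file('data.csv');

// ...split, transfer, and reassemble the parts into rebuilt.csv,
// stripping the extra header row from every part except the first...

// on the destination, after combining the parts
$hashAfter = sha1_file('rebuilt.csv');

if ($hashBefore === $hashAfter) {
    echo 'Data transferred intact' . PHP_EOL;
} else {
    echo 'Hash mismatch: the reassembled file differs from the original' . PHP_EOL;
}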

Jimmix