
I have a file with around 100 records for now. The file contains one user per line in JSON format, e.g.

{"user_id" : 1,"user_name": "Alex"}
{"user_id" : 2,"user_name": "Bob"}
{"user_id" : 3,"user_name": "Mark"}

Note: This is just a very simple example; the real file has more complex JSON values per line.

I am reading the file line by line and storing the lines in an array, which will obviously get big if there are a lot of items in the file.

public function read(string $file) : array
{
    // Open the file in "read only" mode.
    $fileHandle = fopen($file, "r");

    // If we failed to get a file handle, throw an Exception.
    if ($fileHandle === false) {
        throw new Exception('Could not get file handle for: ' . $file);
    }

    $lines = [];
    // While we haven't reached the end of the file.
    while (!feof($fileHandle)) {
        // Read the current line in; fgets() returns false at EOF, so skip that case.
        $line = fgets($fileHandle);
        if ($line !== false) {
            $lines[] = json_decode($line);
        }
    }

    // Finally, close the file handle.
    fclose($fileHandle);
    return $lines;
}

Next, I'll process this array and take only the parameters I need (some parameters might be further processed), and then I'll export the result to CSV.

public function processInput($users)
{
    $data = [];
    foreach ($users as $key => $user) {
        $data[$key]['user_id'] = $user->user_id;
        $data[$key]['user_name'] = strtoupper($user->user_name);
    }
    // Call export to CSV with $data.
}
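
The export step itself is not shown; a minimal sketch of what it could look like with fputcsv() (the output path and the column list are assumptions based on the example records above):

public function exportToCsv(array $data, string $outputFile = 'users.csv') : void
{
    $handle = fopen($outputFile, 'w');
    if ($handle === false) {
        throw new Exception('Could not open ' . $outputFile . ' for writing');
    }

    // Header row; the columns are an assumption based on the sample records.
    fputcsv($handle, ['user_id', 'user_name']);

    foreach ($data as $row) {
        fputcsv($handle, [$row['user_id'], $row['user_name']]);
    }

    fclose($handle);
}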

What would be the best way to read the file, in case it is a big file?

I know file_get_contents is not an optimized approach here and that fgets is a better one.

Is there a much better way, considering that a big file needs to be read and then written to CSV?

user3286692
  • _"Next, Ill process this array and only take the parameters I need"_ - why are you not doing that already at the point where you fill the array? – CBroe Dec 06 '21 at 08:47
  • How is the `$lines` array connected to the `$users` array? – apokryfos Dec 06 '21 at 08:47
  • 1
    For large amounts of data do not use JSON. Use a database. PHP has support for [SQLite](https://www.php.net/manual/en/ref.pdo-sqlite), the database is stored in a file, it does not require an external server and you can use SQL to manipulate the data. – axiac Dec 06 '21 at 08:48
  • I have the reader as a service so that other code can reuse it and that's why I am returning it as an array. – user3286692 Dec 06 '21 at 08:49
  • 2
    @axiac I am getting the results via a 3rd party as a file. – user3286692 Dec 06 '21 at 08:50
  • @apokryfos They are the same for this example, the code returns $lines which is then passed as $users. – user3286692 Dec 06 '21 at 08:51
  • 1
    Beware premature optimisation. Your `read()` function reads the entire file line by line. If your file really is too large then this will fail. For a large file read **and process** line by line with `fgets()`. If that's too slow read large chunks of the file and handle the line breaks yourself, but at each step be sure that the extra code complexity is justified by the required performance. If you find that you need to optimise, profile your code and focus on the slowest operations. – Tangentially Perpendicular Dec 06 '21 at 09:05
  • FYI, 1000 lines of your sample record (~40KB), when read and parsed from JSON, takes 1.42MB of memory when handled as one big chunk. You are still light-years away from having to deal with excess memory usage. – Markus AO Dec 06 '21 at 10:59
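
A minimal sketch of the SQLite approach suggested in the comments above (the database file name and table layout are assumptions, not part of the original suggestion):

public function importToSqlite(string $file, string $dbFile = 'users.sqlite') : void
{
    // SQLite stores the database in a single file; no external server is required.
    $pdo = new PDO('sqlite:' . $dbFile);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE IF NOT EXISTS users (user_id INTEGER, user_name TEXT)');

    $stmt = $pdo->prepare('INSERT INTO users (user_id, user_name) VALUES (:id, :name)');

    $fileHandle = fopen($file, 'r');
    while (($line = fgets($fileHandle)) !== false) {
        $user = json_decode($line);
        $stmt->execute([':id' => $user->user_id, ':name' => $user->user_name]);
    }
    fclose($fileHandle);

    // From here the data can be filtered and transformed with plain SQL.
}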

1 Answer


You need to modify your reader to make it more "lazy" in some sense. For example, consider this:

public function read(string $file, callable $rowProcessor) : void
{
    // Open the file in "read only" mode.
    $fileHandle = fopen($file, "r");

    // If we failed to get a file handle, throw an Exception.
    if ($fileHandle === false) {
        throw new Exception('Could not get file handle for: ' . $file);
    }

    // While we haven't reached the end of the file.
    while (!feof($fileHandle)) {
        // Read the current line in and hand it straight to the callback.
        $line = fgets($fileHandle);
        if ($line !== false) {
            $rowProcessor(json_decode($line));
        }
    }

    // Finally, close the file handle.
    fclose($fileHandle);
}

Then you will need different code that works with this:

function processAndWriteJson($filename) { // Names are hard
    $writer = fopen('output.csv', 'w');
    read($filename, function ($row) use ($writer) {
        // Do the processing of the single row here, e.g.:
        $processedRow = [$row->user_id, strtoupper($row->user_name)];
        fputcsv($writer, $processedRow);
    });
    fclose($writer);
}

If you want to get the same result as before with your read method, you can do:

$lines = [];
read($filename, function ($row) use (&$lines) {
    $lines[] = $row;
});

It does provide some more flexibility. Unfortunately, it also means you can only process one line at a time, and scanning up and down the file is harder.
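
If the callback style is inconvenient for other callers, the same lazy, line-at-a-time reading can also be expressed with a generator. A minimal sketch (assuming PHP 7 or later, and the hypothetical name readLazily):

public function readLazily(string $file) : \Generator
{
    $fileHandle = fopen($file, "r");
    if ($fileHandle === false) {
        throw new Exception('Could not get file handle for: ' . $file);
    }

    // fgets() returns false at EOF, which ends the loop.
    while (($line = fgets($fileHandle)) !== false) {
        // Each decoded row is yielded one at a time, so the whole file is never held in memory.
        yield json_decode($line);
    }

    fclose($fileHandle);
}

The caller can then foreach over the returned generator and write each row out with fputcsv(), which keeps memory usage flat regardless of the file size.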

apokryfos