
I have a requirement to build a report as an XLSX file; the report may contain 10,000-1,000,000 rows of transactions. I decided to use PhpSpreadsheet from https://phpspreadsheet.readthedocs.io/en/latest/

The problem is that it takes far too long to write 10,000 rows, where each record consists of 50 columns. After nearly 24 hours the script is still running and the progress is only 2,300/10,000. Here is my code:

<?php
require 'vendor/autoload.php';
use PhpOffice\PhpSpreadsheet\Spreadsheet;

$client = new \Redis();
$client->connect('192.168.7.147', 6379);
$pool = new \Cache\Adapter\Redis\RedisCachePool($client);
$simpleCache = new \Cache\Bridge\SimpleCache\SimpleCacheBridge($pool);
\PhpOffice\PhpSpreadsheet\Settings::setCache($simpleCache);

$process_time = microtime(true);
if(!file_exists('test.xlsx')) {
    $spreadsheet = new Spreadsheet();
    $writer = new \PhpOffice\PhpSpreadsheet\Writer\Xlsx($spreadsheet);
    $writer->save("test.xlsx");
    unset($writer);
}

// the whole workbook is re-read and re-saved for every single row
for ($r = 1; $r <= 10000; $r++) {
    $reader = new \PhpOffice\PhpSpreadsheet\Reader\Xlsx();
    $spreadsheet = $reader->load("test.xlsx");

    $rowArray = [];
    for ($c = 1; $c <= 50; $c++) {
        $rowArray[] = $r . ".Content " . $c;
    }

    $spreadsheet->getActiveSheet()->fromArray(
        $rowArray,
        NULL,
        'A' . $r
    );

    $writer = new \PhpOffice\PhpSpreadsheet\Writer\Xlsx($spreadsheet);
    $writer->save("test.xlsx");

    unset($reader);
    unset($writer);
    $spreadsheet->disconnectWorksheets();
    unset($spreadsheet);
}
$process_time = microtime(true) - $process_time;
echo $process_time."\n";

notes:

  1. I proposed a CSV file, but the client only wants XLSX.
  2. Without the Redis cache it gives a memory error even for fewer than 400 records.
  3. I don't need to read the XLSX with PHP, only write to it, but the library appears to load the entire spreadsheet.
  4. In the example above the file is opened and closed for every single record; when I do open -> write all -> close, it shows a memory error partway through.
Lezir Opav
  • What is your memory limit? Have you tried raising it with ini_set('memory_limit', '500M');? – FatFreddy Nov 29 '22 at 12:18
  • My memory limit is 128M and it cannot even process 400 rows. Setting it to 500M only gets me a few thousand; my goal is to write 1,000,000 records into a single XLSX. Do I really need 128TB of RAM just to write one XLSX file? – Lezir Opav Nov 30 '22 at 02:35
  • 2
    https://stackoverflow.com/questions/50731704/building-very-large-spreadsheet-with-phpspreadsheet, maybe try writing it as CSV without PhpSpreadsheet and then converting it with PhpSpreadsheet – FatFreddy Nov 30 '22 at 09:29
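
A minimal sketch of that CSV-then-convert approach, assuming PhpSpreadsheet is installed via Composer and reusing the 10,000 x 50 dummy data from the question (report.csv and report.xlsx are placeholder names):

<?php
// Sketch: write the rows to CSV with plain PHP first (constant memory),
// then convert the CSV to XLSX with PhpSpreadsheet in a single pass.
// report.csv and report.xlsx are placeholder file names.
require 'vendor/autoload.php';

$fh = fopen('report.csv', 'w');
for ($r = 1; $r <= 10000; $r++) {
    $rowArray = [];
    for ($c = 1; $c <= 50; $c++) {
        $rowArray[] = $r . ".Content " . $c;
    }
    fputcsv($fh, $rowArray);
}
fclose($fh);

$reader = new \PhpOffice\PhpSpreadsheet\Reader\Csv();
$spreadsheet = $reader->load('report.csv');
$writer = new \PhpOffice\PhpSpreadsheet\Writer\Xlsx($spreadsheet);
$writer->save('report.xlsx');

$spreadsheet->disconnectWorksheets();
unset($spreadsheet, $reader, $writer);

Note that the conversion step still builds the whole worksheet in memory, so for 1,000,000 rows the cell cache (or a higher memory_limit) would still be needed for that final pass.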

1 Answer

  1. In the example above the file is opened and closed for every single record; when I do open -> write all -> close, it shows a memory error partway through.

I see you are opening (the reader) and saving (the writer) every time you fill in content inside the loop, and that is most likely what is slowing the process down. From your logic you eventually write everything back to the same file anyway, so it can simply be open once -> write all 50 x 10,000 cells -> close and save once.

A quick test with your code rearranged as follows completes in roughly 25 seconds on my local XAMPP install on Windows. I'm not sure whether that meets your requirements, and it may take longer if the content consists of long strings, but on a more powerful server the runtime should improve significantly.

<?php
require 'vendor/autoload.php';

$process_time = microtime(true);

// $file_loc, $target_dir and $file_name are placeholders; point them at
// your existing workbook and output location (e.g. the question's test.xlsx)
$reader = new \PhpOffice\PhpSpreadsheet\Reader\Xlsx();
$spreadsheet = $reader->load($file_loc);
$row_count = 10000;
$col_count = 50;

for ($r = 1; $r <= $row_count; $r++) {
  $rowArray = [];

  for ($c = 1; $c <= $col_count; $c++) {
    $rowArray[] = $r . ".Content " . $c;
  }

  $spreadsheet->getActiveSheet()->fromArray(
    $rowArray,
    NULL,
    'A' . $r
  );
}

$writer = new \PhpOffice\PhpSpreadsheet\Writer\Xlsx($spreadsheet);
$writer->save($target_dir . 'result_' . $file_name);

unset($reader);
unset($writer);
$spreadsheet->disconnectWorksheets();
unset($spreadsheet);

$process_time = microtime(true) - $process_time;
echo $process_time."\n";
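
Since note 3 in the question says the file only ever needs to be written, never read back, a further variation (just a sketch along the same lines, not tested against your real data) would be to skip the reader entirely and build a fresh Spreadsheet, so the existing XLSX never has to be parsed at all:

<?php
// Sketch: write-only variant of the rearranged code above.
// Assumes nothing in the existing file needs to be preserved (note 3).
require 'vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Spreadsheet;

$process_time = microtime(true);

$spreadsheet = new Spreadsheet();           // no reader, nothing to parse
$sheet = $spreadsheet->getActiveSheet();

$row_count = 10000;
$col_count = 50;

for ($r = 1; $r <= $row_count; $r++) {
    $rowArray = [];
    for ($c = 1; $c <= $col_count; $c++) {
        $rowArray[] = $r . ".Content " . $c;
    }
    $sheet->fromArray($rowArray, NULL, 'A' . $r);
}

// one single save at the very end
$writer = new \PhpOffice\PhpSpreadsheet\Writer\Xlsx($spreadsheet);
$writer->save('test.xlsx');

$spreadsheet->disconnectWorksheets();
unset($spreadsheet, $writer);

echo (microtime(true) - $process_time) . "\n";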

Edited:

  1. Without the Redis cache it gives a memory error even for fewer than 400 records.

My quick test ran without any cache settings. My guess for the memory issue is that you are opening the XLSX file every time you write one row of content and then saving it back to the original file.

Every time you open the XLSX file, PhpSpreadsheet loads and caches the full object model plus all of the previously written content (50 more cells after each save), so each iteration has to handle more data than the one before it, and the total amount of work grows quadratically with the number of rows.

On top of that, the memory from the previous iteration is not released quickly enough, so eventually the script hits the memory limit.

1st open and save
-> open: nothing
-> save: row 1, 50 cols

2nd open and save
-> open: row 1, 50 cols
-> save: rows 1-2, 50 cols each

3rd open and save
-> open: rows 1-2, 50 cols each
-> save: rows 1-3, 50 cols each

and so on...
memory may still be holding the previously loaded data
(it depends on how the server handles the memory)
so it keeps growing until it finally runs out
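
To put rough numbers on that growth (using the question's own 10,000 rows x 50 columns): with one open/save per row, the r-th save writes out all r x 50 cells accumulated so far, so across the whole run roughly 2.5 billion cell writes happen instead of the 500,000 needed by a single save at the end. A quick back-of-the-envelope check:

<?php
// back-of-the-envelope: cells serialized by save-per-row vs. a single save
$rows = 10000;
$cols = 50;

$single_save_cells  = $rows * $cols;                      // 500,000
$save_per_row_cells = $cols * $rows * ($rows + 1) / 2;    // 2,500,250,000

echo "single save at the end: $single_save_cells cells\n";
echo "one save per row:       $save_per_row_cells cells\n";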
Zeikman