2

I'm aware of How to read large worksheets from large Excel files (27MB+) with PHPExcel? and I've tried to implement the chunked reading that is discussed in that question however I'm still suffering from a OOM errror. The file itself is just under 5Mb, 9000+ rows (yes, it's over 9000!) ranging from A to V.

I'd rather not have the user do any editing on this file before uploading and processing it, as currently it's all a manual process and I'd like to completely replace it with a automated one. The file is of xls format, identified as Excel5 via PHPExcel.

my PHP memory limit is currently set to 128M, running on Ubuntu Server.

No matter what chunk size I set, I end up OOM'ing eventually. With larger chunk sizes it actually runs better (as in I can manage up to around row 7000) if I set the chunk size to 200, when set to 1 it OOM's around row 370. So I believe that 'something' is being stored, or loaded into memory in each iteration of the chunk reading and then not dropped again eventually causing the OOM but I can't see where this is happening.

I'm very much an amateur programmer, this is just something I'm doing on the side in my managed service role at work to try and make our lives easier.

The whole point of this code is to read the excel file, filter out the 'crap' and then save it out as CSV (Right now I'm just dumping it to screen instead of a CSV). At the rate things are going I'm getting tempted to call excel2csv via the php script and then try and clean up the CSV instead... But that feels like giving up when I maybe rather close to a solution.

<?php

error_reporting(E_ALL);
set_time_limit(0);
date_default_timezone_set('Europe/London');

require_once 'Classes/PHPExcel/IOFactory.php';

class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
        private $_startRow = 0;
        private $_endRow = 0;
        private $_columns = array();

        /**  Set the list of rows that we want to read  */
        public function setRows($startRow, $chunkSize, $columns) {
                $this->_startRow        = $startRow;
                $this->_endRow          = $startRow + $chunkSize;
                $this->_columns         = $columns;
        }
        public function readCell($column, $row, $worksheetName = '') {
                //  Only read the heading row, and the rows that are configured in $this->_startRow$
                if ($row >= $this->_startRow && $row < $this->_endRow) {
                        if(in_array($column,$this->_columns)) {
                                return true;
                        }
                }
                return false;
        }
}
$target_dir = "uploads/";
$file_name = $_POST["file_name"];

$full_path = $target_dir . $file_name;

echo "Processing ". $file_name . '; <br>';

ob_flush();
flush();


/** /** As files maybe large in memory, use a temp file to handle them
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array( 'memoryCacheSize' => '8MB');
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);
**/

$inputFileName = $full_path;

echo 'Excel reader started<br/>';

/** First we should get the type of file **/

$filetype = PHPExcel_IOFactory::identify($inputFileName);

echo 'File of type: ' . $filetype . ' found<br/>';

/** Load $inputFileName to a PHPExcel Object  - https://github.com/PHPOffice/PHPExcel/blob/develop/$


/**  Define how many rows we want to read for each "chunk"  **/
$chunkSize = 1;
/**  Create a new Instance of our Read Filter  **/
$chunkFilter = new chunkReadFilter();

$objReader = PHPExcel_IOFactory::createReader($filetype);

/**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/
$objReader->setReadFilter($chunkFilter);
/**  Loop to read our worksheet in "chunk size" blocks  **/
for ($startRow = 2; $startRow <= 65000; $startRow += $chunkSize) {
        $endRow = $startRow+$chunkSize-1;
        echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ',$startR$
        /**  Tell the Read Filter, the limits on which rows we want to read this iteration  **/
        $chunkFilter->setRows($startRow,$chunkSize,range('A','T'));
        /**  Load only the rows that match our filter from $inputFileName to a PHPExcel Object  **/
        $objPHPExcel = $objReader->load($inputFileName);
        //      Do some processing here
//      $sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);
        $sheetData = $objPHPExcel->getActiveSheet()->rangeToArray("A$startRow:T$endRow");
        var_dump($sheetData);
        // Clear the variable to not go over memory!
        $objPHPExcel->disconnectWorksheets();
        unset ($sheetData);
        unset ($objPHPExcel);
        ob_flush();
        flush();

        echo '<br /><br />';
}


/**  This loads the entire file,  crashing with OOM

try {
        $objPHPExcel = PHPExcel_IOFactory::load($inputFileName);
        echo 'loaded sheet into memory<br>';
} catch(PHPExcel_Reader_Exception $e) {
    die('Error loading file: '.$e->getMessage());
}

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'CSV');

echo 'Saving sheet as CSV<br>';

    $objWriter->setSheetIndex(0);
    $objWriter->save('./uploads/'.$file_name.'.csv');
    echo 'Processed 1 sheet';
    ob_flush();
flush();

**/

echo "<body><table>\n\n";


/**
$f = fopen($file_name, "r");
while (($line = fgetcsv($f)) !== false) {
        echo "<tr>";
        foreach ($line as $cell) {
                echo "<td>" . htmlspecialchars($cell) . "</td>";
        }
        echo "</tr>\n";
}
fclose($f);
**/

echo "\n</table></body></html>";

?>

The error as indicated in the apache logs is:

[Fri Mar 31 15:35:27.982697 2017] [:error] [pid 1059] [client 10.0.2.2:53866] PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 45056 bytes) in /var/www/html/Classes/PHPExcel/Shared/OLERead.php on line 93, referer: http://localhost:8080/upload.php
Community
  • 1
  • 1

1 Answers1

1
unset ($objPHPExcel);

If you check the PHPExcel documentation, this won't cleanly unset $objPHPExcel because of cyclic references between the spreadsheet, worksheet and cells, and will result in memory leaks. The recommendation is to disconnect these cyclic references first.

$objPHPExcel->disconnectWorksheets();
unset($objPHPExcel);

There will still be some memory leakage, but it should allow more memory to be freed up between chunks

Mark Baker
  • 209,507
  • 32
  • 346
  • 385
  • Ah crap. I've already added this, but didn't update the pastebin I copied the code over from!. – djsmiley2kStaysInside Mar 31 '17 at 14:26
  • You might also consider using [cell caching](https://github.com/PHPOffice/PHPExcel/blob/develop/Documentation/markdown/Overview/04-Configuration-Settings.md#cell-caching) – Mark Baker Mar 31 '17 at 14:28
  • As you can see from the commented code, I did try this earlier but failed. I've just tried re-enabling it with the 8Mb array before it pushes it out to a file.... it's still OOM'ing - this is why I'm thinking the leak is in PHPExcel somewhere? – djsmiley2kStaysInside Mar 31 '17 at 14:34
  • 1
    Yes PHPExcel is imperfect and heavy on memory; but there's only so much it's possible to do to reduce the memory required by an in-memory representation of a spreadsheet... and it may be that you'll have to actually increase your php memory limit – Mark Baker Mar 31 '17 at 14:48
  • Thanks @Mark I think I'm going to have to move away from the idea of using it at all. It simply can't cope with this sheet. I've just setup caching using sqlite3 and it still just blows up on the sheets. – djsmiley2kStaysInside Mar 31 '17 at 15:06