I'm aware of How to read large worksheets from large Excel files (27MB+) with PHPExcel? and I've tried to implement the chunked reading that is discussed in that question however I'm still suffering from a OOM errror. The file itself is just under 5Mb, 9000+ rows (yes, it's over 9000!) ranging from A to V.
I'd rather not have the user do any editing on this file before uploading and processing it, as currently it's all a manual process and I'd like to completely replace it with a automated one. The file is of xls format, identified as Excel5 via PHPExcel.
my PHP memory limit is currently set to 128M, running on Ubuntu Server.
No matter what chunk size I set, I end up OOM'ing eventually. With larger chunk sizes it actually runs better (as in I can manage up to around row 7000) if I set the chunk size to 200, when set to 1 it OOM's around row 370. So I believe that 'something' is being stored, or loaded into memory in each iteration of the chunk reading and then not dropped again eventually causing the OOM but I can't see where this is happening.
I'm very much an amateur programmer, this is just something I'm doing on the side in my managed service role at work to try and make our lives easier.
The whole point of this code is to read the excel file, filter out the 'crap' and then save it out as CSV (Right now I'm just dumping it to screen instead of a CSV). At the rate things are going I'm getting tempted to call excel2csv via the php script and then try and clean up the CSV instead... But that feels like giving up when I maybe rather close to a solution.
<?php
error_reporting(E_ALL);
set_time_limit(0);
date_default_timezone_set('Europe/London');
require_once 'Classes/PHPExcel/IOFactory.php';
class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
private $_startRow = 0;
private $_endRow = 0;
private $_columns = array();
/** Set the list of rows that we want to read */
public function setRows($startRow, $chunkSize, $columns) {
$this->_startRow = $startRow;
$this->_endRow = $startRow + $chunkSize;
$this->_columns = $columns;
}
public function readCell($column, $row, $worksheetName = '') {
// Only read the heading row, and the rows that are configured in $this->_startRow$
if ($row >= $this->_startRow && $row < $this->_endRow) {
if(in_array($column,$this->_columns)) {
return true;
}
}
return false;
}
}
$target_dir = "uploads/";
$file_name = $_POST["file_name"];
$full_path = $target_dir . $file_name;
echo "Processing ". $file_name . '; <br>';
ob_flush();
flush();
/** /** As files maybe large in memory, use a temp file to handle them
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array( 'memoryCacheSize' => '8MB');
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);
**/
$inputFileName = $full_path;
echo 'Excel reader started<br/>';
/** First we should get the type of file **/
$filetype = PHPExcel_IOFactory::identify($inputFileName);
echo 'File of type: ' . $filetype . ' found<br/>';
/** Load $inputFileName to a PHPExcel Object - https://github.com/PHPOffice/PHPExcel/blob/develop/$
/** Define how many rows we want to read for each "chunk" **/
$chunkSize = 1;
/** Create a new Instance of our Read Filter **/
$chunkFilter = new chunkReadFilter();
$objReader = PHPExcel_IOFactory::createReader($filetype);
/** Tell the Reader that we want to use the Read Filter that we've Instantiated **/
$objReader->setReadFilter($chunkFilter);
/** Loop to read our worksheet in "chunk size" blocks **/
for ($startRow = 2; $startRow <= 65000; $startRow += $chunkSize) {
$endRow = $startRow+$chunkSize-1;
echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ',$startR$
/** Tell the Read Filter, the limits on which rows we want to read this iteration **/
$chunkFilter->setRows($startRow,$chunkSize,range('A','T'));
/** Load only the rows that match our filter from $inputFileName to a PHPExcel Object **/
$objPHPExcel = $objReader->load($inputFileName);
// Do some processing here
// $sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);
$sheetData = $objPHPExcel->getActiveSheet()->rangeToArray("A$startRow:T$endRow");
var_dump($sheetData);
// Clear the variable to not go over memory!
$objPHPExcel->disconnectWorksheets();
unset ($sheetData);
unset ($objPHPExcel);
ob_flush();
flush();
echo '<br /><br />';
}
/** This loads the entire file, crashing with OOM
try {
$objPHPExcel = PHPExcel_IOFactory::load($inputFileName);
echo 'loaded sheet into memory<br>';
} catch(PHPExcel_Reader_Exception $e) {
die('Error loading file: '.$e->getMessage());
}
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'CSV');
echo 'Saving sheet as CSV<br>';
$objWriter->setSheetIndex(0);
$objWriter->save('./uploads/'.$file_name.'.csv');
echo 'Processed 1 sheet';
ob_flush();
flush();
**/
echo "<body><table>\n\n";
/**
$f = fopen($file_name, "r");
while (($line = fgetcsv($f)) !== false) {
echo "<tr>";
foreach ($line as $cell) {
echo "<td>" . htmlspecialchars($cell) . "</td>";
}
echo "</tr>\n";
}
fclose($f);
**/
echo "\n</table></body></html>";
?>
The error as indicated in the apache logs is:
[Fri Mar 31 15:35:27.982697 2017] [:error] [pid 1059] [client 10.0.2.2:53866] PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 45056 bytes) in /var/www/html/Classes/PHPExcel/Shared/OLERead.php on line 93, referer: http://localhost:8080/upload.php