8

I'm having a memory issue with PHPExcel when processing an XLS file. I have to work with quite big files (between 50k and 200k rows, and 9-10 cols) so I had to use ReadFilters to solve the memory issue.

However, although it worked pretty well with XLSX files, using a mix of background processes and some simple calculations for the chunk size, I fail to make it work with XLS files.

This is the piece of code where everything explodes:

Class ExcelReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $startRow = 0;
    private $endRow = 0;

    public function setRows($startRow, $chunkSize) {
        $this->startRow    = $startRow;
        $this->endRow        = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '') {
        if ( ($row >= $this->startRow && $row < $this->endRow) ) {
            return true;
        }
        return false;
    }
}

PHPExcel_Settings::setCacheStorageMethod( PHPExcel_CachedObjectStorageFactory::cache_in_memory_serialized );

....
$filter = new ExcelReadFilter();
$filter->setRows($desde, $cuantas);

$reader = PHPExcel_IOFactory::createReader($this->file_type);   

$reader->setLoadSheetsOnly($sheet_name);    
$reader->setReadDataOnly(false);    
$reader->setReadFilter($filter);

$chunk = $reader->load($this->file);
$chunk->setActiveSheetIndexByName($sheet_name);

$active_sheet = $chunk->getActiveSheet();
$rowIterator = $active_sheet->getRowIterator($desde);
$this->num_filas = $active_sheet->getHighestRow();

Immediately after that, I included the following lines to have a better idea of what was happening:

ob_start();
var_dump($rowIterator);

$f = fopen("excel-info.txt", "w");
fwrite($f, ob_get_clean());
fclose($f);

ob_end_clean();
die;

And I think that it pointed to where the memory issue is. When I first uploaded the original XLS file, excel-info.txt had a size of 13M. Then I opened the XLS file and saved it as XLSX, and repeated the process, after which, excel-info.txt was only 285k.

Is there any way I can modify those filters to work with XLS files?

Oh, and setting the PHP memory limit to a higher value is not an option, but execution time is not critical.

ADDED

When I used different Memory Caching Options, I was able to reduce the memory usage enough to make it work, and kept it in most cases to an acceptable size.

Right now, I'm using PHPExcel_CachedObjectStorageFactory::cache_to_sqlite, and it seems to be enough to make it work.

I'd like to remark that the calculation I made over serialized information put in a file was incorrect. Excel5 files produce an array with as many records as rows has the excel file having all of its values that not meets the filter condition set to NULL. Of course, when I save it to a text file, having something like...

array(10) {
  ["A"]=>
  NULL
  ["B"]=>
  NULL
  ["C"]=>
  NULL
  ["D"]=>
  NULL
  ["E"]=>
  NULL
  ["F"]=>
  NULL
  ["G"]=>
  NULL
  ["H"]=>
  NULL
  ["I"]=>
  NULL
  ["J"]=>
  NULL
} 

...takes a lot of space in the file, but should not in php_memory, so that was my fault.

Now, I'm using this code to keep track of memory usage:

for ( $i=1; $i < 20000; $i+=5000 ){
        $filter->setRows($i, 5000);
        echo "\n1- Usage: ".(memory_get_usage ()/1024)/1024;
        $objPHPExcel = $reader->load($this->file);
        echo "\n2- Usage: ".(memory_get_usage ()/1024)/1024;
        $sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);
        unset($sheetData);
        unset($objPHPExcel);
        echo "\n3- Usage: ".(memory_get_usage ()/1024)/1024;
        }

With a certain XLS file, it shows:

1st Iteration 1- Usage: 4.3859634399414 2- Usage: 34.292671203613 3- Usage: 34.68034362793

2nd Iteration 1- Usage: 34.68034362793 2- Usage: 34.68293762207 3- Usage: 34.684982299805

And the same file, after saving as XLSX:

1st Iteration 1- Usage: 4.2780990600586 2- Usage: 6.9042129516602 3- Usage: 7.2916641235352

2nd Iteration 1- Usage: 7.2916641235352 2- Usage: 7.5115432739258 3- Usage: 7.2813568115234

I must say, however, that after saving as XLSX the size is reduced by approximately half, thus I cannot say whether it's a bug or it's expected behavior.

pnuts
  • 58,317
  • 11
  • 87
  • 139
sergio0983
  • 1,232
  • 8
  • 15
  • 1
    Read filters should work identically with the Excel5 Reader and with the Excel2007 Reader.... but if you're tight for memory, then you might want to consider using PHPExcel's cell caching options – Mark Baker Sep 20 '14 at 17:07
  • 1
    Maybe they should, but it´s clearly showing different behavior. I´m not sure, maybe I´m not using the filter as I am supposed to, or I did miss something. I´m not sure about memory limits, as this is in a shared server, and I have to be careful with that. About Cell Caching Options, could you point me to some documentation to give it a try? – sergio0983 Sep 20 '14 at 17:16
  • OK, provide me with a working example to demonstrate that this is indeed a bug in the code, and I'll look at it..... I've given up spending hours searching for bugs that are reported without a demonstratable example only to discover that there isn't any problem in the code – Mark Baker Sep 20 '14 at 20:16
  • 2
    Cell caching is explained in section 4.2.1 of the developer documentation, the section entitled `Cell Caching` - [online version from the github wiki](https://github.com/PHPOffice/PHPExcel/blob/develop/Documentation/markdown/Overview/04-Configuration-Settings.md) You're using serialized in memory, but there are much more memory-efficient options available, and serialized in memory may actually increase memory usage depending on your version of PHP – Mark Baker Sep 20 '14 at 20:19
  • Ty for the link, it helped a lot. – sergio0983 Sep 21 '14 at 09:25
  • About the bug, never said that it had to be a bug, it just didn´t work for me in some scenarios, but, given that I do not have a deep knowledge of how your libraries (thank you, by the way) work, most times I cannot tell whether the behavior is expected and I must try a workaround, modify my code, or simply assume that I must make some preprocessing to the files, and that´s why I posted the code showing filters, memory usage, ... hoping someone could tell me "34MB of memory usage for 3M excel5 file is normal" Then I know that the code I built is correct, and try some other solution. – sergio0983 Sep 21 '14 at 10:11
  • I don't use PHP, so can't help there. But on the Excel side of things, `xls` files can only store up to 65,536 rows. While `.xlsx` files can store up to just over 1mill rows. That could be part of your issue. Also as part of the change to the new `xlsx` files, the data is stored in xml format (in the background of the file) and can often result in smaller file size over that of an `xls` file. Hope that helps. – guitarthrower Oct 07 '14 at 19:32
  • Seems this is already answered. Please check this : http://stackoverflow.com/questions/3537604/how-to-fix-a-memory-error-in-php – pavanw3b Nov 06 '14 at 12:00
  • I am already using all those techniques, that somehow reduced the problem, but didn´t make a significant difference. Besides, it doesn´t explain why applying fiters to xls files generates such a bigger amount of memory usage compared to its xlsx counterpart, but anyway, I solved my issue using a Background Process to process the excel, increasing memory limit size, and, at the end, paying for VPS so I have enough memory without having my processes killed :P – sergio0983 Nov 06 '14 at 12:19

2 Answers2

3

PHPExcel is a memory hog. I have used it for several clients, and found that you have to experiment with the php memory limit setting to find the sweet spot where it can load the average file the client is likely to import. I have had to use as much as 8 GB on some projects. Of course, you do that in the routine that loads the xls file using ini_set('memory_limit','16M'), not in the php.ini file.

Have you tried setReadDataOnly(true)?

I think the reason is that xls files are not just csv data, but a catch-all for lots of other information (like fonts and macros). When you load the file, PHPExcel tries to load all the parts into memory, creating a huge structure.

TomTerrific
  • 111
  • 9
0

Just add this line (for example)

ini_set('memory_limit','254M');

this will solve the memory issue..

feel free to change the memory limit to make it suitable to your case

Ali Mezal
  • 1,437
  • 1
  • 8
  • 8