I have large Excel worksheets that I want to be able to read into MySQL using PHPExcel.
I am using the recent patch which allows you to read in Worksheets without opening the whole file. This way I can read one worksheet at a time.
However, one Excel file is 27MB large. I can successfully read in the first worksheet since it is small, but the second worksheet is so large that the cron job that started the process at 22:00 was not finished at 8:00 AM, the worksheet is simple too big.
Is there any way to read in a worksheet line by line, e.g. something like this:
$inputFileType = 'Excel2007';
$inputFileName = 'big_file.xlsx';
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$worksheetNames = $objReader->listWorksheetNames($inputFileName);
foreach ($worksheetNames as $sheetName) {
//BELOW IS "WISH CODE":
foreach($row = 1; $row <=$max_rows; $row+= 100) {
$dataset = $objReader->getWorksheetWithRows($row, $row+100);
save_dataset_to_database($dataset);
}
}
Addendum
@mark, I used the code you posted to create the following example:
function readRowsFromWorksheet() {
$file_name = htmlentities($_POST['file_name']);
$file_type = htmlentities($_POST['file_type']);
echo 'Read rows from worksheet:<br />';
debug_log('----------start');
$objReader = PHPExcel_IOFactory::createReader($file_type);
$chunkSize = 20;
$chunkFilter = new ChunkReadFilter();
$objReader->setReadFilter($chunkFilter);
for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
$chunkFilter->setRows($startRow, $chunkSize);
$objPHPExcel = $objReader->load('data/' . $file_name);
debug_log('reading chunk starting at row '.$startRow);
$sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
var_dump($sheetData);
echo '<hr />';
}
debug_log('end');
}
As the following log file shows, it runs fine on a small 8K Excel file, but when I run it on a 3 MB Excel file, it never gets past the first chunk, is there any way I can optimize this code for performance, otherwise it doesn't look like it is not performant enough to get chunks out of a large Excel file:
2011-01-12 11:07:15: ----------start
2011-01-12 11:07:15: reading chunk starting at row 2
2011-01-12 11:07:15: reading chunk starting at row 22
2011-01-12 11:07:15: reading chunk starting at row 42
2011-01-12 11:07:15: reading chunk starting at row 62
2011-01-12 11:07:15: reading chunk starting at row 82
2011-01-12 11:07:15: reading chunk starting at row 102
2011-01-12 11:07:15: reading chunk starting at row 122
2011-01-12 11:07:15: reading chunk starting at row 142
2011-01-12 11:07:15: reading chunk starting at row 162
2011-01-12 11:07:15: reading chunk starting at row 182
2011-01-12 11:07:15: reading chunk starting at row 202
2011-01-12 11:07:15: reading chunk starting at row 222
2011-01-12 11:07:15: end
2011-01-12 11:07:52: ----------start
2011-01-12 11:08:01: reading chunk starting at row 2
(...at 11:18, CPU usage at 93% still running...)
Addendum 2
When I comment out:
//$sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
//var_dump($sheetData);
Then it parses at an acceptable speed (about 2 rows per second), is there anyway to increase the performance of toArray()
?
2011-01-12 11:40:51: ----------start
2011-01-12 11:40:59: reading chunk starting at row 2
2011-01-12 11:41:07: reading chunk starting at row 22
2011-01-12 11:41:14: reading chunk starting at row 42
2011-01-12 11:41:22: reading chunk starting at row 62
2011-01-12 11:41:29: reading chunk starting at row 82
2011-01-12 11:41:37: reading chunk starting at row 102
2011-01-12 11:41:45: reading chunk starting at row 122
2011-01-12 11:41:52: reading chunk starting at row 142
2011-01-12 11:42:00: reading chunk starting at row 162
2011-01-12 11:42:07: reading chunk starting at row 182
2011-01-12 11:42:15: reading chunk starting at row 202
2011-01-12 11:42:22: reading chunk starting at row 222
2011-01-12 11:42:22: end
Addendum 3
This seems to work adequately, for instance, at least on the 3 MB file:
for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ', $startRow, ' to ', ($startRow + $chunkSize - 1), '<br />';
$chunkFilter->setRows($startRow, $chunkSize);
$objPHPExcel = $objReader->load('data/' . $file_name);
debug_log('reading chunk starting at row ' . $startRow);
foreach ($objPHPExcel->getActiveSheet()->getRowIterator() as $row) {
$cellIterator = $row->getCellIterator();
$cellIterator->setIterateOnlyExistingCells(false);
echo '<tr>';
foreach ($cellIterator as $cell) {
if (!is_null($cell)) {
//$value = $cell->getCalculatedValue();
$rawValue = $cell->getValue();
debug_log($rawValue);
}
}
}
}