I'm having trouble using array_combine in a foreach loop. It ends with this error:

PHP Fatal error:  Allowed memory size of 268435456 bytes exhausted (tried to allocate 85 bytes) in

Here is my code:

$data = array();
$csvData = $this->getData($file);
if ($columnNames) {
    $columns = array_shift($csvData);
    foreach ($csvData as $keyIndex => $rowData) {
        $data[$keyIndex] = array_combine($columns, array_values($rowData));
    }
}

return $data;

The source CSV file has approximately 1,000,000 rows. This line

$csvData = $this->getData($file)

uses a while loop to read the CSV into an array, and it works without any problem. The trouble comes from array_combine and the foreach loop.

Do you have any idea how to resolve this, or is there simply a better solution?

UPDATED

Here is the code that reads the CSV file (using a while loop):

$data = array();
if (!file_exists($file)) {
    throw new Exception('File "' . $file . '" do not exists');
}

$fh = fopen($file, 'r');
while ($rowData = fgetcsv($fh, $this->_lineLength, $this->_delimiter, $this->_enclosure)) {
    $data[] = $rowData;
}
fclose($fh);
return $data;

UPDATED 2

The code above works without any problem for a CSV file of up to about 20,000 to 30,000 rows. From 50,000 rows upward, memory is exhausted.

Toan Nguyen

1 Answer

You are in fact keeping (or trying to keep) two distinct copies of the whole dataset in memory. First you load the whole CSV data into memory using getData(), and then you copy it into the $data array by looping over the in-memory rows and creating a new array.

You should use stream-based reading when loading the CSV data, so that only one copy of the dataset is in memory. If you're on PHP 5.5+ (which you definitely should be, by the way), this is as simple as changing your getData method to look like this:

protected function getData($file) {
    if (!file_exists($file)) {
        throw new Exception('File "' . $file . '" do not exists');
    }

    $fh = fopen($file, 'r');
    while ($rowData = fgetcsv($fh, $this->_lineLength, $this->_delimiter, $this->_enclosure)) {
        yield $rowData;
    }
    fclose($fh);
}

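This makes use of a generator, a feature available in PHP >= 5.5. Most of your code should continue to work, because the inner workings of getData are largely transparent to the calling code. That is only half of the truth, though: a generator can be iterated with foreach, but it is not an array, so array functions such as array_shift will not accept it.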
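
To make the lazy behaviour concrete, here is a minimal standalone sketch. The function name readCsvRows, the sample file and the comma/quote/backslash settings are assumptions for illustration, not part of your class. It shows that calling the function only creates a Generator object, and that rows are read one at a time as they are requested:

```php
<?php
// Hypothetical standalone helper (not the asker's class method).
function readCsvRows($file)
{
    $fh = fopen($file, 'r');
    if ($fh === false) {
        throw new RuntimeException('Cannot open "' . $file . '"');
    }
    try {
        // Length 0 means "no line length limit"; delimiter, enclosure
        // and escape characters are assumed defaults.
        while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
            yield $row; // pauses here until the caller asks for the next row
        }
    } finally {
        fclose($fh); // also runs if the caller stops iterating early
    }
}

// Build a tiny sample file so the sketch is runnable.
$file = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($file, "id,name\n1,foo\n2,bar\n");

$rows = readCsvRows($file);
$isGenerator = $rows instanceof Generator; // nothing has been read yet

$header = $rows->current();       // runs the generator up to its first yield
$rows->next();                    // advance to the second row
$firstDataRow = $rows->current();

unset($rows);  // destroying the generator triggers the finally block
unlink($file);
```

Only one row is held by the generator at any time, so memory use no longer depends on the size of the file, as long as the caller does not collect every row itself.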

UPDATE to explain how extracting the column headers will work now.

$data = array();
$csvData = $this->getData($file);
if ($columnNames) { // don't know what this one does exactly
    $columns = null;
    foreach ($csvData as $keyIndex => $rowData) {
        if ($keyIndex === 0) {
            $columns = $rowData;
        } else {
            $data[$keyIndex/* -1 if you need 0-index */] = array_combine(
                $columns, 
                array_values($rowData)
            );
        }
    }
}

return $data;
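
Note that the loop above still collects every combined row in $data, so one full copy of the dataset remains in memory. If the calling code can process rows one at a time, a second generator can yield the combined rows instead of storing them. The following is only a sketch under that assumption; combineWithHeader is a hypothetical helper, and skipping rows whose length differs from the header is my own choice, made because array_combine requires both arrays to have the same number of elements:

```php
<?php
// Hypothetical helper: turns an iterable of raw rows into associative rows.
function combineWithHeader($rows)
{
    $columns = null;
    foreach ($rows as $row) {
        if ($columns === null) {
            $columns = $row; // the first row holds the column names
            continue;
        }
        if (count($row) !== count($columns)) {
            continue; // assumption: silently skip malformed or blank lines
        }
        yield array_combine($columns, $row);
    }
}

// An in-memory array stands in for the generator returned by getData().
$rawRows = [
    ['id', 'name'],
    ['1', 'foo'],
    ['2'],          // malformed row, skipped
    ['3', 'baz'],
];

$names = [];
foreach (combineWithHeader($rawRows) as $row) {
    $names[] = $row['name']; // process each row, then let it go
}
```

With your class this would be called as combineWithHeader($this->getData($file)), and each combined row can be freed as soon as it has been processed.
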
Stefan Gehrig
  • Thanks for your response, but what exactly is `yield` doing here? – Toan Nguyen May 20 '16 at 08:54
  • `yield` is a little more complicated than what can be described in a comment. You should definitely read http://php.net/manual/en/language.generators.overview.php, http://blog.ircmaxell.com/2012/07/what-generators-can-do-for-you.html and http://stackoverflow.com/questions/17483806/what-does-yield-mean-in-php – Stefan Gehrig May 20 '16 at 08:58
  • There is a small issue: I'm using `$columns = array_shift($csvData);` to move the CSV column names into an array, and array_combine then applies that array to each source row (from the CSV). How can I force yield to return an array instead of an object? – Toan Nguyen May 20 '16 at 09:15
  • You need to do that a little differently. `yield` simply returns row after row. If you need to handle row `0` (the first row, containing the headers) differently, you need to check `$keyIndex`: if `$keyIndex === 0`, you extract the columns; if `$keyIndex > 0`, you continue with your normal handling of data rows. – Stefan Gehrig May 20 '16 at 09:18
  • Could you update your answer with this, please? It is still confusing me; this is the first time I have heard about `yield` and `Generators` :) – Toan Nguyen May 20 '16 at 09:22
  • @ToanNguyen: Done. – Stefan Gehrig May 20 '16 at 09:27