
I have this PHP code (version 5.4) that reads in a 6MB text file and splits it into lines, then splits those lines by tabs:

echo "Initial memory used: ".memory_get_usage()." bytes.\n";
$data = file_get_contents(ENROLLMENTS_SRC_PATH);
echo "About to split into lines, current memory used: ".memory_get_usage()." bytes.\n";
$data = explode("\n", $data);
echo "About to split lines into columns, current memory used: ".memory_get_usage()." bytes.\n"; 
$line = explode("\t", $data[0]);
echo "Split one line, current memory used: ".memory_get_usage()." bytes.\n";
echo "Peak memory used so far: ".memory_get_peak_usage()." bytes.\n";
foreach($data as $key => $line) {
    $data[$key] = explode("\t", $line);
}
echo "Final memory used: ".memory_get_usage()." bytes.\n";
echo "Peak memory used: ".memory_get_peak_usage()." bytes.\n";

I understand that PHP arrays have very high overhead, but what I was not expecting was for the peak usage during the foreach loop to come out roughly 18MB higher than the final result, according to these figures:

Initial memory used: 226384 bytes. 
About to split into lines, current memory used: 6536952 bytes.
About to split lines into columns, current memory used: 18327712 bytes. 
Split one line, current memory used: 18328352 bytes. 
Peak memory used so far: 24639744 bytes.
Final memory used: 116898184 bytes.
Peak memory used: 135000584 bytes.

The peak is roughly as large as the memory used by $data before the loop and after the loop combined, even though a single split line appears to take well under a kilobyte. I am trying to reduce the memory my script uses, so I would like to understand how PHP is spending it here.

Why does the foreach loop use so much apparently excess memory, and is there anything I can do about it?

  • Instead of reading the whole file at once you could read it line by line (see the sketch after these comments): https://stackoverflow.com/questions/13246597/how-to-read-a-file-line-by-line-in-php – ka_lin Jul 13 '17 at 16:56
  • @ka_lin That works nicely. However, since at later points in my script I also have to look up various pieces of data, transform them, and compare them, I would not be surprised if a similar phenomenon occurs again later, so I would like to understand why this happens as well. – Nathaniel Verhaaren Jul 13 '17 at 17:08
  • After some additional research I found a satisfactory explanation myself. – Nathaniel Verhaaren Jul 14 '17 at 19:07
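
For reference, here is a minimal sketch of the line-by-line approach ka_lin suggests. It assumes the same tab-delimited file at ENROLLMENTS_SRC_PATH, and it still accumulates the parsed rows in memory, so the saving is mainly that the raw 6MB string and the array of unsplit lines never have to exist at the same time:

$handle = fopen(ENROLLMENTS_SRC_PATH, 'r');
$rows = array();
while(($line = fgets($handle)) !== false) {
    // Strip the trailing newline before splitting on tabs.
    $rows[] = explode("\t", rtrim($line, "\r\n"));
}
fclose($handle);
echo "Peak memory used: ".memory_get_peak_usage()." bytes.\n";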

1 Answer


Most of the excess memory is accounted for in this explanation of how foreach works: foreach iterates over a "copy" of the array. The copy is copy-on-write, so a real duplicate is only made if/when the array is modified while the loop runs. Because this loop writes back into $data, PHP duplicates the array it is looping over, and both the old and new versions stay alive until the loop finishes. That matches your numbers: the roughly 18MB gap between the peak and the final usage is close to the ~18.3MB that $data occupied just before the loop.
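
A small standalone sketch (synthetic data, not your file) makes the duplicate visible; on PHP 5.4 the peak should come out roughly one pre-loop array higher than the post-loop figure:

$demo = array();
for($i = 0; $i < 50000; $i++) {
    $demo[] = "field1\tfield2\tfield3\t".$i;
}
echo "Before loop: ".memory_get_usage()." bytes.\n";
foreach($demo as $key => $line) {
    // Writing back into $demo while foreach iterates its copy means
    // both versions stay alive until the loop finishes.
    $demo[$key] = explode("\t", $line);
}
echo "After loop:  ".memory_get_usage()." bytes.\n";
echo "Peak:        ".memory_get_peak_usage()." bytes.\n";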

There are several ways to avoid the copy, such as using a different loop construct (for or while) or iterating by reference:

foreach($data as &$line) {
    $line = explode("\t", $line);
}
unset($line); // break the reference so later code cannot accidentally write into $data

When either the array being looped over or the loop variable is a reference, foreach does not duplicate the array. However, it is important to unset($line) afterwards: the variable remains a reference to the last element, and any later write to $line would silently modify $data.
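
To see why the unset() matters, here is a small standalone example (not part of your script) of the leftover reference causing a later foreach by value to overwrite the last element:

$letters = array('a', 'b', 'c');
foreach($letters as &$item) {
    $item = strtoupper($item);
}
// $item is still a reference to $letters[2] here; unset($item) would break it.
foreach($letters as $item) {
    // Each pass copies the current value into $letters[2] through the
    // leftover reference.
}
print_r($letters); // prints A, B, B – the last element now mirrors $letters[1]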