61

In a PHP program, I sequentially read a bunch of files (with file_get_contents), gzdecode them, json_decode the result, analyze the contents, throw most of it away, and store about 1% in an array.

Unfortunately, with each iteration (I traverse an array containing the filenames), some memory seems to be lost (according to memory_get_peak_usage, about 2-10 MB each time). I have double- and triple-checked my code; I am not storing unneeded data in the loop (and the needed data hardly exceeds 10 MB overall), but I am frequently rewriting strings held in an array. Apparently, PHP does not free that memory correctly, so it uses more and more RAM until it hits the limit.

Is there any way to do a forced garbage collection? Or, at least, to find out where the memory is used?

Sᴀᴍ Onᴇᴌᴀ
DBa
  • 1
    If I pass increasingly larger data chunks to json_decode(), more memory is used (and not freed again, at least not in my test environment, though it currently doesn't hit the memory limit). If the _same_ data (data of the same structure, not necessarily the exact same variable) is parsed again in the same PHP instance, there is no further increase. Do the structure, "size" and values of the JSON data you feed to json_decode() vary a lot? – VolkerK Mar 17 '10 at 13:56
  • No, the data structure is almost exactly the same - it is an array of objects with a constant structure, only the length of the array varies. – DBa Mar 17 '10 at 14:36
  • 1
    `memory_get_peak_usage` reports a monotonically increasing value--the maximum memory you were using at any point of the program. Use `memory_get_usage(true)` to get the current actual memory being used. – David Harkness Jun 25 '12 at 17:33
  • http://stackoverflow.com/questions/3110235/php-garbage-collection-while-script-running – Prof. Falken Nov 02 '12 at 15:32
  • there is a nice article that explains memory usage here http://arr.gr/blog/2014/05/php-memory-usage-unnecessary-string-concatenation/ – Thomas Tran Jul 24 '14 at 10:07

8 Answers

39

It has to do with memory fragmentation.

Consider two strings concatenated into one. Each original must remain in memory until the output is created, and the output is longer than either input. Therefore, a new allocation must be made to store the result of such a concatenation; the original strings are then freed, but they are only small blocks of memory.
In a case like 'str1' . 'str2' . 'str3' . 'str4', several temporaries are created at each `.` operator, and none of them fit in the space that has just been freed up. The strings are likely not laid out in contiguous memory (that is, each string is, but the various strings are not laid end to end) due to other uses of the memory. So freeing a string creates a problem, because the space cannot be reused effectively. You grow with every temporary you create, and you never reuse anything.

Using the array-based implode(), you create only one output, exactly the length you require, performing only one additional allocation. So it is much more memory-efficient, and it does not suffer from concatenation fragmentation. The same is true of Python: anything involving more than one concatenation should be array-based:

''.join(['str1','str2','str3'])

in python

implode('', array('str1', 'str2', 'str3'))

in PHP

sprintf equivalents are also fine.

The value reported by memory_get_peak_usage is basically always the "last" bit of memory in the virtual map that had to be used. Since the allocations keep growing, it reports rapid growth, as each new allocation falls "at the end" of the currently used memory block.
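A rough way to see the difference is to build the same string both ways and watch memory_get_peak_usage(); a minimal sketch (the chunk sizes and loop counts are made up, and exact figures vary by PHP version and allocator):

```php
<?php
// Sketch: repeated concatenation vs. a single implode().
// Exact numbers vary by PHP version; this only shows the technique.
$parts  = array();
$concat = '';
for ($i = 0; $i < 10000; $i++) {
    $chunk   = str_repeat('x', 100);
    $concat .= $chunk;   // forces a new, larger allocation each pass
    $parts[] = $chunk;   // only stores the small chunk
}
$joined = implode('', $parts);  // one allocation of the final length
echo memory_get_peak_usage(true), PHP_EOL;
```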

nategood
James Lyons
    The above answer does not seem to be completely true. I used PHPParser to rewrite all concat operations to implodes in my project (which consists of 1000 classes and the underlying cake framework). After that I tested it with a huge script which takes all my models, does some aggregation and saves the results back to the db. With concat (.) it takes 103s and uses 24,751 MB at the start and 45,985 MB at the end; with implode it takes about 109s and uses 25,341 MB at the start and 46,660 MB at the end. I'm losing about 6 kB per model, which is why the memory usage increases. – velop Jan 11 '14 at 11:02
  • @velop Clearly there is a bug in one of the *many* libraries you are using, that has nothing to do with how PHP works. You would need to fix the problem in the underlying Class/Method/etc. – Robert Schwindaman Mar 22 '17 at 08:56
28

In PHP >= 5.3.0, you can call gc_collect_cycles() to force a GC pass.

Note: You need to have zend.enable_gc enabled in your php.ini, or call gc_enable() to activate the circular reference collector.
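A minimal sketch of forcing a pass; the self-referencing stdClass is just a contrived example of garbage that refcounting alone cannot free:

```php
<?php
// Enable the cycle collector, create a circular reference, and force
// a collection pass. gc_collect_cycles() returns the number of
// collected cycles (PHP >= 5.3).
gc_enable();

$a = new stdClass();
$a->self = $a;   // circular reference
unset($a);       // refcount stays > 0 because of the cycle

$collected = gc_collect_cycles();
echo "Collected cycles: $collected", PHP_EOL;
```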

kenorb
Mo.
  • 4
    I think you need to call gc_enable() first. – PCheese Jan 06 '11 at 00:52
  • 7
    The Garbage Collector in PHP 5.3+ is *not* the primary memory management mechanism. It is used exclusively to deal with the problem of circular references, which the main refcount-based system cannot handle. The scenario described involves no such circular references, so would be completely unaffected by the GC. – IMSoP Mar 23 '13 at 13:26
  • It still helps in PHP 5.3 in combination with memory_get_usage to see which memory is freed again. – velop Jan 11 '14 at 11:17
16

Found the solution: it was a string concatenation. I was generating the input line by line by concatenating some variables (the output is a CSV file). However, PHP seems not to free the memory used for the old copy of the string, thus effectively clobbering RAM with unused data. Switching to an array-based approach (and imploding it with commas just before fputs-ing it to the outfile) circumvented this behavior.

For some reason, not obvious to me, PHP reported the increased memory usage during json_decode calls, which misled me into assuming that the json_decode function was the problem.
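The fix described above can be sketched roughly like this (the record fields are made up, and php://memory stands in for the real output file):

```php
<?php
// Sketch of the array-based fix: collect the fields of each CSV line
// in an array and implode() once per line, instead of growing a big
// string by repeated concatenation.
$out = fopen('php://memory', 'w+');  // stand-in for the real out-file
$records = array(
    array('id' => 1, 'name' => 'alpha', 'score' => '0.9'),
    array('id' => 2, 'name' => 'beta',  'score' => '0.7'),
);
foreach ($records as $rec) {
    $fields = array($rec['id'], $rec['name'], $rec['score']);
    fputs($out, implode(',', $fields) . "\n");  // one allocation per line
}
rewind($out);
$csv = stream_get_contents($out);
fclose($out);
echo $csv;
```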

DBa
  • 2
    Do you mind giving some more detail about this? It might help me out. You were resetting an existing string variable in each iteration of your loop, but the memory used to hold the old string(s) was not being released - was that the problem? Now, using an array to hold the data it does free the memory? – Scott Saunders Mar 17 '10 at 14:52
  • 2
    Scott, sorry for not getting back to you: I was overwriting an already-used string ($s = $s . "new contents";). Though it is well known that such a concatenation invokes a new allocation, I did not know that the old copy remains in place and blocks memory. So I switched to an approach like $a = array(); array_push($a, "new contents"); and imploded the array afterwards. – DBa Aug 21 '11 at 12:06
  • FYI, this also seems to happen with objects. In the object case, using unset on the variable before setting a new value stops the leak. – Rich Remer Jul 16 '14 at 18:57
14

There's a way.

I had this problem one day. I was writing from a db query into csv files - always allocating one $row, then reassigning it in the next step. I kept running out of memory. Unsetting $row didn't help; putting a 5 MB string into $row first (to avoid fragmentation) didn't help; creating an array of $row-s (loading many rows into it and unsetting the whole thing at every 5000th step) didn't help. But it was not the end, to quote a classic.

When I made a separate function that opened the file, transferred 100,000 lines (just enough not to eat up the whole memory) and closed the file, then made subsequent calls to this function (appending to the existing file), I found that on every function exit, PHP removed the garbage. It was a local-variable-space thing.

TL;DR

When a function exits, it frees all local variables.

If you do the job in smaller portions, like 0 to 1000 in the first function call, then 1001 to 2000 and so on, then every time the function returns, your memory will be regained. Garbage collection is very likely to happen on return from a function. (If it's a relatively slow function eating a lot of memory, we can safely assume it always happens.)

Side note: for variables passed by reference this will obviously not work; a function can only free its internal variables, which would be lost anyway on return.
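A minimal sketch of the idea, with a made-up process_batch() helper standing in for the real per-row work:

```php
<?php
// Sketch of batch processing: do the heavy work inside a function so
// that all of its locals are freed on every return, keeping only a
// small result. The helper and batch size are made up for illustration.
function process_batch($start, $end) {
    $local = array();           // freed when the function returns
    for ($i = $start; $i < $end; $i++) {
        $local[] = $i * 2;      // stand-in for real per-row work
    }
    return array_sum($local);   // keep only the small aggregate
}

$total = 0;
$batchSize = 1000;
for ($offset = 0; $offset < 10000; $offset += $batchSize) {
    $total += process_batch($offset, $offset + $batchSize);
}
echo $total, PHP_EOL;
```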

I hope this saves your day as it saved mine!

dkellner
  • 1
    Yeah dunno where did I read this before, but this solution always works best for me – Gaurav Pandey Oct 01 '13 at 07:50
  • 1
    Thank you for this tip! It was a tremendous help to me today. – kalinma Nov 21 '17 at 21:57
  • @kalinma Always a pleasure :) What's worrying me is that apparently nothing has changed since... Is it PHP7 you're using? Because that would really be a shame to encounter this problem in such a strongly refactored engine. – dkellner Nov 22 '17 at 12:05
  • @dkellner, I'm actually using PHP 5.6. Would be nice to upgrade to 7 as Joomla 3.7 supports it, but our servers are currently set up for 5.6. – kalinma Nov 22 '17 at 18:52
  • Anyone else watching Gotham series and remembering this article each time they say GCPD? – dkellner Sep 22 '20 at 09:09
  • It doesn't work for me: at the return from a function, the memory usage constantly increases. no referenced variables/objects and no global variables were used. See https://stackoverflow.com/questions/69213003/continuosly-increasing-memory-usage-looping-a-php-function – fede72bari Sep 17 '21 at 12:55
  • @fede72bari It might be a difference in php engines (the article is from 2013) but also, memory_get_usage(1) would be better for your measurements (you're giving it a false now), and some explicite unset commands could also trigger the process. – dkellner Sep 18 '21 at 11:34
12

I've found that PHP's internal memory manager is most-likely to be invoked upon completion of a function. Knowing that, I've refactored code in a loop like so:

while (condition) {
  // do
  // cool
  // stuff
}

to

while (condition) {
  do_cool_stuff();
}

function do_cool_stuff() {
  // do
  // cool
  // stuff
}

EDIT

I ran this quick benchmark and did not see an increase in memory usage. This leads me to believe the leak is not in json_decode():

for ($x = 0; $x < 10000000; $x++) {
  do_something_cool();
}

function do_something_cool() {
  $json = '{"a":1,"b":2,"c":3,"d":4,"e":5}';
  $result = json_decode($json);
  echo memory_get_peak_usage() . PHP_EOL;
}
Mike B
  • This reduced the leak, but has not fixed it entirely... Apparently, something is leaking inside the `json_decode` function - is there any alternative implementation? I do not care if it is a bit slower, as long as it does not eat up memory (currently, the program hits 1 GB mark at 60% of processing, causing the machine to swap and thus growing _VERY_ slow... There is nothing which would justify such a memory use, the chunks read are all about 10 MB and they are processed subsequently). – DBa Mar 17 '10 at 12:29
  • Mike, I tried the same and haven't been able to reproduce the leak with a "simple" approach (fuzzing around with a simple array) either. Will try to run it with my input data, maybe that's the problem. – DBa Mar 17 '10 at 13:19
  • 1
    While trying to reproduce the whole thing, I eventually found the solution: it was a string concatenation. I was generating the input line by line by concatenating some variables (the output is a CSV file). However, PHP seems not to free the memory used for the old copy of the string, thus effectively clobbering RAM with unused data. Switching to an array-based approach (and imploding it with commas just before fputs-ing it to the outfile) circumvented this behavior. – DBa Mar 17 '10 at 14:39
  • 1
    @DBa: Could you create an answer for this and mark is as correct? It took me reading all the comments to find your final solution :-P And it was very helpful – Hubro Apr 30 '12 at 12:12
  • 2
    Wrapping things in a function has nothing to do with "invoking GC". What is happening is that at the end of the function, all variables used in that function simultaneously go out of scope, as though you had `unset()` them all at once. No Garbage Collection is done as such, the variables simply reach a refcount of 0 and are immediately freed. – IMSoP Mar 23 '13 at 13:30
  • @IMSoP My answer seems pretty similar to http://stackoverflow.com/a/5302338/46675. I just skipped the [local variables are disposed to be cleaned up by] **gc** part. I don't know how you can call this method unhelpful. – Mike B Mar 23 '13 at 16:04
  • @MikeB The point is that the "Garbage Collector" is *not* the primary memory management mechanism in PHP. A variable's memory is returned to the Zend Memory Manager *as soon as its reference count reaches zero*; the GC is used purely to catch circular references, since they never reach a count of zero. Falling out of scope and being explicitly `unset()` are equivalent as far as this is concerned. – IMSoP Mar 23 '13 at 16:11
  • @MikeB The reason you might observe a difference is that the Zend Memory Manager doesn't release this memory back to the operating system straight away, but keeps it around *for use by other PHP variables*. So the variable you `unset()` has been freed, but the next variable you create can be slotted straight back into that space. I don't know whether the ZMM adjusts memory allocations on function exit, but that is not related to "garbage collection". – IMSoP Mar 23 '13 at 16:15
6

Call memory_get_peak_usage() after each statement, and ensure you unset() everything you can. If you are iterating with foreach(), use a referenced variable to avoid making a copy of the original array:

foreach( $x as &$y)
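An expanded sketch of that pattern; note the unset($y) afterwards, since a leftover reference would otherwise silently overwrite the last element in a later loop over the same variable:

```php
<?php
// Iterate by reference to avoid copying each element, then unset()
// the reference so it does not dangle on the last element.
$rows = array('a', 'b', 'c');
foreach ($rows as &$y) {
    $y = strtoupper($y);   // modifies the array in place
}
unset($y);                 // break the reference to the last element

echo implode(',', $rows), PHP_EOL;
```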

If PHP is actually leaking memory a forced garbage collection won't make any difference.

There's a good article on PHP memory leaks and their detection at IBM

Andy
  • 3
    Using unset() is a good solution, but you still rely on the GC. You may also try to assign the variables you don't need anymore to NULL. The memory may be reclaimed faster. – Macmade Mar 17 '10 at 11:29
  • The IBM article basically says "use `memory_get_peak_usage` to locate the leaks, which is not very helpful, as I already seem to have located it - however, I have no idea how to get rid of a memory leak in an internal PHP function... – DBa Mar 17 '10 at 12:30
  • 1
    If it's internal to a PHP function you can't get rid of it, it's a bug in the language! If you have detected the leak, perhaps you have identified a function you should a) try to find a equivalent of b) report @ http://bugs.php.net/ Perhaps you should post the code you're having trouble with? – Andy Mar 17 '10 at 13:12
  • That IBM article is about PHP 5.2 when PHP did not have a real garbage collector (that is, one able to collect unreferenced cycles). If you're running PHP 5.3 or newer, first try `gc_collect_cycles()` after possibly leaking memory. – Mikko Rantalainen Feb 01 '13 at 11:07
  • 1) The Garbage Collector *supplements* refcount-based deallocation, not replaces it, so `unset()` in most cases will immediately free memmory. 2) Assigning by reference is usually *worse* for memory performance, because it conflicts with PHP's automatic Copy-On-Write optimisations (a normal assignment does *not* immediately copy the contents of the variable). – IMSoP Mar 23 '13 at 13:32
  • @Andy: Update: IBM has removed the article you have referenced. – Bhavik Shah Feb 20 '18 at 12:46
6

I was going to say that I wouldn't necessarily expect gc_collect_cycles() to solve the problem - since presumably the files are no longer mapped to zvars. But did you check that gc_enable was called before loading any files?

I've noticed that PHP seems to gobble up memory when doing includes - much more than is required for the source and the tokenized file - this may be a similar problem. I'm not saying that this is a bug though.

I believe one workaround would be not to use file_get_contents but rather fopen()....fgets()...fclose() rather than mapping the whole file into memory in one go. But you'd need to try it to confirm.
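A sketch of that workaround, using a php://memory stream as a stand-in for a real file:

```php
<?php
// Read line by line with fgets() so only one line is held in memory
// at a time, instead of mapping the whole file with file_get_contents().
$fh = fopen('php://memory', 'w+');   // stand-in for a real file handle
fputs($fh, "line1\nline2\nline3\n");
rewind($fh);

$count = 0;
while (($line = fgets($fh)) !== false) {
    $count++;                        // process one line, then drop it
}
fclose($fh);
echo $count, PHP_EOL;
```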

HTH

C.

symcbean
4

There was recently a similar issue with System_Daemon. Today I isolated my problem to file_get_contents.

Could you try using fread instead? I think this may solve your problem. If it does, it's probably time to file a bug report over at PHP.
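A rough sketch of reading in fixed-size chunks with fread(); the chunk size is arbitrary and php://memory stands in for the real file:

```php
<?php
// Read a file in 4 KB chunks with fread() rather than pulling the
// whole thing into memory at once with file_get_contents().
$fh = fopen('php://memory', 'w+');   // stand-in for the real file
fputs($fh, str_repeat('x', 10000));
rewind($fh);

$bytes = 0;
while (($chunk = fread($fh, 4096)) !== false && $chunk !== '') {
    $bytes += strlen($chunk);        // at most 4 KB in memory at once
}
fclose($fh);
echo $bytes, PHP_EOL;
```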

kvz