17

My PHP app has an import script that can import records.

At the moment, it is importing from a CSV file. It reads each line of the file one at a time using fgetcsv, does a lot of processing on that record (including database queries), and then moves on to the next line. It shouldn't need to keep accumulating more memory.

After around 2500 records have been imported, PHP dies, saying that it has run over its memory limit (132 MB or so).

The CSV file itself is only a couple of megs - the other processing that happens does a lot of string comparisons, diffs, etc. I have a huge amount of code operating on it and it would be difficult to come up with a 'smallest reproducing sample'.

What are some good ways to go about finding and fixing such a problem?

Cause of problem found

I have a debug class which logs all my database queries during runtime. So those strings of SQL, some 30KB long, were staying in memory. I realise this isn't suitable for scripts designed to run for a long time.

There may be other sources of memory leaks, but I am fairly sure this is the cause of my problem.
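To illustrate the mechanism (my actual debug class isn't shown here, so this is only a hypothetical sketch): a logger that appends every SQL string to an array grows without bound, whereas capping the buffer, or writing straight to a file, keeps memory flat.

class QueryLog
{
    private $queries = array();
    private $max;

    public function __construct($max = 200)
    {
        $this->max = $max;
    }

    public function log($sql)
    {
        $this->queries[] = $sql;
        // Without this cap, a long import keeps every 30KB SQL string in memory.
        if (count($this->queries) > $this->max) {
            array_shift($this->queries);
        }
    }
}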

thomasrutter
    +1 I work with CSV files a lot in PHP and am eventually going to come across this problem. – alex Jun 18 '09 at 02:28
    I had a similar problem in CodeIgniter processing a big log file and inserting into a DB. I changed the inserts to use CodeIgniters `simple_query` instead of the usual `query` method and reduced memory use by a factor of 10. – Matthew Smith May 03 '11 at 06:18

8 Answers

7

If you do in fact suspect that there are just one or two memory leaks in your script which are causing it to crash, then you should take the following steps:

  • Change memory_limit to something small, like 500KB
  • Comment out all but one of the processing steps that are applied to each row.
  • Run the limited processing over the whole CSV file and see if it can complete.
  • Gradually add more steps back in and watch to see if memory usage spikes.

Example:

ini_set('memory_limit', 1024 * 500);   // 500 KB
$fp = fopen("test.csv", 'r');
while ($row = fgetcsv($fp)) {
    validate_row($row);         // step 1: validate
    // add these back in one by one and keep an eye on memory usage
    //calculate_fizz($row);     // step 2: fizz
    //calculate_buzz($row);     // step 3: buzz
    //triangulate($row);        // step 4: triangulate
}
fclose($fp);
echo "Memory used: ", memory_get_peak_usage(), "\n";

The worst case scenario is that all of your processing steps are moderately inefficient and you will need to optimize all of them.

too much php
    Thanks for the suggestion. It is a good suggestion for anyone with a similar problem, but I just happened to find xdebug's "trace" feature to be most useful in finding the cause here. – thomasrutter Jun 18 '09 at 07:13
5

It would help to have a look at the code, but if you want to debug it yourself, have a look at Xdebug; it will help you profile your application.
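For example, Xdebug's function tracing can show how much memory each call allocates. A rough sketch, assuming Xdebug 2 (which was current at the time; option names changed in Xdebug 3) and a hypothetical run_import() entry point:

// With xdebug.show_mem_delta = 1 in php.ini, each line of the trace file
// gets a memory-delta column, which makes leaking calls easy to spot.
xdebug_start_trace('/tmp/import_trace');   // writes /tmp/import_trace.xt
run_import();                              // hypothetical: your import routine
xdebug_stop_trace();
// Then scan the .xt file for functions whose memory delta keeps growing.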

Depending on what you are doing, it is possible that it's accumulating some memory, although 132 MB already seems high for 2500 records. You can, of course, tweak your memory limit in php.ini if needed.

How big is the CSV file you are reading? And what kinds of objects and processing are you applying to it?

lpfavreau
2

It depends on how you are clearing the variables once you are done with them.

It looks like you are done with the record but are still storing the information somewhere. If in doubt, use unset() to clear variables.
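For instance, a minimal sketch (compute_diff() and store_result() are hypothetical stand-ins for your per-record work):

$fp = fopen('import.csv', 'r');
while ($row = fgetcsv($fp)) {
    $diff = compute_diff($row);   // hypothetical per-record processing
    store_result($diff);
    unset($diff);                 // free per-record data before the next row
    // and avoid appending to arrays that live outside the loop,
    // e.g. $all_diffs[] = $diff; would grow on every iteration
}
fclose($fp);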

If this doesn't help, please provide a minimal reproducing code sample so we can see where all that memory is going.

BTW, producing the smallest code sample that will reproduce the problem is a great debugging technique because it forces you to go through the code again, with care.

Vinko Vrsalovic
2

You could try a local installation of PHP 5.3 and call gc_collect_cycles() (http://www.php.net/manual/en/function.gc-collect-cycles.php).

gc_collect_cycles — Forces collection of any existing garbage cycles

If the situation improves, you have at least verified (one of) the problem(s).
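A sketch of what that could look like (requires PHP 5.3+, since gc_collect_cycles() doesn't exist in 5.2; process_row() is a hypothetical stand-in for the per-record work):

gc_enable();   // the cycle collector is on by default in 5.3, but be explicit
$count = 0;
$fp = fopen('import.csv', 'r');
while ($row = fgetcsv($fp)) {
    process_row($row);                  // hypothetical per-record processing
    if (++$count % 500 === 0) {
        $freed = gc_collect_cycles();   // returns the number of cycles collected
        echo "Row $count: freed $freed cycles, ", memory_get_usage(), " bytes in use\n";
    }
}
fclose($fp);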

stefs
1

How are you reading the file? If you're using fread/file_get_contents or other such functions, then you are going to consume the entire file size (or however much you load with fread) in memory, as the entire file is loaded at call time. However, if you use fgetcsv it will only read one line at a time (depending on the length of the line), which can be dramatically easier on your memory.

Also make sure that you are reusing as many variables as possible on each loop iteration. Check that there are no arrays accumulating large amounts of data.
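A cheap way to check for that is to watch how much memory each iteration adds (sketch only; process_row() stands in for your per-record work):

$fh = fopen('import.csv', 'r');
$last = memory_get_usage();
while ($row = fgetcsv($fh)) {
    process_row($row);               // hypothetical per-record processing
    $now = memory_get_usage();
    if ($now - $last > 50000) {      // flag iterations that grow memory by >50KB
        echo 'Grew by ', $now - $last, " bytes\n";
    }
    $last = $now;
}
fclose($fh);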

As a last note, also make sure that you are opening your file before your loop and then closing it afterwards:

$fh = fopen(...);
while (true)
{
    // ...
}
fclose($fh);

You don't really want to be doing this:

while (true)
{
    $fh = fopen(...);
    // ...
    fclose($fh);
}

And like others have said, it'll be hard to tell without seeing some code.

UnkwnTech
  • I was using fgetcsv, sorry I forgot to mention. I was fairly sure the problem wasn't in reading the file, because the file itself is relatively small. – thomasrutter Jun 18 '09 at 07:17
0

It's difficult to say what the cause is without seeing any code. However, a typical issue is circular references, i.e. object A points to object B and vice versa, which may cause the GC to screw up.
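A contrived sketch of the problem: two objects that point at each other never have their reference counts reach zero, so on pre-5.3 versions they stay in memory until the script ends unless you break the cycle yourself.

class Node
{
    public $other;
}

for ($i = 0; $i < 1000; $i++) {
    $a = new Node();
    $b = new Node();
    $a->other = $b;
    $b->other = $a;       // cycle: plain reference counting never frees these
    // breaking the cycle by hand lets the memory be reclaimed right away
    $a->other = null;
    $b->other = null;
}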

I don't know how you're currently processing the file, but you could try reading it only one row at a time. If you read the whole file at once, it may consume more memory.

This is actually one of the reasons I often prefer Python for batch processing tasks.

Jani Hartikainen
0

Are you able to change your memory_limit in your php.ini?

Also, could doing unset($var) on variables free up some memory? Could $var = null help too?

See also this question: What's better at freeing memory with PHP: unset() or $var = null

alex
  • Good suggestion, but the memory limit is already at 128MB or so, and increasing it would only buy the ability for it to run a little longer - eventually I'd want to be able to do imports of more than 10, maybe 50 times this size. – thomasrutter Jun 18 '09 at 07:17
  • Yes, I realise it's not the ideal fix. Just thought I'd mention it. – alex Jun 18 '09 at 08:41
0

I was having the same problem, and it was also due to database profiling (Zend_Db_Profiler_Firebug). In my case it was leaking 1 MB per minute. This script was supposed to run for days, so it would crash within a few hours.
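In case it helps anyone with the same setup, a sketch of switching the profiler off for long-running scripts (Zend Framework 1; $db here stands for your Zend_Db adapter):

// stop the profiler from keeping every query (and its SQL text) in memory
$db->getProfiler()->setEnabled(false);
// or, if the adapter comes from application.ini, something like:
// resources.db.params.profiler.enabled = false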

Simon Lang