0

Summary

This is a script (CakePHP 2.10.18 - LAMP dedicated server with PHP 5.3) that loads information from 2 MySQL tables, and then does some process of the data to output it to excel.

Table 1 has users, and Table 2 has info about those users (one record per user). The script has the goal of grabbing the record of a user from Table 1, grabbing its related info from Table 2, and put it in an excel row (using PHPExcel_IOFactory library for this).

The information extracted of those tables is of around 8000 records from each, the tables themselves have 100K and 300K total records respectively. All the fields in those tables are ints and small varchars with the exception of one field in the second table (datos_progreso seen in the code below), which is a text field and contains serialized data, but nothing big.

The issue is that if I run the script for the full 16000 records I get an Internal Server Error (without really any explanation in the logs), if I run the script for 1000 records it all works fine, so this seems to point out it's a resources issue.

I've tried (among other things that I will explain at the end) increasing the memory_limit from 128M to 8GB (yes you read that right), max_execution_time from 90 to 300 seconds, and max_input_vars from 1000 to 10000, and that isn't solving the problem.

My thoughts are that the amount of data isn't that huge to cause the resources to run out, but I've tried optimizing the script in several ways and can't get it to work. The only time I get it to work is by running it on a small portion of the records like I mention above.

I would like to know if there's something script-wise or php-configuration-wise I can do to fix this. I can't change the database tables with the information by the way.

Code

This is just the relevant bits of code that I think matter, the script is longer:

       $this->Usuario->bindModel(
            array('hasMany' => array(
                'UsuarioProgreso' => array('className' => 'UsuarioProgreso', 'foreignKey' => 'id_usuario', 'conditions' => array('UsuarioProgreso.id_campania' => $id_campania)))
                ));
    
       $usuarios = $this->Usuario->find('all', array(
            'conditions'=>array('Usuario.id_campania'=>$id_campania, 'Usuario.fecha_registro >'=>'2020-05-28'),
            'fields'=>array('Usuario.id_usuario', 'Usuario.login', 'Usuario.nombre', 'Usuario.apellido', 'Usuario.provincia', 'Usuario.telefono', 'Usuario.codigo_promocion'),
            'order'=>array('Usuario.login ASC')
        ));

        $usuario = null;
        $progreso_usuario = null;
        $datos_progreso = null;
        $i = 2;

        
        foreach ($usuarios as $usuario) {        
    
                if (isset($usuario['UsuarioProgreso']['datos_progreso'])) {
    
                    $datos_progreso = unserialize($progreso['UsuarioProgreso']['datos_progreso']);
    
                    $unit = 1;
                    $column = 'G';
    
                    while ($unit <= 60) {
    
                        if (isset($datos_progreso[$unit]['punt']))
                            $puntuacion = $datos_progreso[$unit]['punt'];
                        else
                            $puntuacion = ' ';
                        
                        $objSheet->getCell($column.$i)->setValue($puntuacion);
    
                        $column++;
                        $unit++;
                    }
    
                    $nivel = 1;
                    $unidad_nivel = array(1 => 64, 2 => 68, 3 => 72, 4 => 76, 5 => 80, 6 => 84);
    
                    while ($nivel <= 6) {
    
                        $unidad = $unidad_nivel[$nivel]; 
    
                        if (isset($datos_progreso[$unidad]['punt']))
                            $puntuacion = $datos_progreso[$unidad]['punt'];
                        else
                            $puntuacion = ' ';
                        
                        $objSheet->getCell($column.$i)->setValue($puntuacion);
    
                        $column++;
                        $nivel++;
                    }
                }
    
                //Free the variables
                $usuario = null;
                $progreso_usuario = null;
                $datos_progreso = null;

            $i++;
        }     

What I have tried

I have tried not using bindModel, and instead just load the information of both tables separately. So loading all the info of users first, looping through it, and on each loop grab the info for that specific user from Table 2.

I have tried also something similar to the above, but instead of loading all the info at once for the users from Table 1, just load first all their IDs, and then loop through those IDs to grab the info from Table 1 and Table 2. I figured this way I would use less memory.

I have also tried not using CakePHP's find(), and instead use fetchAll() with "manual" queries, since after some research it seemed like it would be more efficient memory-wise (didn't seem to make a difference)


If there's any other info I can provide that can help understand better what's going on please let me know :)


EDIT:

Following the suggestions in the comments I've implemented this in a shell script and it works fine (takes a while but it completes without issue).

With that said, I would still like to make this work from a web interface. In order to figure out what's going on, and since the error_logs aren't really showing anything relevant, I've decided to do some performance testing myself.

After that testing, these are my findings:

  • It's not a memory issue since the script is using at most around 300 MB and I've given it a memory_limit of 8GB
  • The memory usage is very similar whether it's via web call or shell script
  • It's not a timeout issue since I've given the script 20 minutes limit and it crashes way before that

What other setting could be limiting this/running out that doesn't fail when it's a shell script?

Albert
  • 1,516
  • 3
  • 24
  • 55
  • If you can turn on error logging and increase log levels in PHP that might at least point you at if the problem is your script, Cake queries, or PHPExcel. I would note that memory issues with PHPExcel seem to be a "thing" https://stackoverflow.com/questions/3537604/how-to-fix-memory-getting-exhausted-with-phpexcel with https://stackoverflow.com/search?q=PHPExcel+memory showing 400+ results – Andy Hoffner Jul 29 '20 at 22:44
  • @ahoffner would these show as errors along the internal server error on the website, or in the apache error logs? I ask because I actually turned on the `display_errors` PHP option and it didn't make a difference... – Albert Jul 29 '20 at 22:59
  • `display_errors` toggles displaying them on the website yes. Enable all error levels with `error_reporting = E_ALL` and to log them that `log_errors` is set. Log locations are usually set in Apache configs, make sure that you're looking at the right log file. And finally check Cake isn't overriding either these, or set them redundantly to be sure - `error_reporting(E_ALL); ini_set("display_errors", 1);` – Andy Hoffner Jul 29 '20 at 23:08
  • Run this on a shell and not in a web context. If you have to send the file to a user, then schedule a task that is executed in the background and the user informed when the file is ready for download. There are plenty of ways of doing this, even with showing a progress bar. Doing what you want in the context of a request is simply the wrong approach to the problem. – floriank Jul 30 '20 at 00:33
  • Generally great advice - offload any big scripts to backend processes. But if 8 GB of RAM isn't enough - it wont matter if it's in the background, there's some kind of problem likely in that super old PHPExcel library I'd bet. There's a newer version but not sure if upgrading is an option or if it'd help in this case. What are you unserialize'ing and sticking in cells? Maybe that's too complex for this library at scale? – Andy Hoffner Jul 30 '20 at 03:33
  • @burzum well I ran this on a shell and it works! Takes a long time to complete but it does finish. This makes me think that it's not a memory issue but a timeout issue, maybe 5 min wasn't enough? Anyway, I do need to figure out a way to making this work via a web interface, even if it's calling a shell script from a web interface, is that possible? – Albert Jul 30 '20 at 18:18
  • @ahoffner what I'm unserializing is actually quite simple, just an array of associative arrays (just 2 keys each), each key containing an integer. I'm wondering if running this under php 7 would be more efficient – Albert Jul 30 '20 at 18:21
  • I've added an edit to the post with some conclusions after some performance testing – Albert Jul 30 '20 at 20:31
  • Albert, well, just do what I proposed. You very likely already wasted more time on trying to get this working the wrong way than doing it right. Besides technical issues there are UX issues with your approach as well. – floriank Jul 30 '20 at 20:34
  • @burzum well I did do what you suggested and used a shell script. The reason I mention the web interface is that there are variables I will need to let users customize. I'm happy to follow your advice of course, I appreciate it, I was just asking if there is a way to connect the web interface with the shell script so I can make it run with those variables. Is that possible? – Albert Jul 30 '20 at 23:05
  • I still think first step is to pull a detailed PHP error message that can tell you the exact nature of the error and line of code that it's dying on. Once you have this the solution will be crystal clear & we can all stop guessing. I'd keep trying stuff on this front, lots of SO questions about this to check out, like https://serverfault.com/questions/153115/error-in-php-not-logging-or-displaying – Andy Hoffner Jul 31 '20 at 16:33
  • @albert google for queuing. You send a job to a queue, a shell is processing the queue in the background and triggers tasks that process the job from the queue. This is common and normal way of handling long running background processes. If you want it more modern you can do the same by using websockets (again, use google). – floriank Aug 01 '20 at 14:04

1 Answers1

0

The way I solved this was using a shell script by following the advice from the comments. I've understood that my originally intended approach was not the correct one, and while I have not been able to figure out what exactly was causing the error, it's clear that using a web script was the root of the problem.

Albert
  • 1,516
  • 3
  • 24
  • 55