
I'm facing a fatal error while trying to manipulate a huge array of arrays in PHP and return the result as the response to an HTTP POST request:

Allowed memory size of 536870912 bytes exhausted

I have already tried setting ini_set('memory_limit', '-1'); to see if I could get the result, but I never got any response: Postman crashed every time I tried to make the POST request.

The starting structure of the array is shown below. The body size is around 25 MB. The main array contains around 22k arrays with this structure; I have included just 2:

Array
(
    [0] => Array
        (
            [id] => 14
            [isActive] => 1
            [personId] => 0023fff16d353d16a052a267811af53bc8bd42f51f2266a2904ca41db19dfd32_0
            [gender] => m
            [age] => 69
            [linedata_0] => 2018-03-01 17:15:18, 155.59, 294.076; 2018-03-01 17:16:04, 502.968, 249.947; 2018-03-01 17:16:44, 276.837, 270.593; 2018-03-01 17:17:28, 431.68, 371.14; 2018-03-01 17:17:34, 851.622, 355.915
        )

    [1] => Array
        (
            [id] => 180
            [isActive] => 1
            [personId] => 02659982ae8286409cc5bb283089871b62f2bafbbad517941d64e77ecf2b62b1_0
            [gender] => m
            [age] => 69
            [linedata_0] => 2018-03-01 13:20:05, 155.599, 293.841; 2018-03-01 13:20:48, 495.468, 249.582; 2018-03-01 13:21:28, 258.791, 260.748; 2018-03-01 13:23:20, 859.061, 352.237; 2018-03-01 13:23:32, 56.1404, 269.858
        )
)

Below is the PHP part that manipulates the array in order to produce the expected final result, by exploding the timestamps and coordinates for each user:

$final_result = [];

foreach($query_result as $row)
{
    $line_datas =explode(";",$row["linedata_0"]);
    $linedata = [];
    $final = [];
    $d = [];

    for($s =0; $s < count($line_datas); $s++){
        $line_data = explode(",",$line_datas[$s]);
        $d["timestamp"] = utf8_encode($line_data[0]);
        $d["x"]= utf8_encode($line_data[1]);
        $d["y"] = utf8_encode($line_data[2]);

        array_push($linedata,$d);
    }

    $final["id"]= $row["id"];
    $final["isActive"]= $row["isActive"];
    $final["personId"]= utf8_encode($row["personId"]);
    $final["name"] = NULL;
    $final["gender"] = utf8_encode($row["gender"]);
    $final["age"] = utf8_encode($row["age"]);
    $final["linedata"]=$linedata;

    array_push($final_result, $final);
}

return $final_result;

As far as I can see, there is no infinite loop or bad practice that could justify a memory issue. The only real problem could be the size of the array that needs to be manipulated.

Any suggestions?

  • The first optimization step I would take is to use the $result array returned from the database directly, and not build another monster array. Also, if any work needs to be done on the data, let the database do it and return the final result ready to be used, to reduce the work inside the PHP layer. – Accountant م Nov 22 '18 at 20:44
  • @Accountantم the main problem is that I already made a monster aggregation in SQL to get that particular result. I don't think I can get the final expected result directly with MySQL. That's why I thought about doing some parts of the transformation within PHP. – UgoL Nov 22 '18 at 20:48
  • Try and set a limit on the DB query to only fetch a few rows and see if the result is the same. If it is, then you must have some infinite loop somewhere in the code you haven't posted. – M. Eriksson Nov 22 '18 at 20:54
  • @MagnusEriksson already checked. With the LIMIT it works properly. – UgoL Nov 22 '18 at 20:55
  • I can imagine that postman crashes if you're trying to return that much data in one go. Use some pagination instead of returning it all at once. – M. Eriksson Nov 22 '18 at 20:58
  • @MagnusEriksson but the result is not like a transaction list where I can set an offset for pagination. Basically I need all of this data together. – UgoL Nov 22 '18 at 21:02
  • As long as you're fetching data from a database, I don't see why you couldn't have an offset? Then it's up to the client to fetch the first batch, then the second etc and merge the result on their end. – M. Eriksson Nov 22 '18 at 21:04
  • Also note that the SQL query may need to be optimized as well. – Accountant م Nov 22 '18 at 21:05
  • @Accountantم I think the query is well optimized; in fact, if I `var_dump` without LIMIT I can get the full response in around 10 seconds, and we are talking about a huge response. The only problem is that I'm not able to reproduce in MySQL some of the array transformations on the data that I do in PHP. – UgoL Nov 22 '18 at 21:13
  • I suggest you rethink why you are doing this in the first place. If this is what you really need and you have no other way to do it but dealing with a very huge array, then why not use a buffer? Build a limited array in memory (10K elements or less), save it to disk, and build the array up again; by the end you will find your final data on disk. I used this in a situation like yours and got great results regarding memory usage. If you want, I can show you an example in an answer. – Accountant م Nov 22 '18 at 21:14
  • @Accountantم Well, actually I have thought a lot about how to structure the concept better on the database side, but I didn't find a better solution than this one: to group_concat timestamp+x+y for each user and get, of course, a very long string with all the possible data for each user, which I then need to explode in PHP. – UgoL Nov 22 '18 at 21:18
  • @Accountantم if you can show me a buffer example based on my current code logic, it would be great. Or if you have other ideas to rethink the concept, please share them. – UgoL Nov 22 '18 at 21:23
  • Regarding `linedata_0`: why not split the data in this field into more fields like `age`, `name`, `gender`, `timestamp`, ...? This way you will not need to `group_concat` the data into linedata_0 at the database layer and then split it again at the PHP layer. Of course, I assume the data in linedata_0 is stored in separate fields in your database. – Accountant م Nov 22 '18 at 21:38
  • OK, I will prepare an example of how to use a buffer in your code and post it as an answer. – Accountant م Nov 22 '18 at 21:38
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/184110/discussion-between-ugol-and-accountant-). – UgoL Nov 22 '18 at 21:41

3 Answers

3

You are collecting a large amount of data into an array, and only then returning it.

If you instead collect a single $final item and yield it inside the foreach loop, rather than pushing it into an ever-growing variable, you will still be able to foreach over the function call.

Here is a simplistic example, where $i stands in as the sample value being yielded instead of your $final array of collected data.

<?php
function count_one_to_three() {
    for ($i = 1; $i <= 3; $i++) {
        // Note that $i is preserved between yields.
        yield $i;
    }
}

$generator = count_one_to_three();
foreach ($generator as $value) {  // you can also foreach(count_one_to_three() as $value)
    echo "$value\n";
}

Information on 'yield' in PHP: http://php.net/manual/en/language.generators.syntax.php
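
Applied to the code in the question, a minimal sketch could look like this (the function name streamRows is only illustrative, and $query_result is assumed to be the array your query already returns):

function streamRows($query_result)
{
    foreach ($query_result as $row) {
        $linedata = [];
        foreach (explode(";", $row["linedata_0"]) as $point) {
            $parts = explode(",", $point);
            $linedata[] = [
                "timestamp" => utf8_encode($parts[0]),
                "x"         => utf8_encode($parts[1]),
                "y"         => utf8_encode($parts[2]),
            ];
        }

        // hand one finished row to the caller instead of pushing it onto a growing array
        yield [
            "id"       => $row["id"],
            "isActive" => $row["isActive"],
            "personId" => utf8_encode($row["personId"]),
            "name"     => NULL,
            "gender"   => utf8_encode($row["gender"]),
            "age"      => utf8_encode($row["age"]),
            "linedata" => $linedata,
        ];
    }
}

// the caller consumes one row at a time, e.g. encoding and sending it immediately
foreach (streamRows($query_result) as $final) {
    echo json_encode($final), "\n";
}

Because each row is handed over as soon as it is built, memory usage stays proportional to a single row rather than to the whole result set.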

Alister Bulman
  • Thank you for the suggestion. Honestly, I had never heard of `yield`. I'm not sure how I could correctly implement it in my current code. Is this a better approach than batching, in theory? – UgoL Nov 22 '18 at 22:19
  • It's pretty much as simple as replacing the `array_push($final_result, $final);` with `yield $final;`, and not needing to `return` anything. Instead, it returns one item to the caller per loop and then continues from where it left off. The link has more details; it's worth reading up on - http://php.net/manual/en/language.generators.syntax.php – Alister Bulman Nov 22 '18 at 22:37
  • I have read it, but honestly I didn't get how it could be useful in my case... can you show me an example based on my current code? Maybe explain what the advantage would be? – UgoL Nov 23 '18 at 00:30
1

It's bad practice to work with big data sets like this in memory.

Imagine this: you have a variable $a which contains 22k arrays, and you start building a second variable $b which will contain 22k arrays too.

So at the end of your script you will have two variables holding 22k arrays each.

To avoid this problem, you should fetch your data in batches, for example 500 rows per loop.


    function findRows($offset = 0, &$final_result = []) {
        // fetch the next batch of 500 rows starting at $offset
        $sql = 'SELECT * FROM my_table LIMIT ' . $offset . ', 500';
        // your code to run the query and fill $query_result

        if ($query_result) {
            $offset = $offset + 500;

            foreach ($query_result as $row) {
                // your transformation code that builds $final
                array_push($final_result, $final);
            }
            // recurse to process the next batch; $final_result is passed by reference
            findRows($offset, $final_result);
        }

        return $final_result;
    }

    return findRows();
  • Can you show me an example of getting data by batches based on my current code? – UgoL Nov 22 '18 at 20:58
  • I have applied your solution with the offset of 0,500. But the memory size error still occurs. – UgoL Nov 22 '18 at 23:10
  • And also, `$final_result` is replaced every time, so with your code we'll get null as the final result, i.e. an empty array. – UgoL Nov 23 '18 at 00:33
  • Okay, try this; I didn't check the code before. This works well. – user10685556 Nov 23 '18 at 07:21
  • I'm not sure about it, but I think it cannot solve the memory issue. We are just splitting the operation, but the memory cost is the same. Did you test this solution's memory consumption? – UgoL Nov 23 '18 at 08:36
1

This answer is an example of how to implement a buffer (a limited array in memory) in your code: when it is filled, its contents are flushed to disk, and at the end you will find the huge array on disk in JSON format. I used this approach in a situation similar to yours and got great results regarding memory usage, but as I told you in the comments, you should rethink why you need that HUGE array in the first place, and if there is a way to avoid it, go with it.

Using this function will save you the memory used by your $final_result array by replacing it with a $final_result string buffer whose memory use we control. However, your $query_result array will still take the memory it needs.

Note that you will need to adapt the function to your code, because I used your variables, which are undefined in mine.

/**
 * process the HUGE array and save it to disk in JSON format [element,element]
 * 
 * @param string $fileName absolute path of the file you want to save the processed array in
 * @return int processed elements count
 */
function buildMyArrayInFile($fileName)
{
    $flushCheckPoint = 100;// set the buffer size as needed, depending on the PHP memory limit and the average element size
    $processedElements = 0;
    $final_result = "[";

    file_put_contents($fileName, "");//prepare the file and erase anything in it

    foreach($query_result as $row)
    {
        $line_datas = explode(";", $row["linedata_0"]);
        $linedata = [];
        $final = [];
        $d = [];

        for($s = 0; $s < count($line_datas); $s++){
            $line_data = explode(",", $line_datas[$s]);
            $d["timestamp"] = utf8_encode($line_data[0]);
            $d["x"] = utf8_encode($line_data[1]);
            $d["y"] = utf8_encode($line_data[2]);

            array_push($linedata, $d);
        }

        $final["id"] = $row["id"];
        $final["isActive"] = $row["isActive"];
        $final["personId"] = utf8_encode($row["personId"]);
        $final["name"] = NULL;
        $final["gender"] = utf8_encode($row["gender"]);
        $final["age"] = utf8_encode($row["age"]);
        $final["linedata"] = $linedata;

        // prepend the separator for every element after the first, so the JSON stays
        // valid even when the buffer is flushed exactly at a checkpoint boundary
        $final_result .= ($processedElements > 0 ? "," : "") . json_encode($final);
        $processedElements++;
        if($processedElements % $flushCheckPoint === 0){
            //the buffer has reached the limit, flush it to disk
            file_put_contents($fileName, $final_result, FILE_APPEND);
            $final_result = "";
        }
    }

    //close the JSON array and flush the remaining data in $final_result
    $final_result .= "]";
    file_put_contents($fileName, $final_result, FILE_APPEND);

    return $processedElements;
}
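
Once the file has been written, it can be sent back as the HTTP response without building the array in memory again. A minimal sketch (assuming plain PHP; the file path is hypothetical, adapt it to your setup):

$fileName = "/tmp/result.json"; // hypothetical path, pick any writable location
buildMyArrayInFile($fileName);

// stream the prepared JSON straight from disk to the client
header('Content-Type: application/json');
header('Content-Length: ' . filesize($fileName));
readfile($fileName); // writes the file to the output without holding it in a PHP string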

This is another, simpler version of the function, for testing:

// test
var_dump(buildMyArrayInFile2("/home/myuser/myArray.json"));
// outputs int(7)



function buildMyArrayInFile2($fileName)
{
    $flushCheckPoint = 2;// set the buffer size as needed, depending on the PHP memory limit and the average element size
    $processedElements = 0;
    $final_result = "[";

    file_put_contents($fileName, "");//prepare the file and erase anything in it

    $test_result = [1,2,3,4,"wee","hello\nworld",5];
    foreach($test_result as $row)
    {
        // prepend the separator for every element after the first
        $final_result .= ($processedElements > 0 ? "," : "") . json_encode($row);
        $processedElements++;
        if($processedElements % $flushCheckPoint === 0){
            //the buffer has reached the limit, flush it to disk
            file_put_contents($fileName, $final_result, FILE_APPEND);
            $final_result = "";
        }
    }

    //close the JSON array and flush the remaining data in $final_result
    $final_result .= "]";
    file_put_contents($fileName, $final_result, FILE_APPEND);

    return $processedElements;
}
Accountant م
  • Thank you. Give me some time to look at it. – UgoL Nov 22 '18 at 23:29
  • Finally I have tested all the hypotheses, including the yield solution, but this one was the only solution that worked in my case. – UgoL Nov 28 '18 at 15:50