
I am looking for a way to create new arrays in a loop. Not the values, but the array variables. So far, it looks like it's impossible or complicated, or maybe I just haven't found the right way to do it.

For example, I have a dynamic number of values I need to append to arrays.
Let's say it will be 200,000 values. I cannot assign all of these values to one array, for memory reasons on the server, so just skip this part.
I can assign a maximum of 50,000 values per array.
This means I will need to create 4 arrays to fit all the values. But next time, I will not know how many values I need to process.

Is there a way to generate the required number of arrays based on a fixed capacity per array and the number of values?
Or must each array be declared manually, with no workaround?

What I am trying to achieve is this:

$required_number_of_arrays = ceil(count($data)/50000);

for ($i = 1; $i <= $required_number_of_arrays; $i++) {
    $new_array$i = array(); // not valid PHP; shown to illustrate the intent

    foreach ($data as $val) {
        $new_array$i[] = $val;
    }
}

// Created arrays: $new_array1, $new_array2, $new_array3
encrypted21
  • You want something like http://php.net/manual/en/function.array-fill.php ? edit: wait, can you tell us more about what's in `$data` ? – Scuzzy Mar 07 '18 at 21:55
  • You may need to start using http://php.net/manual/en/language.generators.overview.php to overcome memory limitations (_>= 5.5.0_). – Scuzzy Mar 07 '18 at 21:59
  • @Scuzzy Just checked it out. It's a function used to fill an array with the same data a specified number of times. Not the solution; it has a different usage. – encrypted21 Mar 07 '18 at 22:03
  • @Scuzzy String data from the database. A huge amount of data needed to be processed, and for that I need an array. No other option. Edit: Generators might be a solution, but it's not exactly what I'm looking for – encrypted21 Mar 07 '18 at 22:04
  • @encrypted21 You can retrieve large amounts of data from a database using generators, an [example in PDO](https://evertpot.com/switching-to-generators/) – Xorifelse Mar 07 '18 at 22:09
  • The concept of generators is that when yielding the next value, the previous one is removed from memory. This way you can easily read a 10 GB text file while holding only one line of it in memory at a time, which is unlikely to ever breach the maximum allotted memory. – Xorifelse Mar 07 '18 at 22:19
  • Note that we're only seeing a small fraction of your greater problem, so we probably don't appreciate the complexity of what work you're needing to perform on your data set. Is there anything you can tell us about what you're doing with this mass of data that you need to have big working sets? Perhaps there's already a wheel that's been invented for what you need to do. eg a queue system – Scuzzy Mar 07 '18 at 22:19
  • @Scuzzy I fetch data from the database. Then, I assign those data to an array (memory problem). After that, I use the array in DOMDocument to append children nodes and values in XML files. It is the most optimal solution to process the files, but there is the problem with the memory. However, I will try the generators out, because it sounds like a solution. I'm hearing of them for the first time, to be honest. – encrypted21 Mar 07 '18 at 22:27
  • If `$data` is an array, what is `ceil($data/50000)` supposed to be? Did you mean `ceil(count($data)/50000)`? – Barmar Mar 07 '18 at 22:34
  • But if you can't put all the data into a single array, how is it all in `$data`? – Barmar Mar 07 '18 at 22:36
  • @encrypted21 Keep in mind, the yielded data from generators needs to be processed immediately. Storing that data in another array will negate the effect it is having. – Xorifelse Mar 07 '18 at 22:39
  • @Barmar It is only a sample of what I'm trying to achieve. `$data` isn't important here. It is not real code, only to visualize the idea. – encrypted21 Mar 07 '18 at 22:45
  • There's no maximum size of an array: https://stackoverflow.com/questions/6856506/what-is-the-maximum-size-of-an-array-in-php – Barmar Mar 07 '18 at 22:46
  • @Barmar The memory problem isn't related to the array size, but to server memory. A huge array causes memory usage to spike, which destroys the server's performance if the limit is raised. – encrypted21 Mar 07 '18 at 22:48
  • @Xorifelse That shouldn't be a problem, right? I could yield the value, modify the XML file by appending the child, and then repeat until all the values are processed. Correct? – encrypted21 Mar 07 '18 at 22:53
  • That is exactly what I mean, yes. Keep the file handle open though (open it before and close it after the loop), otherwise you will increase a lot of IO on the (H/S)DD. – Xorifelse Mar 07 '18 at 23:02
  • @Xorifelse I see, thank you! I will try to implement that. I'm creating a DOMDocument. Then, I'm creating nodes in a foreach loop that appends them into a file with the data from the database array. At the end, I save the document outside the loop. Hope it will work the same way after switching to generators. – encrypted21 Mar 07 '18 at 23:07
  • If you have issues building your XML due to memory too, what you can do is have a single document you use as your "scratch pad" for building the XML, then perform a [$doc->saveXML($node)](http://php.net/manual/en/domdocument.savexml.php) using the DOMNode reference to get just that inner XML string, and use fwrite in append mode to add those entries to your output file. Once finished, close the outer XML element manually (see the sketch after this comment thread). – Scuzzy Mar 07 '18 at 23:14
  • @Scuzzy You're on a roll today. Chapeau! What Scuzzy says is true; leaving unsaved data in a file handle will increase memory usage as well. Using his approach would decrease memory consumption, with increased CPU consumption. – Xorifelse Mar 07 '18 at 23:20
  • @Scuzzy It might be a good thing, but I had some trouble using file I/O with the XML in my case. I switched to DOMDocument because I can simply select a node, append children to it, or create new nodes without seeking for lines etc. Mixing DOMDocument and file I/O makes it a little messier :). Besides, my code already generates XML files, so I just need to solve the array problem to make it work 100%. – encrypted21 Mar 07 '18 at 23:25
  • Great to hear, I love DOMDocument, its a very solid class to play with. also cheers @Xorifelse :) – Scuzzy Mar 07 '18 at 23:27
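A rough sketch of the scratch-document approach Scuzzy describes above (the file name, element names, and `$rows` data source are illustrative assumptions, not from the thread):

// Hypothetical sketch: build each node in a throwaway DOMDocument and
// stream its serialized form to the output file, so the full document
// never has to sit in memory.
$scratch = new DOMDocument('1.0', 'UTF-8');

$out = fopen('output.xml', 'wb');                  // assumed output file
fwrite($out, "<items>\n");                         // open the outer element manually

foreach ($rows as $row) {                          // $rows: assumed data source
    $item = $scratch->createElement('item');       // 'item' is illustrative
    $item->appendChild($scratch->createTextNode($row));
    fwrite($out, $scratch->saveXML($item) . "\n"); // serialize just this one node
}

fwrite($out, "</items>\n");                        // close the outer element manually
fclose($out);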

6 Answers

1

A possible way to do this is to extend ArrayObject. You can build in a limit on how many values may be assigned, which means you need to build a class instead of writing $new_array$i = array();
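A rough sketch of that idea (the class name, exception type, and capacity check are my own illustration, not from the answer):

// Illustrative ArrayObject subclass that refuses appends past a fixed capacity.
class CappedArray extends ArrayObject
{
    private $limit;

    public function __construct($limit)
    {
        parent::__construct();
        $this->limit = $limit;
    }

    public function offsetSet($index, $value): void
    {
        // $chunk[] = $val routes through offsetSet(null, $val)
        if ($index === null && $this->count() >= $this->limit) {
            throw new OverflowException("Capacity of {$this->limit} reached");
        }
        parent::offsetSet($index, $value);
    }
}

$chunk = new CappedArray(50000);
$chunk[] = 'some value';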

However, it might be better to look into generators, but Scuzzy beat me to that punchline.

The concept of generators is that with each yield, the previous value becomes inaccessible unless you loop over the data again. It is, in a way, overwritten, unlike with arrays, where you can always go back to previous indexes using $data[4].

This means you need to process the data directly. Storing the yielded data into a new array will negate its effects.

Fetching huge amounts of data is no issue with generators, but you should understand the concept before using them.
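For illustration, a minimal generator sketch (the PDO connection, query, and column name are assumptions, not from the question):

// Hypothetical: stream rows from the database one at a time instead of
// loading them all into an array.
function fetchValues(PDO $pdo)
{
    $stmt = $pdo->query('SELECT value FROM my_table'); // assumed query
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        yield $row['value']; // each value is handed out one by one
    }
}

foreach (fetchValues($pdo) as $value) {
    // process $value immediately; collecting the values in an array
    // would negate the memory savings, as noted above
}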

Xorifelse
  • Hmm, maybe you're right. As @Scuzzy said, it actually could solve the problem without the use of regular arrays. The problem is that the data causes the single array to exceed the memory. I can override this with ini_set, but it's not a solution, since it will just drastically abuse the server. I wanted to process each array, destroy it, and then the loop would give the remaining arrays. But generators sound like the real solution. – encrypted21 Mar 07 '18 at 22:19
  • Let me know how it works out. I usually script/program as efficiently as possible and never had these issues to resolve, so I'm curious as well. But I suspect it will obviously resolve the situation. – Xorifelse Mar 07 '18 at 23:41
  • Sure, I'll let you know, probably tomorrow when I'll be coding again :) – encrypted21 Mar 07 '18 at 23:43
  • I was able to properly test everything only a few days ago, so I'm reporting now. Generators have worked :) – encrypted21 Mar 15 '18 at 10:26
1

Based on your comments, it sounds like you don't need separate array variables. You can reuse the same one. When it gets to the max size, do your processing and reinitialize it:

$max_array_size = 50000;

$n = 1;
$new_array = [];

foreach ($data as $val) {
    $new_array[] = $val;

    if ($max_array_size == $n++) {
        // process $new_array however you need to, then empty it
        $new_array = [];
        $n = 1;
    }
}
if ($new_array) {
    // process the remainder if the last bit is less than max size
}
Don't Panic
  • My idea was to generate an array with the first chunk of values. Then, process it in that loop and get rid of that variable to free the memory. And then, let the loop repeat to process the remaining arrays. Since generators have been suggested, I will try them out, as they do solve the memory problem. – encrypted21 Mar 07 '18 at 22:32
  • Oh, well if that's the case you wouldn't really need separate array variables. You could just reuse the same one. – Don't Panic Mar 07 '18 at 22:35
  • Thanks, I will try this method and decide which works best for my code :) – encrypted21 Mar 07 '18 at 23:32
0

You could create an array and use extract() to get variables from this array:

$required_number_of_arrays = ceil(count($data)/50000);
$new_arrays = array();
for ($i = 1; $i <= $required_number_of_arrays; $i++) {
    $new_arrays["new_array$i"] = $data;
}
extract($new_arrays);

print_r($new_array1);
print_r($new_array2);
//...
Syscall
  • I'm guessing that this would leave a huge memory imprint on PHP's variable table. Not just that, all the values still reside in memory. Not resolving the problem, but making it worse. But again, I'm guessing, looking at it logically. – Xorifelse Mar 07 '18 at 23:06
0

I think in your case you have to create an array that holds all your generated arrays inside.

So first, declare a variable before the loop:

$global_array = [];

Inside the loop you can generate the name and fill that array:

$global_array["new_array$i"] = $val;

After the loop you can work with that array. But I think in the end that won't fix your memory limit problem. If you fill 5 arrays with 200k entries between them, it is the same as filling one array with 200k entries; the amount of data is the same. So either way you may run over the memory limit. If you can't change the limit, it could be a problem.

ini_set('memory_limit', '-1');

So you can only prevent that problem by processing your values directly, without storing anything in an array. For example, run a DB query and process the values directly, saving only the result.

You can try something like this:

foreach ($data as $key => $val) {
    $new_array[] = $val;   // store the value in the new array
    unset($data[$key]);    // remove it from the original array
}

Then your value is stored in a new array and you delete the value from the original data array. After 50k values you have to create a new array.

An easier way: use array_chunk to split your array into parts.

https://secure.php.net/manual/en/function.array-chunk.php
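For example, a quick sketch (note that array_chunk() builds all the chunks up front, so it does not reduce peak memory usage):

// Split $data into arrays of at most 50,000 elements each.
$chunks = array_chunk($data, 50000);

foreach ($chunks as $chunk) {
    // process one 50k-element array at a time
}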

René Höhle
  • Thanks for your answer; however, this is exactly the thing: it will not solve the problem. What I wanted to do is to create smaller arrays, process one, destroy it, and let the loop generate the other ones and so on. It's a bad idea to override the memory limit since it will only overload the server's performance. – encrypted21 Mar 07 '18 at 22:17
  • Then run through your $data array, save the value in the new array, and pop the value out of the main array. I updated my answer. – René Höhle Mar 07 '18 at 22:20
  • Won't it just move all the memory exceeding data to the new array? I could move 50000 values to a new array, but it's too complicated to implement. I think generators are a great idea. They process each value and then destroy the old one to keep only one value in memory. – encrypted21 Mar 07 '18 at 22:23
0

There's no need for multiple variables. If you want to process your data in chunks so that you don't fill up memory, reuse the same variable. The previous contents of the variable will be garbage collected when you reassign it.

$chunk_size = 50000;
$number_of_chunks = ceil($data_size / $chunk_size);

for ($i = 0; $i < $data_size; $i += $chunk_size) {
    $new_array = array();
    for ($j = $i; $j < min($i + $chunk_size, $data_size); $j++) {
        $new_array[] = get_data_item($j);
    }
    // process $new_array here before the next chunk replaces it
}

$new_array[$i] serves the same purpose as your proposed $new_array$i.

Barmar
  • Thank you, but it is the same as appending all the data into one single array. The memory usage won't shrink by implementing variable variables. – encrypted21 Mar 07 '18 at 22:29
  • This is not one single array for all the data. The top-level array is distinct from the arrays that it refers to in its elements. – Barmar Mar 07 '18 at 22:31
  • What is `$data`? You write `ceil($data/50000)`, so it must be a number. But then you write `foreach ($data as $val)` so it must be an array. Which is it? – Barmar Mar 07 '18 at 22:32
  • And that `foreach` loop is no different from writing `$new_array$i = $data;`, it just makes a copy of the array. – Barmar Mar 07 '18 at 22:33
  • I wrote `ceil($data/50000)` wrong. There should be a count variable for the data in the database. $data is the fetched result from the database query, as an associative array. What you are suggesting, though, is variable variables, which only changes how the values are stored. The point of my problem is to create different arrays, not one holding the other ones. I needed to process each array, destroy it, and then generate and process the next one to free up memory in the process. My idea was bested by generators. – encrypted21 Mar 07 '18 at 22:41
  • If you're going to destroy the array after processing it, why do they need to be in different variables? Use the same variable and just reinitialize it, which will free up the memory from the previous array. – Barmar Mar 07 '18 at 22:43
  • Thanks! I'll go through the solutions and check which one works (best) in my case :) – encrypted21 Mar 07 '18 at 23:19
0

You could do something like this:

$required_number_of_arrays = ceil(count($data)/50000);
for ($i = 1; $i <= $required_number_of_arrays; $i++) {
    $array_name = "new_array_$i";
    $$array_name = [];

    foreach ($data as $val) {
        ${$array_name}[] = $val;
    }
}
José A. Zapata
  • I don't quite understand how it works or what exactly it does :D – encrypted21 Mar 07 '18 at 23:12
  • It's a "variable variable". Basically, $$array_name takes the value of $array_name and convert is to a variable. So in the first pass, $$array_name will be the equivalent to $new_array_1, in the second pass it'll be equivalent to $new_array_2, and so on. Check http://php.net/manual/en/language.variables.variable.php – José A. Zapata Mar 07 '18 at 23:15
  • I will stick with generators for now and will also try emptying one single array in the process as suggested. Your suggestion looks a little more confusing to me as I'm not very familiar with using double $ etc. But if anything, I will try it too :) Thank you! – encrypted21 Mar 07 '18 at 23:30
  • [Variable variables](http://php.net/manual/en/language.variables.variable.php) are, imho, something to avoid. Quite frankly, you never know which variable is defined when reading the source code, nor can any IDE help you figure that out. It also allows for deformed syntax: `$0` is not allowed in PHP unless defined using this method. Nor will it help with the memory issue, since every value up to `ceil(count($data)/50000)` is still stored in PHP's memory, excluding the rest of the 150k values. – Xorifelse Mar 08 '18 at 00:14