0

Consider this collection below:

$collection = [
    [1 => 10.0, 2 => 20.0, 3 => 50.0, 4 => 80.0, 5 => 100.0],
    [3 => 20.0, 5 => 20.0, 6 => 100.0, 7 => 10.0],
    [1 => 30.0, 3 => 30.0, 5 => 10.0, 8 => 10.0]
];

Consider this theoretical output, built from the intersection (by key) of the arrays contained in $collection, where each value is the average of the values found under that key:

$output = Array ( 3 => 33.3333, 5 => 43.3333 );

Can this problem be resolved with a native PHP function like array_intersect_* in an elegant way?

If not, can you suggest an elegant solution that doesn't necessarily need an ugly outer foreach?

Keep in mind that the number of arrays to be intersected is not fixed: there may be 2 input arrays or 1000. Keys will always be integers, and values will always be floats or integers.

In other words:

$collection = [
    $arr1 = [ ... ],
    $arr2 = [ ... ],
    $arr3 = [ ... ],
    ...
    $arrn = [ ... ],
];
$output = [ array intersected by key across $arr1 to $arrn, where each value is the average of the values found under that key ];
mickmackusa
Maurizio
  • Why would this need _"an unknown number of nested foreach's"_, each array is very much one-dimensional ...? – CBroe Nov 08 '17 at 17:49
  • No, I don't think there's any native PHP function that does this. – Don't Panic Nov 08 '17 at 17:50
  • I'd simply loop over those arrays, and create a new one with the keys as keys, and the value as an array (`['one' => [10, 30], 'two' => [20], ...]`), and then loop over that one again, to boil these sub-arrays down to the average ... and probably be done with this, while you're still looking for a more sophisticated/fancy way using built-in array functions ...? Ok, I'll accept replacing the second foreach loop with `array_map`, that just makes sense at this point. But I would not try and find a way to cram this all into "array functions" just for the sake of it. – CBroe Nov 08 '17 at 17:54
  • Wait... What? Unknown number of arrays? How does that happen? – Andreas Nov 08 '17 at 17:55
  • Ok, `array_merge_recursive` works fine here to replace my first foreach loop as well, as Andreas' answer shows. – CBroe Nov 08 '17 at 17:55
  • Not really. *Unknown number of arrays.* However that can happen... Arrays don't generally just "happen". – Andreas Nov 08 '17 at 17:56
  • @CBroe yes, the input arrays are always one-dimensional – Maurizio Nov 08 '17 at 18:02
  • @Andreas I didn't put it correctly: we know how many arrays will be compared, but we just can't know for sure whether there will be 4 or 10 or 100 before we write the intersection function – Maurizio Nov 08 '17 at 18:05
  • Where are these arrays coming from then? Can you at least put _them_ into an array to begin with, so that you can loop over them easily? (Because with an unknown number of input arrays, you'll likely rather come back to the foreach loop, feeding that into array_merge_recursive might be more difficult.) Otherwise, you would have to loop using variable variables to work with your "numbered" array names, and that's just plain ugly. – CBroe Nov 08 '17 at 18:12
  • @CBroe yes, the input arrays are contained in another array. That's why we don't know "a priori" how many input arrays we'll compare. P.S: I've changed the question a little to better explain myself – Maurizio Nov 08 '17 at 18:16
  • As I said, that would be an argument pro foreach then, because with array_merge_recursive i wouldn't know how to feed in an unknown number of arrays easily. – CBroe Nov 08 '17 at 18:20
  • @Maurizio If the input arrays are contained in another array, I would strongly recommend adjusting your example code to reflect that. – Don't Panic Nov 08 '17 at 18:24
  • Your edit has an array with variables inside it. As far as I know that can't be done. – Andreas Nov 08 '17 at 18:27
  • @Andreas yes, sorry for that, I originally framed the question the wrong way. In practical terms the input arrays are contained in a collection, which is why the number of input arrays is not known. – Maurizio Nov 08 '17 at 18:30
  • @Don'tPanic fixed the question as you suggested! – Maurizio Nov 08 '17 at 18:31
  • Your example output still doesn't match the input, key `1` occurs twice, therefore its average 20 should be in the result as well. (For someone still asking for it to be "beautified a little bit" under an existing solution, you're quite sloppy with how you're asking ... ;-) – CBroe Nov 08 '17 at 18:46
  • @CBroe I think "one" should not be in the output since it is not in all three arrays. Intersect only returns the keys that are in all arrays passed to the function. – Andreas Nov 08 '17 at 18:53
  • @Andreas ah, I see, couldn't really tell that from the example. Added a second edit to my answer, because in that case the filter criterion must of course be whether the value-array for a key contains as many entries as there were input arrays to begin with. – CBroe Nov 08 '17 at 19:04
  • All other aside. This has been a fun question to try and solve. I like challenging questions that I have to think for a while to get right. – Andreas Nov 08 '17 at 19:19

5 Answers

2

Count the input array once.

$n = count($collection);

Compute the intersection of all the sub-arrays by key.

$intersection = array_intersect_key(...$collection);
// PHP < 5.6: $intersection = call_user_func_array('array_intersect_key', $collection);

Build your result by averaging the column from the input array for each key from the intersection.

$output = [];
foreach ($intersection as $key => $value) {
    $output[$key] = array_sum(array_column($collection, $key)) / $n;
}

If you really want to completely avoid foreach you can use array_map instead.

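$keys = array_keys($intersection);
// array_map() alone would re-index the result from 0, so restore the original keys
$output = array_combine($keys, array_map(function($key) use ($collection, $n) {
    return array_sum(array_column($collection, $key)) / $n;
}, $keys));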

But in my opinion, this just adds unnecessary complexity.


Note: The values in $intersection will be single values from the first sub-array, but they don't really matter; they're disregarded when generating the output. If it bothers you to have a useless $value variable in the foreach, then you can do foreach (array_keys($intersection) as $key) instead, but I opted for avoiding an unnecessary function call.
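For reference, the steps above can be wrapped into a single function. This is only a minimal sketch under stated assumptions: the function name `averageCommonKeys` is hypothetical (it is not part of the answer), PHP 5.6+ is assumed for the `...` spread, and the guards for an empty or single-element collection are an added precaution rather than something the question asked for.

```php
<?php

// Hypothetical helper: averages the values of every key that is present
// in ALL sub-arrays of $collection.
function averageCommonKeys(array $collection)
{
    // Re-index so the spread below passes positional arguments only.
    $rows = array_values($collection);
    $n = count($rows);

    if ($n === 0) {
        return [];        // nothing to intersect
    }
    if ($n === 1) {
        return $rows[0];  // the average of a single value is the value itself
    }

    // Keys shared by every sub-array (values come from the first one and are ignored).
    $intersection = array_intersect_key(...$rows);

    $output = [];
    foreach ($intersection as $key => $unused) {
        // array_column() collects the value stored under $key in each sub-array.
        $output[$key] = array_sum(array_column($rows, $key)) / $n;
    }

    return $output;
}

$collection = [
    [1 => 10.0, 2 => 20.0, 3 => 50.0, 4 => 80.0, 5 => 100.0],
    [3 => 20.0, 5 => 20.0, 6 => 100.0, 7 => 10.0],
    [1 => 30.0, 3 => 30.0, 5 => 10.0, 8 => 10.0],
];

print_r(averageCommonKeys($collection));
```

With the sample data from the question, only keys 3 and 5 survive the intersection, and their averages are 100/3 and 130/3 respectively.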

Don't Panic
1

You can merge the arrays to one and use array_sum and count() to get the average.

$arr1 = Array ( 'one' => 10, 'two' => 20, 'three' => 50, 'four' => 80, 'five' => 100 );
$arr2 = Array ( 'three' => 20, 'five' => 20, 'six' => 100, 'seven' => 10 );
$arr3 = Array ( 'one' => 30, 'three' => 30, 'five' => 10, 'eight' => 10 );
$array = array_merge_recursive($arr1,$arr2,$arr3);

$key = "two";
if (is_array($array[$key])) {
    $avg = array_sum($array[$key]) / count($array[$key]);
} else {
    $avg = $array[$key];
}

echo $avg;

https://3v4l.org/pa3PH


Edit to follow $collection array.

Try this then. Use array_column to grab the values for the chosen key, and use array_sum and count to get the average.

$collection = array(
    Array ( 'one' => 10, 'two' => 20, 'three' => 50, 'four' => 80, 'five' => 100 ),
    Array ( 'three' => 20, 'five' => 20, 'six' => 100, 'seven' => 10 ),
    Array ( 'one' => 30, 'three' => 30, 'five' => 10, 'eight' => 10 ));

$key= "three";
$array = array_column($collection, $key);

if (count($array) != 1) {
    $avg = array_sum($array) / count($array);
} else {
    $avg = $array[0];
}

echo $avg;

https://3v4l.org/QPsiS


Final edit.

Here I loop through the first subarray and use array_column to find all the values stored under the same key.
If the number of values found equals the number of arrays in the collection, the key exists in all subarrays and should be "saved".

$collection = array(
    Array ( 'one' => 10, 'two' => 20, 'three' => 50, 'four' => 80, 'five' => 100 ),
    Array ( 'three' => 20, 'five' => 20, 'six' => 100, 'seven' => 10 ),
    Array ( 'one' => 30, 'three' => 30, 'five' => 10, 'eight' => 10 ));

$avg = array(); // initialise so var_dump() works even when no key is shared
foreach ($collection[0] as $key => $val) {
    $array = array_column($collection, $key);
    if (count($array) == count($collection)) {
        $avg[$key] = array_sum($array) / count($array);
    }
}
var_dump($avg);

https://3v4l.org/LfktH

Andreas
  • This would complain about array_sum being fed an integer though, if you used it for the `two` key. I guess in that regard I'll still prefer my foreach loop over array_merge_recursive here, because that would have made `two` an array containing the single value 20 to begin with. – CBroe Nov 08 '17 at 17:59
  • But that could also easily be solved by checking whether it's an array first, and then either return sum/count, or the value itself. Then this could also be applied via array_map, and we'd get out an array with the average value for each key that occurred at least once. – CBroe Nov 08 '17 at 18:02
  • @Andreas $arr1, $arr2, $arr3 are not verbosely defined, but are contained into another Array of unknown dimension. So input arrays will be deduced from a $collection = [ $arr1, $arr2, $arr3, .... ]; – Maurizio Nov 08 '17 at 18:18
  • @Maurizio see my last edit. I make one loop to get the averages of matching subarray keys. – Andreas Nov 08 '17 at 19:10
1

I guess it could be done like this:

<?php

$intersecting_arrays = Array (
    0 => Array ( 'one' => 10, 'two' => 20, 'three' => 50, 'four' => 80, 'five' => 100 ),
    1 => Array ( 'three' => 20, 'five' => 20, 'six' => 100, 'seven' => 10 ),
    2 => Array ( 'one' => 30, 'three' => 30, 'five' => 10, 'eight' => 10 )
    );

$temp = $intersecting_arrays[0];
for($i = 1; $i < count($intersecting_arrays); $i++) {
    $temp = array_intersect_key($temp, $intersecting_arrays[$i]);
}

$result = Array();
foreach(array_keys($temp) as $key) {
    $value = 0;
    foreach($intersecting_arrays as $val1) {
        $value += $val1[$key];
    }
    // index the result by the original key, not by its position
    $result[$key] = $value / count($intersecting_arrays);
}

print_r($temp);
print_r($result);

https://3v4l.org/j8o75

This way it doesn't depend on how many arrays you have. First you get the intersection of the keys across all arrays, and then you compute the average for each of the collected keys.

Roman
  • Works as expected! Thank you! Do you think this can be beautified a little bit? I mean, this function first intersects the keys, then it calculates the weight in a second run. Do you think this can be written in one pass only? – Maurizio Nov 08 '17 at 18:39
  • @Maurizio I think you can cover it in a single function and then call it for an input array. What do you need "one pass only" regime for? (: – Roman Nov 08 '17 at 18:46
  • @Maurizio, no matter which way you turn this and look at it, in some form or other this will need looping over the input data twice, because to calculate the average you first of all need to know _how many_ values you are calculating the average _of_, so doing this "all in one go" is rather impossible I think. – CBroe Nov 08 '17 at 18:49
1

Ok, with an unknown number of input arrays, I would definitely go with two nested foreach loops to combine them first - getting an unknown number into array_merge_recursive or similar is going to be difficult.

$input = [
  0 => [ 'one' => 10, 'two' => 20, 'three' => 50, 'four' => 80, 'five' => 100],
  1 => [ 'three' => 20, 'five' => 20, 'six' => 100, 'seven' => 10],
  2 => [ 'one' => 30, 'three' => 30, 'five' => 10, 'eight' => 10]
];

$combined = [];
foreach($input as $array) {
  foreach($array as $key => $value) {
    $combined[$key][] = $value;
  }
}

$averages = array_map(function($item) {
  return array_sum($item)/count($item);
}, $combined);

var_dump($averages);

https://3v4l.org/hmtj5

Note that this solution doesn't need to check for array vs single integer in the array_map callback, because unlike array_merge_recursive, $combined[$key][] inside the loops sees to it that even the keys with just one value will have that value in an array.
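To illustrate the difference described above, here is a small sketch (the two short arrays `$a` and `$b` are made up for the example). It compares what `array_merge_recursive()` produces with what the nested loops produce for a key that occurs only once. As a side note, `array_merge_recursive()` renumbers integer keys instead of grouping them, so it would not group the integer-keyed arrays shown in the question at all.

```php
<?php

// Made-up sample data: 'one' occurs twice, 'two' occurs once.
$a = ['one' => 10, 'two' => 20];
$b = ['one' => 30];

// array_merge_recursive() wraps values in an array only when a string key repeats.
$merged = array_merge_recursive($a, $b);
// $merged is ['one' => [10, 30], 'two' => 20]

// The nested loops always append, so every key maps to an array.
$combined = [];
foreach ([$a, $b] as $array) {
    foreach ($array as $key => $value) {
        $combined[$key][] = $value;
    }
}
// $combined is ['one' => [10, 30], 'two' => [20]]

var_dump(is_array($merged['two']));   // bool(false): array_sum() would need a guard
var_dump(is_array($combined['two'])); // bool(true): array_sum()/count() work directly
```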


EDIT:

but keep in mind that not all the keys are going to be taken into account

Ah, ok, so you want averages only for those keys that occurred more than once. That can easily be fixed by filtering the combined array before using array_map on it:

$combined = array_filter($combined, function($v, $k) {
  return count($v) != 1;
}, ARRAY_FILTER_USE_BOTH );

Integrated into above solution: https://3v4l.org/dn5ro


EDIT #2

[Andreas' comment] I think "one" should not be in output since it is not in all three arrays.

Ah, I see ... couldn't tell that was the actually desired result even from the example :-) Then my filtering has to be modified a little bit again, and take the number of input arrays into account:

$combined = array_filter($combined, function($v, $k) use($input) {
  return count($v) == count($input);
}, ARRAY_FILTER_USE_BOTH );

https://3v4l.org/9H086
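Putting the pieces of this answer together, the whole approach might look like the sketch below. It uses the integer-keyed sample data from the question instead of the string keys above, and the variable names `$total` and `$common` are introduced here for illustration only. The filter callback takes just the value, so `ARRAY_FILTER_USE_BOTH` is not needed in this variant.

```php
<?php

$input = [
    [1 => 10.0, 2 => 20.0, 3 => 50.0, 4 => 80.0, 5 => 100.0],
    [3 => 20.0, 5 => 20.0, 6 => 100.0, 7 => 10.0],
    [1 => 30.0, 3 => 30.0, 5 => 10.0, 8 => 10.0],
];

// Pass 1: group every value under its key.
$combined = [];
foreach ($input as $array) {
    foreach ($array as $key => $value) {
        $combined[$key][] = $value;
    }
}

// Keep only the keys that appeared in every input array.
$total = count($input);
$common = array_filter($combined, function ($values) use ($total) {
    return count($values) === $total;
});

// Pass 2: reduce each group to its average (array_map keeps the keys
// because it is given a single array).
$averages = array_map(function ($values) {
    return array_sum($values) / count($values);
}, $common);

var_dump($averages);
```

For this data only keys 3 and 5 remain, with averages 100/3 and 130/3.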

CBroe
  • thank you for that, but keep in mind that not all the keys are going to be taken into account because the only keys that are present in *all* the arrays are "three" and "five". We also need to do some sort of "array_intersect" based on the keys. – Maurizio Nov 08 '17 at 18:35
  • @Maurizio ah ok, that wasn't really clear from the beginning. See my edit, please - easily fixable, if we filter the combined array first, throwing away all keys that have only one value in their array. (I am gonna assume that for key `one` you want the average, too, since that occurs more than once as well.) – CBroe Nov 08 '17 at 18:43
  • Perfect, works as expected! See what I've asked to Roman below, it applies also to your solution: do you think the same can be achieved in a more elegant way? In other words, both solutions use a very similar modus operandi: first intersect keys, then calculate averages. What do you think? There's a big lack in PHP native functions on intersects, because they enable to do callbacks on the comparisons but not on the associations – Maurizio Nov 08 '17 at 18:49
  • @Maurizio as I commented under Roman's answer as well, I don't think you can avoid effectively looping over the data twice, because to calculate the average you need to know how many operands there are, and you _don't_ know that while you are still "collecting" the values. – CBroe Nov 08 '17 at 18:50
  • @Maurizio second edit, regarding what Andreas mentioned in comments, that you seem to not want `one`/`1` in the result, because that does not occur in _all_ input arrays. Filtering function needs a little update then in that regard. – CBroe Nov 08 '17 at 19:05
1

Can this problem be resolved with a native PHP function like array_intersect_* in an elegant way?

Well, elegance is in the eye of the developer. If functional-style programming with no new globally-scoped variables equals elegance, then I have something tasty for you. Can a native array_intersect_*() call be leveraged in this task? You bet!

There's a big lack in PHP native functions on intersects - @Maurizio

I disagree. PHP has a broad suite of powerful, optimized, native array_intersect*() and array_diff*() functions. I believe that too few developers are well-acquainted with them all. I've even built a comprehensive demonstration of the different array_diff*() functions (which can be easily inverted to array_intersect*() for educational purposes).


Now, onto your task. First, the code, then the explanation.

Code: (Demo)

var_export(
    array_reduce(
        array_keys(
            array_intersect_ukey(
                ...array_merge($collection, [fn($a, $b) => $a <=> $b])
            )
        ),
        fn($result, $k) => $result + [$k => array_sum(array_column($collection, $k)) / count($collection)],
        []
    )
);
  1. The first subtask is to isolate the keys which are present in every row. array_intersect_ukey() is very likely the best qualified tool. The easy part is the custom function -- just write the two parameters with the spaceship operator in between. The hard part is setting up the variable number of leading input parameters followed by the closure. For this, temporarily merge the closure as an array element onto the collection variable, then spread the parameters into the native function.
  2. The payload produced by #1 is an array consisting of the associative elements from the first row where the keys were represented in all rows ([3 => 50.0, 5 => 100.0]). To prepare the data for the next step, the keys must be converted to values -- array_keys() is ideal because the float values are of no further use.
  3. Although the number of elements going into the final "averaging step" equals the number coming out, the final result must be a flat associative array -- so array_map() will not suffice. Instead, array_reduce() is better suited. With the collection variable accessible thanks to PHP7.4's arrow function syntax, array_column() can isolate the full column of data, then the averaging result is pushed as an associative element into the result array.
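The three steps above can also be unrolled into separate statements, which may make the intermediate values easier to inspect. This is just an illustrative sketch of the same technique, not a replacement for the one-expression version; the variable names `$args`, `$common`, `$keys` and `$output` are introduced here, and PHP 7.4+ is assumed for the arrow functions.

```php
<?php

$collection = [
    [1 => 10.0, 2 => 20.0, 3 => 50.0, 4 => 80.0, 5 => 100.0],
    [3 => 20.0, 5 => 20.0, 6 => 100.0, 7 => 10.0],
    [1 => 30.0, 3 => 30.0, 5 => 10.0, 8 => 10.0],
];

// Step 1: append the key-comparison closure to the rows, then spread
// everything into array_intersect_ukey(), which expects the callback last.
$args = array_merge($collection, [fn($a, $b) => $a <=> $b]);
$common = array_intersect_ukey(...$args);
// $common is [3 => 50.0, 5 => 100.0] (values taken from the first row)

// Step 2: only the keys matter from here on.
$keys = array_keys($common);
// $keys is [3, 5]

// Step 3: build the associative result, one averaged column per common key.
$output = array_reduce(
    $keys,
    fn($result, $k) => $result + [$k => array_sum(array_column($collection, $k)) / count($collection)],
    []
);

var_export($output);
```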
mickmackusa
  • this is definitely the answer I was looking for, back in the day, I would have greatly benefited from it. Thank you for your full detailed explanations and for the time spent on this task. P.S. after 5+ years of developing, I also disagree on the _There's a big lack in PHP native functions on intersects_ :)) – Maurizio Jan 05 '23 at 18:17