5

Test script

$i = 0;
array_uintersect(['foo', 'bar'], ['baz', 'qux'], function($a, $b) use (&$i) {
    print_r([$a, $b, $i++]);
});

Actual Result

Array
(
    [0] => bar
    [1] => foo
    [2] => 0
)
Array
(
    [0] => qux
    [1] => baz
    [2] => 1
)
Array
(
    [0] => bar
    [1] => qux
    [2] => 2
)
Array
(
    [0] => bar
    [1] => foo
    [2] => 3
)

Expected Result

Array
(
    [0] => foo
    [1] => baz
    [2] => 0
)
Array
(
    [0] => bar
    [1] => qux
    [2] => 1
)

In other words, what I am expecting to be passed to the callback is the current element of the left array, and the current element of the right array.

Furthermore, I would expect the same logic to apply if I were to pass an additional array to array_uintersect - one more argument being passed to the callback ($c, for example).

Can someone explain this behaviour?

mickmackusa
The Onin
  • I don't understand your use of `$i` here. From the docs: "The comparison function must return an integer less than, equal to, or greater than zero if the first argument is considered to be respectively less than, equal to, or greater than the second." – mister martin Oct 27 '16 at 15:52
  • @mistermartin I am using it for debugging purposes; it's a way to keep track of how many times the callback is called. – The Onin Oct 27 '16 at 15:53
  • Why don't you just loop through the first array and use the same index to get the value from the second array? – Sander Visser Oct 27 '16 at 15:54
  • @SanderVisser Sure, but I want to use this function so any developer that picks up on my work in the future immediately knows what's going on. Intersection is self-explanatory. – The Onin Oct 27 '16 at 15:55
  • Yes, but `array_uintersect` intersects the values, not the keys. For keys, see http://php.net/manual/en/function.array-intersect-key.php – Sander Visser Oct 27 '16 at 16:07
  • PHP inconsistency in function naming but found it ;) php.net/manual/en/function.array-intersect-ukey.php But that isn't what you want. See my answer with array_map – Sander Visser Oct 27 '16 at 16:09
  • So you didn't actually show or ask what you are trying to do with actual arrays; you only asked why `array_uintersect()` compares internally in such a way, not why it does a certain thing with your input arrays. Do you have actual input and output arrays and need help? – AbraCadaver Oct 28 '16 at 03:45
  • @AbraCadaver I'm only interested in the behaviour of the function – The Onin Oct 28 '16 at 08:25

5 Answers

6

What's not mentioned in the array_uintersect docs is that, internally, PHP sorts all the arrays first, left to right. Only after the arrays are sorted does PHP walk them (again, left to right) to find the intersection.

The third argument (the comparison function) is passed to the internal sort algorithm, not the intersecting algorithm. Thus, the debugging output seen is the sorting algorithm figuring out the ordering.
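
As a rough illustration of the two phases described above, here is a userland sketch that sorts both inputs with the callback and then walks the sorted lists. `sketch_uintersect` is a hypothetical helper written for this answer, not part of PHP, and it only approximates what the C implementation does (it handles exactly two arrays and makes no attempt to match the engine's call order).

```php
<?php
// Hypothetical helper (not part of PHP): approximates "sort every input
// with the callback, then walk the sorted lists to find common values".
function sketch_uintersect(array $left, array $right, callable $cmp): array
{
    // Phase 1: sort copies of both inputs with the user callback.
    // uasort() keeps the keys of the first array so they can be reported.
    $sortedLeft = $left;
    uasort($sortedLeft, $cmp);
    $sortedRight = array_values($right);
    usort($sortedRight, $cmp);

    // Phase 2: walk both sorted lists with a single forward-moving pointer.
    $matchedKeys = [];
    $j = 0;
    $n = count($sortedRight);
    foreach ($sortedLeft as $key => $value) {
        // Skip right-hand elements that sort before the current value.
        while ($j < $n && $cmp($sortedRight[$j], $value) < 0) {
            $j++;
        }
        // "Equal" according to the callback: the value is in the intersection.
        if ($j < $n && $cmp($sortedRight[$j], $value) == 0) {
            $matchedKeys[$key] = true;
        }
    }

    // Report matches in the original order of the first array.
    return array_intersect_key($left, $matchedKeys);
}

$result = sketch_uintersect(['a', 'b', 'c', 'd'], ['d', 'b', 'x'], 'strcmp');
print_r($result);
```

In this sketch the callback is invoked during both the sort and the walk, and it may be handed two elements of the same array, which is why the debugging output in the question does not pair up "left element, right element".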

The zend_sort implementation generally uses a bisecting quick sort implementation. For arrays of the size in your example, PHP uses insertion sort. For large arrays, PHP uses a 3 or 5 point pivot so as to improve worst-case complexity.

Since you're not explicitly returning any value from the comparison function, the call evaluates to null, which PHP treats as 0 (meaning "equal"). Because PHP is using insertion sort here, you're seeing O(n*n) behavior as the sort walks all the combinations.
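
A small sketch of that consequence, assuming the null return is coerced to 0 as described: every pair looks "equal", so the whole first array comes back as the intersection. The `$calls` counter is only there for illustration, and the exact number and order of calls may differ between PHP versions.

```php
<?php
$calls = 0;

$result = array_uintersect(['foo', 'bar'], ['baz', 'qux'], function ($a, $b) use (&$calls) {
    $calls++;
    // No return statement: the call evaluates to null, which is read as 0.
});

print_r($result); // every element of the first array is treated as a match
echo "callback invoked {$calls} times\n";
```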

bishop
  • Great answer, it's the one I was waiting for. Other people were trying to be helpful and score some reputation by providing workarounds, but my question was *why* is it behaving that way. I went through the C code you've linked, and to me, it seems that, internally, `array_uintersect` (`array_intersect` in general) is analogous with manually iterating over the array with `foreach` and checking if needle element exists in haystack array with `in_array()`, and the main difference being that the C code sorts it before iterating, thus increasing the lookup speed. – The Onin Oct 29 '16 at 17:30
4

I have no idea why you expect anything from the comparison callback other than a comparison of array values. The sole purpose of the callback is to compare the two items it is given.

The function returns the result of intersection of the two arrays. In the callback you express your idea of how the values are supposed to be compared. For example, the following code assumes that the intersection should be performed by comparing the first characters of the strings:

$a = array_uintersect(['foo', 'bar'], ['baz', 'qux'], function($a, $b) {
  return strcmp($a[0], $b[0]);
});

print_r($a);

Output

Array
(
    [1] => bar
)

The order in which items are passed to the callback is determined by PHP internals, and may easily change in the future.

So the comparison function is not supposed to do anything except compare two values. The official documentation does not even hint at using the callback for any other purpose.
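
As a further sketch of a callback that does nothing but compare, here is a case-insensitive intersection using the built-in `strcasecmp`. The sample arrays are made up for illustration.

```php
<?php
// strcasecmp() returns <0, 0 or >0, which is exactly what the callback must return.
$result = array_uintersect(['Foo', 'bar', 'Baz'], ['FOO', 'baz'], 'strcasecmp');

print_r($result);
```

The result keeps the keys and the spelling of the first array; the second array only decides which entries survive.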

Ruslan Osmanov
2

I believe the first two calls are being used to seed variables in the internal algorithm. But since you don't return anything that the algorithm can use to determine equality/sorting, it only runs the next two.

If you actually return 0, 1 or -1 then you see the full comparison chain that is needed to calculate the intersection:

$i = 0;
array_uintersect(['foo', 'bar'], ['baz', 'qux'], function($a, $b) use (&$i) {
    print_r([$a, $b, $i++]);

    if ($a === $b) return 0;
    if ($a  >  $b) return 1;
    return -1;
});

Yields:

Array
(
    [0] => bar
    [1] => foo
    [2] => 0
)
Array
(
    [0] => qux
    [1] => baz
    [2] => 1
)
Array
(
    [0] => bar
    [1] => baz
    [2] => 2
)
Array
(
    [0] => foo
    [1] => baz
    [2] => 3
)
Array
(
    [0] => foo
    [1] => baz
    [2] => 4
)
Array
(
    [0] => foo
    [1] => qux
    [2] => 5
)
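
On PHP 7 and later, the same three-way result can be sketched more compactly with the spaceship operator. For plain non-numeric strings like these it should behave like the explicit `if` chain above; the `$calls` counter is an addition for illustration, and the number of calls may vary by PHP version.

```php
<?php
$calls = 0;

$result = array_uintersect(['foo', 'bar'], ['baz', 'qux'], function ($a, $b) use (&$calls) {
    $calls++;
    // <=> yields -1, 0 or 1, matching the contract of the comparison callback.
    return $a <=> $b;
});

var_dump($result); // the two arrays share no values, so this is empty
echo "callback invoked {$calls} times\n";
```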
AbraCadaver
0

I think you are looking for this ;)

$result = array_map(function($a, $b) {
    return [$a, $b];
}, ['foo', 'bar'], ['baz', 'qux']);
var_dump($result);

This will output

array(2) {
  [0]=>
  array(2) {
    [0]=>
    string(3) "foo"
    [1]=>
    string(3) "baz"
  }
  [1]=>
  array(2) {
    [0]=>
    string(3) "bar"
    [1]=>
    string(3) "qux"
  }
}

Update: The code below returns the result you want using array_uintersect. It isn't the most efficient way to do this, and I haven't tested it with different data sets, but it should work.

$entities = [
    [
        'id' => 1,
        'timestamp' => 1234
    ],
    [
        'id' => 2,
        'timestamp' => 12345
    ],
    [
        'id' => 3,
        'timestamp' => 123456
    ],
    [
        'id' => 8,
        'timestamp' => 123456
    ],
    [
        'id' => 10,
        'timestamp' => 123456
    ],
    [
        'id' => 11,
        'timestamp' => 123456
    ],
    [
        'id' => 12,
        'timestamp' => 123456
    ]
];

$identities = [1, 11, 2, 8, 10];

$result = array_uintersect($entities, $identities, function($a, $b) {

    // Both are entity arrays: never report a match, just order them by id
    if (is_array($a) && is_array($b)) {
        if ($a['id'] > $b['id']) {
            return 1;
        }
        return -1;
    }

    // Both are plain ids: never report a match, just order them
    if (is_int($a) && is_int($b)) {
        if ($a > $b) {
            return 1;
        }
        return -1;
    }

    // $a is array
    if (is_array($a)) {
        if ($a['id'] == $b) {
            return 0;
        }
        elseif ($a['id'] > $b) {
            return 1;
        }
        return -1;
    }

    // $b is array
    if($b['id'] == $a) {
        return 0;
    }
    if($a > $b['id']) {
        return 1;
    }

    return -1;
});
var_dump($result);

and the result

array(5) {
  [0]=>
  array(2) {
    ["id"]=>
    int(1)
    ["timestamp"]=>
    int(1234)
  }
  [1]=>
  array(2) {
    ["id"]=>
    int(2)
    ["timestamp"]=>
    int(12345)
  }
  [3]=>
  array(2) {
    ["id"]=>
    int(8)
    ["timestamp"]=>
    int(123456)
  }
  [4]=>
  array(2) {
    ["id"]=>
    int(10)
    ["timestamp"]=>
    int(123456)
  }
  [5]=>
  array(2) {
    ["id"]=>
    int(11)
    ["timestamp"]=>
    int(123456)
  }
}
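
A shorter variant of the same idea, as a sketch: reduce both kinds of element to a plain id first, then compare the ids. `$idOf` is a hypothetical helper introduced here, the data is a trimmed-down copy of the arrays above, and the `<=>` operator assumes PHP 7 or later.

```php
<?php
// Trimmed-down sample data in the same shape as the arrays above.
$entities = [
    ['id' => 1, 'timestamp' => 1234],
    ['id' => 2, 'timestamp' => 12345],
    ['id' => 3, 'timestamp' => 123456],
    ['id' => 8, 'timestamp' => 123456],
];
$identities = [1, 8, 2];

// Hypothetical helper: turn either an entity array or a bare id into an id.
$idOf = function ($value) {
    return is_array($value) ? $value['id'] : $value;
};

// The callback may receive any mix of entities and ids, so normalise both sides.
$result = array_uintersect($entities, $identities, function ($a, $b) use ($idOf) {
    return $idOf($a) <=> $idOf($b);
});

print_r(array_keys($result));
```

Because the callback returns 0 whenever the ids match, regardless of which array each argument came from, the ordering stays consistent during both the sort and the walk.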
Sander Visser
  • That is certainly what I was trying to achieve, but I wanted to use `array_uintersect` because of the intersect word in the function name (want to make it as clear as possible what I'm doing). – The Onin Oct 27 '16 at 16:13
  • From the docs: `array_uintersect - Computes the intersection of arrays, compares data by a callback function`. So I think `array_uintersect` is less clear, because according to the documentation you're not comparing "data". `array_intersect_ukey` does what you want but doesn't return the result that you want; it doesn't combine the values. – Sander Visser Oct 27 '16 at 16:15
  • I do want to compare values, not keys. Specifically, I have a multi-dimensional array with two subelements (id and timestamp) on every element, and a single-dimensional array (only id), and my goal is to discard all the entries from the multi-dimensional array whose id doesn't appear in the single-dimensional array. – The Onin Oct 27 '16 at 16:18
  • @NinoŠkopac So do you want to compare values, or just rearrange the arrays in the manner of this answer? Why are you trying to perform an intersection when the intersection of your arrays is empty? You're printing some hidden details of the algorithm's implementation, not the function result – Max Zuber Oct 27 '16 at 16:22
  • OK, I think I understand now: you have an array of objects that have an `id` and some other properties, and you have an array of `ids`, and you want to filter out all objects that aren't listed in the array that contains the `ids` – Sander Visser Oct 27 '16 at 16:24
  • That's it. and I know there are million ways to do this, but I wanted to use intersection cause of the function name. I mean, intersection literally means keeping only elements that share a property. – The Onin Oct 27 '16 at 16:30
  • I did it! not sure if stable :P see updated answer. Note that this isn't the most efficient way to do it. but it's with the intersection method ;) – Sander Visser Oct 27 '16 at 16:44
  • Needs more improvement, because the places where `-1` is returned are incorrect (it should return 1 or -1) – Sander Visser Oct 27 '16 at 16:53
  • And updated again with the correct returns ;) should be pretty stable now – Sander Visser Oct 27 '16 at 16:58
-4
<?php
    $i  = 0;
    $r1 = ['foo', 'bar'];
    $r2 = ['baz', 'qux'];
    $result = array_uintersect($r1, $r2, function($a, $b){
        return ($a[0]> $b[0]);
    });


    var_dump($result);
    // YIELDS::
    // array (size=2)
    //   0 => string 'foo' (length=3)
    //   1 => string 'bar' (length=3)
Poiz