
I've spent hours trying array_diff(), array_unique(), foreach loops and anything else I can find, and I can't get anything to work; the result is always wrong.

Say I have these two arrays:

$arr_a = ['mary', 'mary', 'mary', 'jack', 'jack', 'jack', 'jack', 'fred', 'fred'];
$arr_b = ['mary', 'mary', 'jack', 'jack'];

I need returned:

['mary', 'jack', 'jack', 'fred', 'fred'];

I need the 'leftover' values after the matching values are canceled out, along with those that are completely unique. In $arr_a, we have 3 mary and 4 jack and 2 fred. If you subtract the 2 mary and 2 jack from that, you're left with 1 mary, 2 jack and 2 fred.

My actual use case is comparing thousands of product IDs against thousands of product IDs. Since a single product ID can appear anywhere from 2 to 100+ times, functions like array_diff() and array_unique() don't work.

I have tried the following:

array_diff($arr_a, $arr_b);

Doesn't work. It removes all matching values regardless of how many times they occur.
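A quick way to see this, as a minimal sketch using the sample arrays above: array_diff() only checks whether a value appears anywhere in the second array, so every mary and jack entry disappears and only the two fred entries survive.

```php
<?php
$arr_a = ['mary', 'mary', 'mary', 'jack', 'jack', 'jack', 'jack', 'fred', 'fred'];
$arr_b = ['mary', 'mary', 'jack', 'jack'];

// array_diff() keeps the values of $arr_a that occur nowhere in $arr_b,
// ignoring how many times they occur; original keys (7 and 8) are preserved,
// so array_values() is used here to reindex from 0.
$diff = array_values(array_diff($arr_a, $arr_b)); // ['fred', 'fred']
```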

I have tried:

foreach ($arr_a as $a) {
    $key = array_search($a, $arr_b);
    if ($key) unset($arr_b[$key]);
}
return $arr_b;

When I attempt to use this on very large arrays, the results are never perfect. I've tried a series of checks to weed out false positives and negatives but even those don't always work.


A similar question was asked and answered seven years ago, but the answer there is what I came up with on my own, except that it adds an explicit check so that unset() isn't called when array_search() returns FALSE. Even with that check, it still doesn't work reliably on the very large arrays I'm passing. In short: that answer from seven years ago is wrong. The one I received here appears to be correct, since it works.

22289d
    At least you need to show what you **have attempted** (show your code) – Ken Lee Aug 30 '23 at 15:37
  • I just added code. As for your question, it depends which array I'm focused on. If I want to extract unique occurrences in $arr_a the result would be nothing, that would be ignored. If I'm trying to find unique occurrences in $arr_b then that would be included. That's the 'fred' case in my example. – 22289d Aug 30 '23 at 15:53
  • That's what this is, yes: unset($arr_b[$key]); but the results are never perfect. Perhaps there is a better way to delete the items. – 22289d Aug 30 '23 at 16:01
  • Quasi-related: [Get differences between two flat arrays and merge with duplicated values in one array](https://stackoverflow.com/q/74089108/2943403) It is important that you read the red warning box in the documentation for `array_search()`. Use https://3v4l.org/WHT9r – mickmackusa Aug 31 '23 at 21:36
  • Also topically related: [Array Intersection - only once](https://stackoverflow.com/q/25074375/2943403) – mickmackusa Aug 31 '23 at 21:55
  • @22289d upon further consideration, your question's sample data and explanation does not clarify if 1. `$a` always has enough of each item to cover `$b` (might `$b` have more of an item than `$a`?) or 2. what the result should be if `$b` contains `george` but `$a` doesn't. Could you please edit your question to clarify the needed behaviour? – mickmackusa Sep 01 '23 at 23:58
  • More specifically, what is your desired result from this sample data https://3v4l.org/Et2sj ? Or, are any aspects of this sample data "not possible" in the scope of your application? – mickmackusa Sep 02 '23 at 00:14

2 Answers

If you count the occurrences in each array with array_count_values(), you get a list that maps each name to the number of times it occurs.
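For the sample arrays, the two count arrays come out as shown in this sketch ($counta and $countb are the names used in the code that follows). array_count_values() keeps the order in which each value first appears.

```php
<?php
$arr_a = ['mary', 'mary', 'mary', 'jack', 'jack', 'jack', 'jack', 'fred', 'fred'];
$arr_b = ['mary', 'mary', 'jack', 'jack'];

// Each distinct value becomes a key; the value is how often it occurs.
$counta = array_count_values($arr_a); // ['mary' => 3, 'jack' => 4, 'fred' => 2]
$countb = array_count_values($arr_b); // ['mary' => 2, 'jack' => 2]
```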

Then loop through the first count list, compare each count with the second, and build the result array, using array_pad() to repeat each key however many times are left over:

$counta = array_count_values($arr_a);
$countb = array_count_values($arr_b);

$result = [];
foreach ($counta as $key => $count) {
    $result = array_merge($result, array_pad([], abs(($countb[$key] ?? 0) - $count), $key));
}

print_r($result);

This means you process each list only once and build the result in a single quick loop.

Breaking down that one line:

array_merge($result, array_pad([], abs(($countb[$key] ?? 0) - $count), $key));

The $countb[$key] ?? 0 part looks up the same name in the second count array (?? 0 gives 0 if it isn't found there); the count from the first array is then subtracted from it, and abs() turns the difference into a positive number.
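Worked through for two of the names, assuming the count arrays produced from the sample data, the lookup and subtraction give:

```php
<?php
$countb = ['mary' => 2, 'jack' => 2]; // array_count_values($arr_b) for the sample data

// 'fred' never appears in $arr_b, so ?? falls back to 0 (no undefined-key warning).
$fredInB  = $countb['fred'] ?? 0;              // 0
$fredLeft = abs($fredInB - 2);                 // abs(0 - 2) = 2
// 'jack' occurs 4 times in $arr_a and 2 times in $arr_b.
$jackLeft = abs(($countb['jack'] ?? 0) - 4);   // abs(2 - 4) = 2
```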

The array_pad([], $count, $key) part then creates a new array containing that many items, each one being the name (the key from the array_count_values() result).
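A small sketch of array_pad() on its own, since it may be unfamiliar: starting from an empty array, it returns an array of the requested length filled with the given value.

```php
<?php
// Pad an empty array up to length 2 with 'jack'.
$twoJacks = array_pad([], 2, 'jack'); // ['jack', 'jack']
// A length of 0 adds nothing, so a fully cancelled-out name contributes no items.
$none = array_pad([], 0, 'mary');     // []
```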

Finally, this new list of names is added to the result list being built.

You could break it down into

foreach ($counta as $key => $count) {
    $countToAdd = abs(($countb[$key] ?? 0) - $count);
    $listToAdd = array_pad([], $countToAdd, $key);
    $result = array_merge($result, $listToAdd);
}

which is more readable.
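Putting the pieces together, here is a self-contained sketch. The function name leftoverValues() is hypothetical and not part of the original answer; the loop body is the readable version above.

```php
<?php
// Hypothetical helper wrapping the counting approach described above.
function leftoverValues(array $arr_a, array $arr_b): array
{
    $counta = array_count_values($arr_a);
    $countb = array_count_values($arr_b);

    $result = [];
    foreach ($counta as $key => $count) {
        // Copies of $key left after cancelling against $arr_b.
        $countToAdd = abs(($countb[$key] ?? 0) - $count);
        // Build that many copies of $key.
        $listToAdd = array_pad([], $countToAdd, $key);
        // array_merge() reindexes numeric keys, so $result stays a list.
        $result = array_merge($result, $listToAdd);
    }
    return $result;
}

$arr_a = ['mary', 'mary', 'mary', 'jack', 'jack', 'jack', 'jack', 'fred', 'fred'];
$arr_b = ['mary', 'mary', 'jack', 'jack'];
$leftover = leftoverValues($arr_a, $arr_b); // ['mary', 'jack', 'jack', 'fred', 'fred']
```

Two behaviours worth checking against your data: only keys of $counta are looped, so a name that appears only in $arr_b is never output, and because of abs() a name that occurs more often in $arr_b than in $arr_a contributes the surplus. Also, array_count_values() stores numeric-string values such as '123' as integer keys, so product IDs may come back as integers rather than strings.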

Nigel Ren
  • Thanks, gonna try this now. I suspected some array functions I've never heard of would do a much better job than the ones I know and tried. Hopefully this is exactly that. – 22289d Aug 30 '23 at 17:12
  • wow, this appears to work perfectly. it just cleaned up over 20,000 records instantly. it all snapped together. thank you! could you explain what's happening on this line? i've never used any of this and i'd like to actually understand what it's doing. `array_merge($result, array_pad([], abs(($countb[$key] ?? 0)) - $count, $key));` – 22289d Aug 30 '23 at 17:30
    @22289d I'll add some further explanation to the answer... – Nigel Ren Aug 30 '23 at 17:35
  • @Nigel Please close new questions with older duplicates instead of answering. If you have something unique to add on the topic (and you do in this case), please post your advice on the earlier asked question so that all insights are in one place. This improves the researcher experience and allows answer sorting by score to occur naturally/helpfully. Having answers on multiple pages forces researchers to chase insights and they may be doubtful that they've found all of the relevant pages. Please curate more. – mickmackusa Aug 31 '23 at 21:59

There are many ways to do the job. One of them is to unset (i.e. delete) the matching element from both the source ($arr_a) and the target ($arr_b) whenever array_search() finds a match.

So the code is:

<?php

$arr_a = ['mary', 'mary', 'mary', 'jack', 'jack', 'jack', 'jack', 'fred', 'fred'];
$arr_b = ['mary', 'mary', 'jack', 'jack'];

$index = 0;

foreach ($arr_a as $a) {
    $key = array_search($a, $arr_b);
    if ($key !== false) {
        unset($arr_b[$key]);
        unset($arr_a[$index]);
    }
    $index++;
}
var_dump($arr_a);
?>

Please note that you cannot simply use if ($key) { } for the check: if the match is at the first position (index 0), $key is 0, which is treated as false, so that match is skipped. Compare strictly with $key !== false instead.

You can see the result in this sandbox.

Result is:

array(5) {
  [2]=>
  string(4) "mary"
  [5]=>
  string(4) "jack"
  [6]=>
  string(4) "jack"
  [7]=>
  string(4) "fred"
  [8]=>
  string(4) "fred"
}
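To see why the strict !== false check matters, here is a minimal sketch using the sample $arr_b: array_search() returns 0 for a match at the first position, and 0 is falsy.

```php
<?php
$arr_b = ['mary', 'mary', 'jack', 'jack'];

$key = array_search('mary', $arr_b);      // 0: first match is at index 0
// Loose check: 0 converts to false, so a real match looks like "not found".
$looseFound = (bool) $key;                // false
// Strict check: only an actual false means "not found".
$strictFound = $key !== false;            // true
$missing = array_search('fred', $arr_b);  // false: no match at all
```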

Ken Lee
  • Thank you. It wasn't the whole problem but this was causing problems. The cases where I had just 1 left over all got cleaned up from this: if ($key !="") – 22289d Aug 30 '23 at 17:09
  • When using [`array_search()`](https://www.php.net/manual/en/function.array-search.php) on an indexed array, the returned value will either be an integer or `false`. Therefore, it makes no sense to make a loose check against an empty string. If `$arr_a` is an indexed array and you are iterating with a `foreach()`, then simply use the array's keys in the loop instead of declaring and maintaining `$index`. – mickmackusa Aug 31 '23 at 21:23
  • Better advice is shown in [this earlier answer](https://stackoverflow.com/a/65273096/2943403). – mickmackusa Aug 31 '23 at 22:01
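Following the comment's suggestion to use the array's own keys, a sketch of the same unset-both approach without a separate $index counter might look like this. Passing true as array_search()'s third ($strict) argument is an addition not in the original answer; it stops PHP's loose comparison from treating different numeric strings such as '10' and '1e1' as equal, which could matter for product IDs.

```php
<?php
$arr_a = ['mary', 'mary', 'mary', 'jack', 'jack', 'jack', 'jack', 'fred', 'fred'];
$arr_b = ['mary', 'mary', 'jack', 'jack'];

// foreach by value iterates the original elements, so unsetting from $arr_a
// inside the loop does not disturb the iteration.
foreach ($arr_a as $index => $a) {
    $key = array_search($a, $arr_b, true); // strict comparison (assumed addition)
    if ($key !== false) {
        // Cancel out one matching pair from each array.
        unset($arr_b[$key], $arr_a[$index]);
    }
}
var_dump($arr_a); // keys 2, 5, 6, 7, 8 remain, as in the answer's output
```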