1

Perhaps this has been asked several times but I can't find the right answer so here goes.

I have two arrays: one with ~135732 and the other one with ~135730 elements. I need to find which items are on the first but not on the second and viceverse and don't know is there is an easy way to achieve that.

This is what I would do it:

$countArr1 = count($arr1);
$countArr2 = count($arr2);

for($i=0; $i < $countArr1; $i++) {
    // Check whether current element on $arr1 is on $arr2 or not
    if (!in_array($arr1[$i], $arr2)) {
        // if it doesn't then add it to $newArr 
        $newArr[] = $arr1[$i];
    }    
}

Then I would do the same but inverse for $arr2. In huge arrays could take a while and also could kill memory or server resources, even if it's executed from CLI so which is the best and the most efficient, regarding use of resources, way to achieve this?

EDIT

Let's clear this a bit. I get $arr1 from DB and $arr2 comes from other place. So the big idea is to find which items needs to be updated and which ones needs to be added also which ones needs to be marked as obsolete. In less and common words:

  • if element is on $arr1 but doesn't exists on $arr2 should be marked as obsolete
  • if element comes in $arr2 btu doesn't exists on $arr1 then needs to be added (created)
  • otherwise that element just need to be updated

Clear enough? Feel free to ask everything in order to help on this

EDIT 2

Based on @dakkaron answer I made this code:

// $arr1 and $arr2 are previously built

$sortArr1 = asort($arr1);
$sortArr2 = asort($arr2);

$countArr1 = count($sortArr1);
$countArr2 = count($sortArr2);

$i = $j = 0;

$updArr = $inactiveArr = $newArr = [];

echo "original arr1 count: ", count($arr1), "\n";
echo "original arr2 count: ", count($arr2), "\n";

echo "arr1 count: ", $countArr1, "\n";
echo "arr2 count: ", $countArr2, "\n";

while ( $i < $countArr1 && $j < $countArr2) {
    if ($sortArr1[$i] == $sortArr2[$j]) {
        //Handle equal values
        $updArr[] = $sortArr1[$i];
        $i++; $j++;
    } else if ($sortArr1[$i] < $sortArr2[$j]) {
        //Handle values that are in arr1 but not in arr2
        $inactiveArr[] = $sortArr1[$i];
        $i++;
    } else {
        //Handle values that are in arr2 but not in arr1
        $newArr[] = $sortArr2[$j];
        $j++;
    }
}

echo "items update: ", count($updArr), "\n", "items inactive: ", count($inactiveArr), "\n", "items new: ", count($newArr), "\n";

And I got this output:

original arr1 count: 135732
original arr2 count: 135730
arr1 count: 1
arr2 count: 1
items update: 1
items inactive: 0
items new: 0

Why sort count returns 1?

ReynierPM
  • 17,594
  • 53
  • 193
  • 363
  • You can write it in 2 temp files. Sort by php or system utility, then compare it, reading line by line. If memory allows, do this in memory. Because compare 2 sorted arrays is not problem :) – splash58 Jun 24 '15 at 15:47
  • http://php.net/manual/en/function.array-diff.php and http://php.net/manual/en/function.array-intersect.php – AbraCadaver Jun 24 '15 at 15:53
  • What kind of values are stored in the array? – Gumbo Jun 24 '15 at 16:06
  • Closely related that I am tempted to call a dupe (still minor differences though): [Find delta of two lists](http://stackoverflow.com/q/30653705/572670) – amit Jun 24 '15 at 16:28
  • @amit what about if I don't have the unique id? – ReynierPM Jun 24 '15 at 16:31
  • @ReynierPM Then do the same while using a multiset/multimap rather than a set/map, basically. – amit Jun 24 '15 at 16:35
  • @amit sorry but I am not following you, can you post an answer with pseudo-code? – ReynierPM Jun 24 '15 at 16:39
  • So, judging from the other thread, most efficient is either O(nlogm) time and O(1) space (sorting and checking each element) or O(n+m) time and O(m) space, where `m` is the shorter list and `n` is the longer one. (All other solutions are O(nlogn) and O(n) time+space) – amit Jun 24 '15 at 16:41
  • @ReynierPM asort return boolean. It sort array given as param – splash58 Jun 24 '15 at 16:50
  • `asort` returns boolean... http://php.net/manual/pt_BR/function.asort.php – Niloct Jun 24 '15 at 16:53
  • in your code replace "$sortArr1 = asort($arr1);" with "asort($arr1);" and all occurrences of "$sortArr1" with "$arr1" (and do the same for $arr2) – Dakkaron Jun 25 '15 at 12:41

5 Answers5

3

You could take avantage of array_diff: http://php.net/manual/en/function.array-diff.php

Edit

A php function construct is more likely to perform better than an equivalent user-defined one. Searching I found this, but the size of your array is way smaller, and in the end I believe you should benchmark a prototype script with candidate solutions.

See my last comment.

Community
  • 1
  • 1
Niloct
  • 9,491
  • 3
  • 44
  • 57
  • And http://php.net/manual/en/function.array-intersect.php depending on what they want. – AbraCadaver Jun 24 '15 at 15:54
  • I wonder how efficient that array_diff is? (in terms of speed) – cmo Jun 24 '15 at 15:55
  • 1
    `array_diff` may be the simplest but probably not the most efficient solution: it makes a copy of both arrays, sorts them, and enumerates the values in parallel. So we have a doubling of memory and a runtime of probably O(n*log n). – Gumbo Jun 24 '15 at 16:05
  • https://gist.github.com/oanhnn/1a893702652ed8b91ba7 the OP can add their current algorithm and settle this. – Niloct Jun 24 '15 at 17:41
2

The best solution i can think of is to sort the second array, and try to look for values from the first array using binary search, this would take O(nLog(n)) complexity

yahya el fakir
  • 562
  • 2
  • 12
2

The best solution I can think of would be to first sort both arrays and then compare them from the bottom up.

  • Start with the lowest element in both arrays and compare them.
  • If they are equal, take them and move up one element on both arrays.
  • If they are different, move up one element on the array with the lower value.
  • If you reached the end of one of the arrays you are done.

After the sorting this should take about O(n) complexity.

This is a bit of code in pseudocode:

arr1 = ...
arr2 = ...

arr1.sort();
arr2.sort();

i1 = 0;
i2 = 0;
while (i1<arr1.length() && i2<arr2.length()) {
    if (arr1[i1]==arr2[i2]) {
        //Handle equal values
        i1++; i2++;
    } else if (arr1[i1]<arr2[i2]) {
        //Handle values that are in arr1 but not in arr2
        i1++;
    } else {
        //Handle values that are in arr2 but not in arr1
        i2++;
    }
}

Other than that, if you don't want to implement it yourself, just use array_diff

Dakkaron
  • 5,930
  • 2
  • 36
  • 51
1

Fill hashtable-based dictionary/map (don't know how it is called in PHP) with the second array elements, and check whether every element of the first array presents in this dictionary.
Usual complexity O(N)

for A in arr2
   map.insert(A)
for B in arr1
   if not map.contains(B) then
         element B  is on $arr1 but doesn't exists on $arr2

note that this approach doesn't address all problems in your edited question

MBo
  • 77,366
  • 5
  • 53
  • 86
  • Can you write a small piece of pseudocode? I will do my part on PHP side but need to see what are you talking about first – ReynierPM Jun 24 '15 at 16:40
1

Since your values are strings, you could take the advantage of PHP’s implementation of arrays using a hash-table internally with O(1) for key lookups:

$diff = [];

// A \ B
$lookup = array_flip($b); // O(n)
foreach ($a as $value) { // O(n)
    if (!isset($lookup[$value])) $diff[] = $value;
}

// B \ A
$lookup = array_flip($a); // O(n)
foreach ($b as $value) { // O(n)
    if (!isset($lookup[$value])) $diff[] = $value;
}

So in total, it’s O(n) in both space and time.

Of course, in the end you should benchmark it to see if it’s actually more efficient than other solutions here.

Gumbo
  • 643,351
  • 109
  • 780
  • 844