4

From previous posts I have the general idea of computing a hash value for each $arr[$i] and then comparing the hashes to find the unique sub-arrays, but I don't know exactly how to implement it.

My sample array data:

$arr = [
    [0, 1, 2, 3],
    [4, 5, 2, 1],
    [0, 0, 0, 0],
    [0, 1, 2, 3]
];

I expect it to return:

[
    [0, 1, 2, 3],
    [4, 5, 2, 1],
    [0, 0, 0, 0]
]
mickmackusa
Jay

6 Answers

17

Quick and simple:

$arr = array_map('unserialize', array_unique(array_map('serialize', $arr)));
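
This serializes each sub-array into a string so that array_unique() can compare the rows as strings, then unserializes the survivors; the keys of the first occurrences are preserved.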
Alix Axel
  • This approach is imperfect in the sense that it will not catch duplicates if the order of the entries is different: `serialize(['a' => ['b' => 1], 'c' => ['d' => 2]])` vs. `serialize(['c' => ['d' => 2], 'a' => ['b' => 1]])` – Gajus Oct 29 '13 at 12:51
  • @GajusKuizinas: Those aren't the same by the PHP array definition anyway: http://codepad.org/nsTS5bDc. If however, that's what you want, just use the appropriate sort function beforehand. – Alix Axel Oct 29 '13 at 18:16
  • By the way, why are you using (un)serialize? Unless I am mistaken, `json_encode`/`json_decode` should be significantly faster. – Gajus Feb 05 '14 at 01:01
  • @GajusKuizinas: When I wrote this, it was way slower (I think it's faster now). Anyway, there are other reasons as JSON doesn't understand all PHP types. – Alix Axel Feb 05 '14 at 18:08
  • This is genius if the speed of serializing and unserializing is less than the speed of walking through the array, given that speed matters. – Ajayi Oluwaseun Emmanuel Apr 27 '17 at 10:44
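
As Alix Axel's comment suggests, key order can be normalized with a sort function beforehand. A minimal sketch (my code, not part of the original answer) using a hypothetical recursive-ksort helper:

function ksortRecursive(array $array)
{
    // sort keys at every level so serialize() yields identical strings
    // for arrays that differ only in key order
    ksort($array);
    foreach ($array as $key => $value) {
        if (is_array($value)) {
            $array[$key] = ksortRecursive($value);
        }
    }
    return $array;
}

$normalized = array_map('ksortRecursive', $arr);
$arr = array_map('unserialize', array_unique(array_map('serialize', $normalized)));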
1

PHP already offers a native way to directly remove duplicate rows in an array.

Pass the SORT_REGULAR flag to the array_unique() call to tell PHP to compare the values as they are, without changing their data type while evaluating them.

Code: (Demo)

var_export(array_unique($arr, SORT_REGULAR));
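
For the sample data, this keeps the first occurrence of the duplicate row, so the result contains the rows at keys 0, 1, and 2.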
mickmackusa
0
foreach ($arr as $key => $value) {
    foreach ($arr as $key2 => $value2) {
        // if another entry holds the same values, drop the current one
        if ($value2 == $value && $key != $key2) {
            unset($arr[$key]);
        }
    }
}

It isn't the most elegant method, but it does exactly what you need it to do. The problem is that you can't use array_unique() recursively. (Note that this keeps the last occurrence of each duplicate row rather than the first.)

Here is another way, taken from the PHP.net documentation comments (there are great code snippets in there):

function arrayUnique($myArray)
{
    if (!is_array($myArray)) {
        return $myArray;
    }

    // serialize each sub-array so array_unique() can compare strings
    foreach ($myArray as &$myvalue) {
        $myvalue = serialize($myvalue);
    }
    unset($myvalue); // break the reference left over from the loop

    $myArray = array_unique($myArray);

    // restore the surviving sub-arrays
    foreach ($myArray as &$myvalue) {
        $myvalue = unserialize($myvalue);
    }
    unset($myvalue);

    return $myArray;
}
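
Usage with the sample data (the $unique variable name is mine):

$unique = arrayUnique($arr);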
Tyler Carter
0

Here's another idea. Again, not terribly elegant, but might be pretty fast. It's similar to Chacha102's second part, although it would be faster if you only have integer values in the sub arrays.

// implode the sub arrays
$tmpArray = array();
foreach ($arr as $key => $array) {
    $tmpArray[$key] = implode(',', $array);
}

// get only the unique values
$tmpArray = array_unique($tmpArray);

// explode the values back into arrays
// (note: explode() returns strings, so integer values come back as strings)
$arr = array();
foreach ($tmpArray as $key => $string) {
    $arr[$key] = explode(',', $string);
}
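
If the original integer types matter, the values can be cast back while exploding; a sketch (not part of the original answer):

// cast each exploded string back to an integer
$arr = array();
foreach ($tmpArray as $key => $string) {
    $arr[$key] = array_map('intval', explode(',', $string));
}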
BenMorel
Darryl Hein
-1

Hashing is a good idea; it would make the solution O(n) on average.

Basically, you iterate through $arr and make a hash of each inner array, then compare it against the hashes you have already seen (this lookup is O(1) using isset(), or O(m) to be precise, where m is the number of elements in the inner array). If there is a collision, you compare the actual array elements. Usually a collision means you've seen that array before and it's a duplicate, but that's not guaranteed. Here is some pseudo-PHP that implements this algorithm.

function mkhash($array = array()) {
   $hash = "";
   foreach ($array as $element) {
      $hash .= md5($element);
   }
   return $hash; // concatenated per-element hashes identify the whole array
}

$seen = array();
$newArray = array();
foreach ($arr as $elementArray) {
   $hash = mkhash($elementArray);
   if (!isset($seen[$hash])) {
      // first time this hash appears: keep the array and remember it
      $newArray[] = $elementArray;
      $seen[$hash] = $elementArray;
   } else if (count(array_diff($elementArray, $seen[$hash])) > 0) {
      // two different arrays hashed to the same value; a complete
      // implementation would store a list of arrays per hash
      $newArray[] = $elementArray;
   }
}

The hashing method is harder to implement, and dealing with collisions properly is tricky, so there is also an O(n log n) alternative.

The O(n log n) way of doing this is to sort the array:

array_multisort($arr); // sorts $arr in place; the function returns a boolean, not the sorted array. O(n log n)

Then all you have to do is compare adjacent arrays to see if they are duplicates, as sketched below.
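
A minimal sketch of that adjacent comparison (my code, assuming $arr was sorted as above):

$unique = array();
foreach ($arr as $row) {
    // after sorting, duplicates sit next to each other, so each row
    // only needs to be compared against the last row that was kept
    if (empty($unique) || end($unique) != $row) {
        $unique[] = $row;
    }
}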

Of course you can simply use the O(n^2) approach and compare each inner array with every other inner array...

EDIT: Oh, and here's another O(n) idea: you can recursively build a trie using array keys that map to other arrays, so you end up with an m-level-deep array, where m is the length of the longest inner array. Each branch of the trie represents a unique inner array. Of course, you would have to write some overhead code to convert the trie back into a 2D array, so you won't see the performance benefit until the cardinality of your input is very large! A sketch follows.
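
A minimal sketch of the trie idea (my code, not the original author's; it collects the unique rows during insertion instead of converting the trie back, and it assumes the inner arrays hold scalars usable as array keys, with '#end' never occurring as a value):

$trie = array();
$unique = array();
foreach ($arr as $row) {
    $node = &$trie;
    foreach ($row as $value) {
        if (!isset($node[$value])) {
            $node[$value] = array();
        }
        $node = &$node[$value];
    }
    if (!isset($node['#end'])) {
        $node['#end'] = true; // marks a complete row, so a row that is a prefix of a longer one still counts separately
        $unique[] = $row;
    }
    unset($node); // break the reference before the next row
}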

Charles Ma
-1

It depends on whether you have the resources to keep the larger array in memory; basically, do you want to reject duplicates as you go, to keep the array from bloating during the loop, or do you just need the final outcome to be an array of unique values?

For all examples, I assume you are getting the values to enter into the big array from some external source, like a MySQL query.

To prevent duplicates from being entered into the master array:

You could create two arrays, one with the values as a string, one with the values as actual array values.

$check_array = array();
$master_array = array();

while ($row = $results->fetch_assoc()) {
    $value_string = implode(",", $row);
    if (!in_array($value_string, $check_array)) { // only keep rows not seen before
        $check_array[] = $value_string;
        $master_array[] = $row;
    }
}

In the above, it just sees if the string version of your data set is in the array of string data sets already iterated through. You end up with a bigger overhead with two arrays, but neither ever gets duplicate values.
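
A variant (my suggestion, not from the original answer) avoids the linear in_array() scan by using the string itself as an array key, which ties back to the hashing idea above:

while ($row = $results->fetch_assoc()) {
    $value_string = implode(",", $row);
    if (!isset($check_array[$value_string])) {
        $check_array[$value_string] = true; // O(1) key lookup instead of in_array()
        $master_array[] = $row;
    }
}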

Or, as I'm sure has already been mentioned, there is array_unique(), which runs after all the data has been entered. Modifying the above example, you get:

while ($row = $results->fetch_assoc()) {
    $master_array[] = $row;
}
// SORT_REGULAR makes the rows compare as arrays instead of being
// cast to the string "Array", which would mark them all as duplicates
$master_array = array_unique($master_array, SORT_REGULAR);
Anthony