5

I found a few solutions but I can't decide which one to use. What is the most compact and effective way to apply PHP's array_unique() case-insensitively?

Example:

$input = array('green', 'Green', 'blue', 'yellow', 'blue');
$result = array_unique($input);
print_r($result);

Result:

Array ( [0] => green [1] => Green [2] => blue [3] => yellow )

How do we remove the duplicate green? As for which duplicate to keep, assume that the variant with uppercase characters is the correct one.

e.g. keep PHP, remove php

or keep PHP, remove Php, since PHP has more uppercase characters.

So the result will be

Array ( [0] => Green [1] => blue [2] => yellow )

Notice that the Green with uppercase has been preserved.

CyberJunkie

5 Answers

14

Would this work?

$r = array_intersect_key($input, array_unique(array_map('strtolower', $input)));

This doesn't care which casing is kept (it simply keeps the first occurrence), but it does the job. To keep the capitalized values instead, call asort($input); before the intersect, since uppercase letters sort before lowercase ones in byte order (demo at IDEOne.com).
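A minimal sketch of the one-liner above wrapped in a function, with the optional asort() step. The function name `array_iunique` and the `$preferUpper` flag are made up for illustration; they are not part of PHP or of the original answer.

```php
<?php
// Hypothetical helper (not a PHP built-in): case-insensitive array_unique().
function array_iunique(array $input, $preferUpper = false)
{
    if ($preferUpper) {
        // Byte-order sort that keeps keys and puts 'Green' ahead of 'green',
        // because ASCII uppercase letters sort before lowercase ones.
        asort($input, SORT_STRING);
    }
    // Deduplicate the lowercased copies, then pick the originals by key.
    $unique = array_unique(array_map('strtolower', $input));
    return array_intersect_key($input, $unique);
}

$input = array('green', 'Green', 'blue', 'yellow', 'blue');
print_r(array_iunique($input));       // first occurrence wins: green, blue, yellow
print_r(array_iunique($input, true)); // capitalized variant wins: Green, blue, yellow
```

Keys from the input are preserved in both cases, so wrap the call in array_values() if a re-indexed list is needed.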

Alix Axel
  • It does indeed work, very clean solution. I would be tempted to `trim()` the values as well, but that's up to OP's definition of duplicate. – Wesley Murch Jun 05 '11 at 03:10
  • It doesn't keep strings with most uppercase characters. – piotrm Jun 05 '11 at 03:45
  • @piotrm: I mentioned that in my answer... – Alix Axel Jun 05 '11 at 04:04
  • That was a reply to Wesley Murch's comment, you can't say it does the job, it may be a very clean solution, but to some other problem. – piotrm Jun 05 '11 at 04:14
  • @piotrm: The OP never asked for a solution that keeps the strings with the *most* uppercase letters, that's just your assumption. – Alix Axel Jun 05 '11 at 07:04
  • I actually think @Alix's answer is better than mine. It preserves the original input array's index positions (although mine would too if you replace the `sort()` with `asort()`), and is much clearer to read. Functionally, I think both are equivalent, though, once you make the sort->asort change. – dossy Jun 05 '11 at 23:09
  • Thanks Alix, your code is indeed the most compact and does the job. – CyberJunkie Jun 06 '11 at 01:22
3

If you can use PHP 5.3.0 or later (needed for the closure), here's a function that does what you're looking for:

<?php
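function array_unique_case($array) {
    // Byte-order sort puts capitalized variants ('Green') before lowercase ones ('green').
    sort($array);
    $tmp = array(); // lowercased values seen so far
    $callback = function ($a) use (&$tmp) {
        $key = strtolower($a);
        if (in_array($key, $tmp, true)) {
            return false; // a variant of this word was already kept
        }
        $tmp[] = $key;
        return true;
    };
    return array_filter($array, $callback);
}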

$input = array(
    'green', 'Green', 
    'php', 'Php', 'PHP', 
    'blue', 'yellow', 'blue'
);
print_r(array_unique_case($input));
?>

Output:

Array
(
    [0] => Green
    [1] => PHP
    [3] => blue
    [7] => yellow
)
dossy
  • Nice, but will fail on `'Green','gREEN'` - will return `'Green'`, but `'gREEN'` has more uppercase characters. – piotrm Jun 05 '11 at 03:37
  • @piotrm, yes, it fails to return the word with the most uppercase characters if the first character isn't uppercase. Incidentally, I think that's good. Acronyms are all uppercase characters, so that's why I want to return the uppercase duplicate if it exists. Otherwise, if the first character is lowercase and the others are uppercase, it's usually considered incorrect. – CyberJunkie Jun 05 '11 at 16:42
  • @CyberJunkie - yeah, I kind of assumed the rule was not to necessarily choose which word had the _most_ uppercase characters, but weight those that start with uppercase more heavily than those which don't. – dossy Jun 05 '11 at 22:55
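The behavior discussed in these comments follows from how sort() orders plain strings: byte by byte, with every ASCII uppercase letter ranking before every lowercase one. A small check, using a made-up word list for illustration:

```php
<?php
// Illustrative input: four casings of the same word.
$words = array('gREEN', 'Green', 'green', 'GREEN');

// Default sort() compares non-numeric strings byte by byte,
// so a leading uppercase letter outranks any later difference.
sort($words);

print_r($words); // GREEN, Green, gREEN, green
```

Because the filter keeps the first variant it meets after sorting, a value that starts with an uppercase letter always beats one that starts with a lowercase letter, regardless of how many uppercase characters follow.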
1
function count_uc($str) {
  preg_match_all('/[A-Z]/', $str, $matches);
  return count($matches[0]);
}

$input = array(
    'green', 'Green', 'yelLOW', 
    'php', 'Php', 'PHP', 'gREEN', 
    'blue', 'yellow', 'bLue', 'GREen'
);

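// Drop exact duplicates so that array_flip() below is lossless.
$input = array_unique($input);
// Map each value to its original index.
$keys = array_flip($input);
// Sort by lowercase form, then by uppercase count (ascending); $keys is reordered to match.
array_multisort(array_map("strtolower", $input), array_map("count_uc", $input), $keys);
// Lowercasing the keys makes variants collide; the last one (most uppercase) wins.
$keys = array_flip(array_change_key_case($keys));
$output = array_intersect_key($input, $keys);
print_r($output);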

will return:

Array
(
    [2] => yelLOW
    [5] => PHP
    [6] => gREEN
    [9] => bLue
)
piotrm
0

First convert all values to lowercase, then run array_unique() on the result. Note that this discards the original casing.

dynamic
0

Normalize your data first by sending it through strtoupper() or strtolower() to make the case consistent. Then use your array_unique().

// array_map() takes the callback first, then the array.
$normalized = array_map('strtolower', $input);
$result = array_unique($normalized);
$result = array_map('ucwords', $result);
print_r($result);
SamT
  • Wouldn't that make the result all lowercase or uppercase? I want to preserve the original value with an uppercase character – CyberJunkie Jun 05 '11 at 01:59
  • It would, but I added an example that takes care of that. Calling `ucwords()` uppercases the first letter of each word. – SamT Jun 05 '11 at 02:00
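To see what the normalize-then-ucwords() approach does to the original values, here is a small check. The input list is made up for illustration, reusing the acronym example from the question.

```php
<?php
// Illustrative input: three casings of an acronym plus a lowercase word.
$input = array('php', 'Php', 'PHP', 'blue');

// Lowercase everything, deduplicate, then capitalize each word.
$normalized = array_unique(array_map('strtolower', $input));
$restored   = array_map('ucwords', $normalized);

print_r($restored); // [0] => Php, [3] => Blue
```

Neither 'PHP' nor the original 'blue' comes back: ucwords() only capitalizes the first letter of each word, so this approach rewrites values rather than preserving one of the originals.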