I've got an array of countries like this:

array(249) {
  [0]=>
  array(4) {
    ["country_id"]=>
    string(1) "2"
    ["country_name_en"]=>
    string(19) "&Aring;land Islands"
    ["country_alpha2"]=>
    string(2) "AX"
    ["country_alpha3"]=>
    string(3) "ALA"
  }
  etc.
}

I would like to split it by the first letter of the country name, so that I get an array like this:

array(26) {
 'A' => array(10) {
    array(4) {
      ["country_id"]=>
      string(1) "2"
      ["country_name_en"]=>
      string(19) "&Aring;land Islands"
      ["country_alpha2"]=>
      string(2) "AX"
      ["country_alpha3"]=>
      string(3) "ALA"
    }
    etc.
  }
  etc.
}

The problem is that some of the country names start with an HTML entity instead of a plain letter.

Any ideas how to do this?

Thanks in advance

Peter

rodneyrehm
Sephen
  • possible duplicate of [How to convert & characters to HTML characters?](http://stackoverflow.com/questions/6665985/how-to-convert-characters-to-html-characters) – Gordon Jun 23 '12 at 14:26
  • As this question requires the transliteration of `Å` to `A`, I don't think @Gordon's suggestion of a duplicate is fitting. – rodneyrehm Jun 23 '12 at 15:11
  • possible duplicate of [How to transliterate Accented Characters into plain ascii chars](http://stackoverflow.com/questions/3542717/how-to-transliterate-accented-characters-into-plain-ascii-characters) – Gordon Jun 23 '12 at 15:17

2 Answers

If you want Åland Islands to be filed under A, you'll need to do a bit more than the already suggested html_entity_decode().

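The intl extension provides Normalizer::normalize(), a function that converts the precomposed Å (a single code point) into a plain A followed by a combining ring, which renders identically. Confused yet? The Unicode character U+00C5 can be represented in UTF-8 either as 0xC385 (composition) or as 0x41CC8A (decomposition). 0x41 is A, and 0xCC8A is the combining ring above (U+030A).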
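
To make the two representations visible, here is a minimal sketch that dumps the bytes of both forms. The helper name `utf8_forms_hex` is made up for illustration, and the code assumes the intl extension is loaded. It uses FORM_C and FORM_D; for this particular character, FORM_KD gives the same decomposition.

```php
<?php
// Hypothetical helper (not part of PHP or intl): returns the hex bytes of the
// composed (NFC) and decomposed (NFD) forms of a UTF-8 string.
// Assumes the intl extension is available.
function utf8_forms_hex($char) {
    return array(
        'composed'   => bin2hex(Normalizer::normalize($char, Normalizer::FORM_C)),
        'decomposed' => bin2hex(Normalizer::normalize($char, Normalizer::FORM_D)),
    );
}
```

Calling `utf8_forms_hex("\xC3\x85")` (the UTF-8 bytes of U+00C5) should give `c385` for the composed form and `41cc8a` for the decomposed one.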

So, to get your islands filed properly, you'd want to do something like this:

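$string = "&Aring;land Islands";
// Decode HTML entities into UTF-8 characters.
$s = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
// Decompose the accented letter into "A" plus a combining ring (requires intl).
$s = Normalizer::normalize($s, Normalizer::FORM_KD);
// Take the first character, which is now the plain "A".
$s = mb_substr($s, 0, 1, 'UTF-8');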

Chances are, your environment doesn't have intl installed. If that's the case, you might look into urlify(), a function that will reduce strings to their alphanumeric parts.
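
If neither intl nor a transliteration library is an option, one possible workaround is a small hand-written map for the accented first letters that actually occur in the data. This is only a sketch: the function name `first_letter_fallback` and the map contents are assumptions, the map is deliberately incomplete, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical fallback for environments without intl.
// The map is illustrative and incomplete; extend it for your own data.
function first_letter_fallback($name) {
    $map = array(
        "\xC3\x85" => 'A', // U+00C5, A with ring above
        "\xC3\x89" => 'E', // U+00C9, E with acute
    );
    // Decode entities such as &Aring; into UTF-8 characters.
    $s = html_entity_decode($name, ENT_QUOTES, 'UTF-8');
    // Take the first character, not the first byte.
    $first = mb_substr($s, 0, 1, 'UTF-8');
    // Use the mapped base letter if there is one, else the character itself.
    return isset($map[$first]) ? $map[$first] : $first;
}
```

Unlike normalization, this only handles the characters listed in the map, so any unlisted accented letter would still end up under its own key.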


With the above, you should be able to:

  1. loop the original array
  2. extract the country name
  3. sanitize the country name and extract the first character
  4. build a new array based on the character of (3)
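
The four steps above could be sketched roughly as follows. The function names `first_letter` and `group_by_first_letter` are made up for this example, the input is assumed to have the shape shown in the question, and mbstring is assumed to be available. If intl is missing, the sketch skips normalization, so accented letters keep their own key.

```php
<?php
// Hypothetical helper: the uppercase first letter a country name files under.
function first_letter($name) {
    // Step 3a: decode entities such as &Aring; into UTF-8 characters.
    $s = html_entity_decode($name, ENT_QUOTES, 'UTF-8');
    // Step 3b: decompose accented letters when intl is available.
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($s, Normalizer::FORM_KD);
        if ($normalized !== false) { // false means the input was not valid UTF-8
            $s = $normalized;
        }
    }
    // Step 3c: take the first character and uppercase it.
    return mb_strtoupper(mb_substr($s, 0, 1, 'UTF-8'), 'UTF-8');
}

// Hypothetical helper: groups country rows by the first letter of their name.
function group_by_first_letter(array $countries) {
    $grouped = array();
    foreach ($countries as $country) {                       // step 1
        $letter = first_letter($country['country_name_en']); // steps 2 and 3
        $grouped[$letter][] = $country;                      // step 4: append
    }
    ksort($grouped); // sort the groups by letter
    return $grouped;
}
```

The `[]` in step 4 matters: assigning with `$grouped[$letter] = $country` would keep only the last country for each letter.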

Note: Beware that the countries Armenia, Austria and Australia would all file under A.

rodneyrehm

Loop through the array, use html_entity_decode() to decode the html entities, and then split using mb_substr().

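$new_array = array();
foreach ($array as $values) {
    $values['country_name_en'] = html_entity_decode($values['country_name_en'], ENT_QUOTES, 'UTF-8');
    $index = mb_substr($values['country_name_en'], 0, 1, 'UTF-8');

    // Append rather than assign, so each letter keeps every matching country.
    $new_array[$index][] = $values;
}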

Or you can use the function jlcd suggested:

function substr_unicode($str, $s, $l = null) {
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}

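$new_array = array();
foreach ($array as $values) {
    $values['country_name_en'] = html_entity_decode($values['country_name_en'], ENT_QUOTES, 'UTF-8');
    $index = substr_unicode($values['country_name_en'], 0, 1);

    // Append rather than assign, so each letter keeps every matching country.
    $new_array[$index][] = $values;
}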
Jeroen
  • Be aware that, when using `mb_substr` or `substr`, it may not return the proper result depending on the string's encoding: http://www.php.net/manual/en/function.mb-substr.php#107698 – dmmd Jun 23 '12 at 14:27