Split an array by first letter containing html entity

Question

I've got an array with countries like this

array(249) {
  [0]=>
  array(4) {
    ["country_id"]=>
    string(1) "2"
    ["country_name_en"]=>
    string(19) "&Aring;land Islands"
    ["country_alpha2"]=>
    string(2) "AX"
    ["country_alpha3"]=>
    string(3) "ALA"
  }
  etc.
}

I would like to split it by first letter so that I get an array like this:

array(26) {
 'A' => array(10) {
    array(4) {
      ["country_id"]=>
      string(1) "2"
      ["country_name_en"]=>
      string(19) "&Aring;land Islands"
      ["country_alpha2"]=>
      string(2) "AX"
      ["country_alpha3"]=>
      string(3) "ALA"
    }
    etc.
  }
  etc.
}

The problem is that some of the country names begin with an HTML entity instead of a plain letter.

Any ideas how to do this?

Thanks in advance

Peter

possible duplicate of [How to convert & characters to HTML characters?](http://stackoverflow.com/questions/6665985/how-to-convert-characters-to-html-characters) — Gordon, Jun 23 '12 at 14:26
As this question requires the transliteration of `Å` to `A`, I don't think @Gordon's suggestion of a duplicate is fitting. — rodneyrehm, Jun 23 '12 at 15:11
possible duplicate of [How to transliterate Accented Characters into plain ascii chars](http://stackoverflow.com/questions/3542717/how-to-transliterate-accented-characters-into-plain-ascii-characters) — Gordon, Jun 23 '12 at 15:17

rodneyrehm · Answer 1 · 2012-06-23T15:13:55.130

If you want Åland Islands to be filed under A, you'll need to do a bit more than the already suggested html_entity_decode().

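The intl extension provides Normalizer::normalize(), which can convert the precomposed Å into its decomposed form, a plain A followed by a combining ring. Confused yet? The character Å (U+00C5) is the UTF-8 byte sequence 0xC385 in composed form and 0x41CC8A in decomposed form: 0x41 is A, and 0xCC8A is the combining ring above (U+030A).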
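To make the two representations concrete, here is a minimal sketch that prints the bytes before and after decomposition. It assumes the intl extension is loaded; the variable names are only illustrative.

```php
<?php
// Assumes the intl extension is loaded (it provides the Normalizer class).
$composed = "\xC3\x85"; // "Å" as a single code point, U+00C5

// Canonical decomposition: "A" (U+0041) followed by the combining ring (U+030A)
$decomposed = Normalizer::normalize($composed, Normalizer::FORM_D);

echo bin2hex($composed), "\n";   // c385
echo bin2hex($decomposed), "\n"; // 41cc8a
echo $decomposed[0], "\n";       // A (the first byte is now a plain ASCII letter)
```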

So, to get your islands filed properly, you'd want to do something like this:

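$string = "&Aring;land Islands";
// Decode the entity: "&Aring;" becomes the UTF-8 character "Å"
$s = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
// Decompose "Å" into "A" followed by a combining ring (requires intl)
$s = Normalizer::normalize($s, Normalizer::FORM_KD);
// Take the first character, "A"; pass the encoding explicitly
$s = mb_substr($s, 0, 1, 'UTF-8');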

Chances are, your environment doesn't have intl installed. If that's the case, you might look into urlify(), a function that will reduce strings to their alphanumeric parts.

With the above you should be able to:

1. loop the original array
2. extract the country name
3. sanitize the country name and extract the first character
4. build a new array based on the character of (3)
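The steps above could be sketched roughly as follows. The function names `first_letter_key()` and `group_countries_by_initial()` are hypothetical, not part of PHP or of the original answer, and the sketch assumes the mbstring extension is available. Where intl is missing it skips normalization, so Å would then get a bucket of its own.

```php
<?php
// Hypothetical helper: reduce a country name to its grouping letter.
function first_letter_key(string $name): string
{
    // Decode HTML entities, e.g. "&Aring;" becomes "Å"
    $decoded = html_entity_decode($name, ENT_QUOTES, 'UTF-8');

    // Decompose accented characters when intl is available
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($decoded, Normalizer::FORM_KD);
        if ($normalized !== false) {
            $decoded = $normalized;
        }
    }

    // The first character only, upper-cased for consistent keys
    return mb_strtoupper(mb_substr($decoded, 0, 1, 'UTF-8'), 'UTF-8');
}

// Hypothetical helper: build array('A' => array(...), 'B' => array(...), ...)
function group_countries_by_initial(array $countries): array
{
    $grouped = [];
    foreach ($countries as $country) {
        // [] appends, so several countries can share one letter
        $grouped[first_letter_key($country['country_name_en'])][] = $country;
    }
    ksort($grouped);
    return $grouped;
}

$grouped = group_countries_by_initial([
    ['country_id' => '2', 'country_name_en' => '&Aring;land Islands'],
    ['country_id' => '14', 'country_name_en' => 'Austria'],
    ['country_id' => '21', 'country_name_en' => 'Belgium'],
]);
echo implode(', ', array_keys($grouped)), "\n"; // with intl loaded: A, B
```

The original rows are stored unchanged, so the entity stays in `country_name_en`, as in the output the question asks for. The ids other than "2" are made-up sample values.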

Note: Beware that the countries Armenia, Austria and Australia would all file under A, so each letter has to hold a list of countries rather than a single entry.

Jeroen · Answer 2 · 2012-06-23T14:36:45.910

Loop through the array, use html_entity_decode() to decode the HTML entities, and then take the first character with mb_substr().

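$new_array = array();
foreach ($array as $values) {
    // Decode entities as UTF-8 so that "&Aring;" becomes "Å"
    $values['country_name_en'] = html_entity_decode($values['country_name_en'], ENT_QUOTES, 'UTF-8');
    $index = mb_substr($values['country_name_en'], 0, 1, 'UTF-8');

    // Append with [] so countries sharing a first letter don't overwrite each other
    $new_array[$index][] = $values;
}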

Or you can use the function jlcd suggested:

function substr_unicode($str, $s, $l = null) {
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}

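$new_array = array();
foreach ($array as $values) {
    // Decode entities as UTF-8 so that "&Aring;" becomes "Å"
    $values['country_name_en'] = html_entity_decode($values['country_name_en'], ENT_QUOTES, 'UTF-8');
    $index = substr_unicode($values['country_name_en'], 0, 1);

    // Append with [] so countries sharing a first letter don't overwrite each other
    $new_array[$index][] = $values;
}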

Be aware that, when using `mb_substr` or `substr`, it may not return the proper result depending on the string's encoding: http://www.php.net/manual/en/function.mb-substr.php#107698 — dmmd, Jun 23 '12 at 14:27
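To illustrate the encoding caveat, here is a small sketch comparing the byte-oriented `substr` with `mb_substr` given an explicit encoding. It assumes the string is UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// Decode to UTF-8 explicitly; "Å" then occupies two bytes (0xC3 0x85)
$name = html_entity_decode('&Aring;land Islands', ENT_QUOTES, 'UTF-8');

$byteWise = substr($name, 0, 1);             // first byte only: half a character
$charWise = mb_substr($name, 0, 1, 'UTF-8'); // first character as a whole

echo bin2hex($byteWise), "\n"; // c3
echo bin2hex($charWise), "\n"; // c385
```

Passing the encoding to `mb_substr` avoids depending on whatever `mb_internal_encoding()` happens to be set to.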
