PHP UTF-8 accented character (extended ASCII ?) identification issue

Question

I've got a string containing accented characters in a PHP script. The script file is encoded in UTF-8 without a BOM, but I can't manage to isolate the individual accented characters without breaking them:

Sample:

<!doctype html>
<html>
   <head>
      <meta charset='UTF-8'>
   </head>
   <body>
<?php

$myWord='Méditerranée'; // 12 characters long
echo strlen($myWord).'<br/>';   // shows 14
echo mb_strlen($myWord).'<br/>';// shows 12
$myWord=str_split($myWord);
echo count($myWord).'<br/>'; // shows 14
foreach($myWord as $rank=>$character) {
   echo $character;
} // shows 'Méditerranée'
foreach($myWord as $rank=>$character) {
   echo $character.' ';
} // shows 'M * * d i t e r r a n * * e '
  /* each * is a black diamond with a question mark inside */
?>
   </body>
</html>

The `foreach` loop works bytewise, not characterwise, since it is not meant to be applied to strings. That means it treats UTF-8 multibyte sequences as multiple characters, which leads to the parts of a multibyte character being separated by spaces. In other words, you divide a single UTF-8 character into multiple space-separated pieces, some of which may well be "unprintable". — arkascha, Mar 18 '16 at 08:59
Basically, you can't `foreach` over a UTF-8 string, because PHP doesn't know the string uses multiple bytes for each character. See linked article. — Alastair McCormack, Mar 18 '16 at 08:59
[What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/) — deceze, Mar 18 '16 at 09:06
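
For illustration, here is a minimal sketch of one way to split the string by character rather than by byte. It is not taken from the thread. It assumes the string is valid UTF-8 and that the PCRE extension is available (it is in standard PHP builds). In the sample above, the byte-level split happens in `str_split()`, which cuts the string into single bytes, so each `é` (bytes `0xC3 0xA9`) ends up as two array elements. The variable names `$bytes` and `$characters` are made up for this example.

```php
<?php
// Sketch only: assumes this file and the string are encoded in UTF-8.
$myWord = 'Méditerranée';

// str_split() is byte-oriented: each 'é' becomes two separate elements.
$bytes = str_split($myWord);

// The /u modifier makes PCRE treat the subject as UTF-8, so splitting on
// the empty pattern cuts between code points instead of between bytes.
$characters = preg_split('//u', $myWord, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), "\n";             // 14
echo count($characters), "\n";        // 12
echo implode(' ', $characters), "\n"; // M é d i t e r r a n é e
```

On PHP 7.4 or later with the mbstring extension loaded, `mb_str_split($myWord, 1, 'UTF-8')` should give the same array. Either way, the split is per code point, so an accented letter written as a base letter plus a combining mark would still come out as two elements.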
