5

I have a string like so "Ba\u015f\u00e7\u0131l". I'm assuming those are some special accent characters. How do I:

1) Display the string with the accents (i.e., replace each code with the actual character)?

2) What is the best practice for storing strings like this?

3) If I don't want to allow such characters, how do I replace them with "normal" characters?

hakre
Andy Hin

5 Answers

3

My educated guess is that you obtained such values from a JSON string. If that's the case, you should properly decode the full piece of data with json_decode():

<?php

header('Content-Type: text/plain; charset=utf-8');

$data = '"Ba\u015f\u00e7\u0131l"';
var_dump( json_decode($data) );

?>
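If the escaped value arrives inside a larger JSON document, the same idea applies to the whole payload. The following is only a sketch: the `name` key and the payload are made up for illustration. It also shows where such escapes tend to come from, since `json_encode()` escapes non-ASCII characters unless `JSON_UNESCAPED_UNICODE` is passed.

```php
<?php
// Hypothetical payload; the "name" key is an illustrative assumption.
$json = '{"name": "Ba\u015f\u00e7\u0131l"}';

// Decode the whole document; json_decode() turns \uXXXX escapes into UTF-8.
$data = json_decode($json, true);
if (json_last_error() !== JSON_ERROR_NONE) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}
$name = $data['name']; // UTF-8 encoded text

// By default json_encode() escapes non-ASCII characters again...
$escaped = json_encode($name);
// ...unless JSON_UNESCAPED_UNICODE is passed.
$readable = json_encode($name, JSON_UNESCAPED_UNICODE);

echo $name, "\n", $escaped, "\n", $readable, "\n";
```

Decoding the full document, rather than patching individual strings, also takes care of other JSON escapes such as `\"` and `\n`.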
Álvaro González
1
  1. To display the characters, see "How to decode Unicode escape sequences like "\u00ed" to proper UTF-8 encoded characters?"

  2. You can store the string either escaped like that or decoded; just make sure your storage can handle the UTF-8 charset.

  3. Use iconv with the //TRANSLIT flag.

Here's an example...

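// Sample input from the question; single quotes keep the \u sequences literal.
$str = 'Ba\u015f\u00e7\u0131l';

// Convert one \uXXXX escape (a big-endian UCS-2 code unit) to UTF-8.
function replace_unicode_escape_sequence($match) {
    return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
}
$str = preg_replace_callback('/\\\\u([0-9a-f]{4})/i', 'replace_unicode_escape_sequence', $str);

echo $str; // decoded UTF-8 text

echo '<br/>';
// Transliterate to ASCII; the exact result depends on the iconv implementation and locale.
$str = iconv('UTF-8', 'ASCII//TRANSLIT', $str);

echo $str;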
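Because the result of //TRANSLIT can vary between iconv implementations and locales, a deterministic fallback is sometimes useful. The sketch below is one possible approach, not part of the answer above: `to_ascii()` and `ASCII_MAP` are hypothetical names, the map is deliberately incomplete, and the file is assumed to be saved as UTF-8.

```php
<?php
// Illustrative, deliberately incomplete map (an assumption); extend it for your data.
const ASCII_MAP = [
    'ş' => 's', 'Ş' => 'S',
    'ç' => 'c', 'Ç' => 'C',
    'ı' => 'i', 'İ' => 'I',
    'ğ' => 'g', 'Ğ' => 'G',
    'ö' => 'o', 'Ö' => 'O',
    'ü' => 'u', 'Ü' => 'U',
];

// Hypothetical helper: map known characters, then mask whatever is still non-ASCII.
function to_ascii(string $utf8, string $placeholder = '?'): string
{
    // strtr() compares bytes, which is safe here because the keys are
    // complete UTF-8 sequences and cannot match in the middle of a character.
    $mapped = strtr($utf8, ASCII_MAP);

    // Replace each remaining non-ASCII character with the placeholder.
    $ascii = preg_replace('/[^\x00-\x7F]/u', $placeholder, $mapped);
    if ($ascii === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $ascii;
}

echo to_ascii('Başçıl'), "\n";
```

The trade-off is that every character you care about has to be listed by hand, whereas iconv covers far more characters at the cost of platform-dependent output.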
Jacob
0

Here's another option:

<html><head>
    <!-- don't forget to tell the browser what encoding you're using: -->
    <meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
</head><body><?php

$string = "Ba\u015f\u00e7\u0131l";
echo json_decode('"'.str_replace('"', '\"', $string).'"');

?></body></html>

This works because the \uXXXX escape syntax is what JSON uses. Note that json_decode() requires the JSON extension, which has been bundled with PHP since version 5.2.
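Wrapping the whole string in quotes can fail when the text contains other backslashes or control characters. A narrower variant, sketched below under the assumption that only `\uXXXX` sequences should be touched, hands just those runs to `json_decode()`. The function name `decode_unicode_escapes()` is hypothetical, and the sketch does not try to recognise an escaped backslash in front of a `u`.

```php
<?php
// Hypothetical helper: decode only runs of \uXXXX escapes, leave the rest as-is.
function decode_unicode_escapes(string $text): string
{
    $result = preg_replace_callback(
        // One or more consecutive \uXXXX escapes, so surrogate pairs stay together.
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        function (array $m): string {
            $decoded = json_decode('"' . $m[0] . '"');
            // If the run cannot be decoded, keep the original text unchanged.
            return is_string($decoded) ? $decoded : $m[0];
        },
        $text
    );
    return $result ?? $text;
}

echo decode_unicode_escapes('Ba\u015f\u00e7\u0131l'), "\n";
```

Since quotes and other characters never reach `json_decode()`, no manual escaping with `str_replace()` is needed in this variant.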

Mark Eirich
0

There is no native support in PHP to decode such strings.

There are several tricks that use native functions, though I am not sure that any of them is safe and injection-proof.

Another option, if you use Zend Framework, is to download the Zend_Utf8 proposal class. For more information, see the Zend_Utf8 proposal for Zend Framework.

Frederic Bazin
-1
  1. Outputting them as decoded characters will display the appropriate characters. If you don't declare an encoding for the output document, the browser will try to guess the best one. It is better to determine the encoding and declare it explicitly.
  2. Simply store them as they are, or convert them to the actual characters and store those as binary data.
  3. Use the iconv functions to convert from one encoding to another; you should then also save your source file in the desired encoding so that it supports those characters.
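As a rough illustration of points 1 and 3, the sketch below converts a legacy single-byte string to UTF-8 with `iconv()` and declares the encoding explicitly. The ISO-8859-9 input is an assumption chosen for the example; your actual source encoding may differ.

```php
<?php
// "Başçıl" as ISO-8859-9 bytes (assumed legacy input, for illustration only).
$latin5 = "Ba\xFE\xE7\xFDl";

// iconv(from, to, string) returns false on failure, so check before using it.
$utf8 = iconv('ISO-8859-9', 'UTF-8', $latin5);
if ($utf8 === false) {
    throw new RuntimeException('Conversion from ISO-8859-9 failed');
}

// Declare the output encoding explicitly instead of letting the browser guess.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo $utf8, "\n";
```

The converted string is longer in bytes than the original because each of the three accented characters takes two bytes in UTF-8.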
AbiusX