5

I'm having a hard time trying to replace this weird right single quote character. I'm using str_replace like this:

str_replace("’", '\u1234', $string);

It looks like I cannot figure out what character the quote really is. Even when I copy paste it directly from PHPMyAdmin it still doesn't work. Do I have to escape it somehow?

The character: http://www.lukomon.com/Afbeelding%204.png

  • MySQL Charset: UTF-8 Unicode (utf8)
  • MySQL Collations: utf8_unicode_ci
  • <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

EDIT: It turned out to be a Microsoft left single quote, which I could replace with the function from Phill Pafford's comment. Not sure which answer I should mark now.

richard
  • 1
    Why do you want to escape it? How does it interfere with anything? – zneak Apr 29 '10 at 22:13
  • 4
    Chances are, if `’` isn't behaving how you want, you'll be breaking *all* non-ASCII characters. Time to check one of the 2,000,000 SO questions about “why doesn't Unicode make it through my PHP?”. (Usually because of lack of UTF-8, `mysql_set_charset` or using `htmlentities` instead of proper `htmlspecialchars`.) – bobince Apr 29 '10 at 22:17
  • Made a mistake there. I need to replace it, not escape. I am using htmlspecialchars but tried to replace the character before htmlspecialchars and afterwards. No effect. mysql_set_charset is an undefined function, the database is in utf8 though. – richard Apr 29 '10 at 23:57
  • Just to check, you're worried about _'_ and not _`_ right? (apostrophe - by enter key, vs backtick - above tab key) – jcolebrand May 03 '10 at 17:58
  • It's not a regular quote and it isn't a backtick either. – richard May 04 '10 at 09:18
  • 3
    http://stackoverflow.com/questions/1262038/how-to-replace-microsoft-encoded-quotes-in-php might help – Phill Pafford May 05 '10 at 15:49
  • `iconv()` should work here, shouldn't it? – Frank Farmer May 08 '10 at 14:04
  • @richard FWIW, my answer suggests finding the Unicode Character at code point 2019. In Windows-1252 that character is encoded as 92 (hex) or 146 (decimal) which is essentially the same as Phil's solution. Only replacing by Unicode Code Point is more flexible between encodings. – Peter Bailey May 08 '10 at 19:00
  • @richard - You should mark whichever answer is the highest voted (as that is what the system will do anyway, except they will just get fewer points) – Mitch Dempsey May 08 '10 at 23:48

10 Answers

8

This happened to me too. A couple of things:

  • Use the htmlentities function on your text

    $my_text = htmlentities($string, ENT_QUOTES, 'UTF-8');

More info about the htmlentities function.

  • Use proper document type, this did the trick for me.

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

  • Use utf-8 encoding type in your page:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Here is the final prototype for your page:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>    
<body>

<?php     
    // your code related to database        
    $my_text = htmlentities($string, ENT_QUOTES, 'UTF-8');    
?>

</body>
</html>
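
A minimal sketch of how the htmlentities call in this prototype behaves, assuming the quote arrives either as valid UTF-8 or as a single Windows-1252 byte. The sample strings and variable names are made up for illustration. When the input is not valid UTF-8 and neither ENT_IGNORE nor ENT_SUBSTITUTE is set, htmlentities returns an empty string, which could look like the text "disappearing".

```php
<?php
// Assumption: the samples are written as byte escapes so the result
// does not depend on how this file itself is saved.
$utf8Text   = "Nick\xE2\x80\x99s"; // U+2019 encoded as UTF-8
$cp1252Text = "Nick\x92s";         // the same quote as one Windows-1252 byte

// Valid UTF-8: the quote becomes a named entity.
$ok = htmlentities($utf8Text, ENT_QUOTES, 'UTF-8');

// Invalid UTF-8 without ENT_IGNORE/ENT_SUBSTITUTE: an empty string comes back.
$empty = htmlentities($cp1252Text, ENT_QUOTES, 'UTF-8');

// ENT_SUBSTITUTE (PHP 5.4+) keeps the text and replaces the bad byte.
$kept = htmlentities($cp1252Text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($ok, $empty, $kept);
```

If the second case matches what you see, the data is probably not UTF-8 by the time it reaches PHP, whatever the database tables are set to.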


If you want to replace it however, try the mb_ereg_replace function.

Example:

mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

$my_text = mb_ereg_replace("’","'", $string);
Sarfraz
  • Thanks for your answer. I was indeed missing a doctype or any other HTML in fact. Once I added it and your htmlentities rule the `’` became a `�`. Once I switched Firefox to Western ISO charset it changed back to `’`. str_replace still doesn't work and unfortunately mb_ereg_replace does not work either. – richard May 02 '10 at 12:18
  • 1
    Try this: `str_replace("’", "'", $string);` and also try removing the `htmlentites` function and then see. – Sarfraz May 02 '10 at 12:41
  • With the html entities function the whole record just disappears. If I remove the function it shows the `�`. My database is UTF-8 but I can change it in PHPMyAdmin? – richard May 02 '10 at 15:33
  • @richard: yup, you can change the encoding in phpmyadmin, there is an option once you select a database. – Sarfraz May 02 '10 at 17:45
  • @Sarfraz: what should I change it to? I tried setting it to utf8_unicode_ci but that didn't change anything in the output. Thanks. – richard May 03 '10 at 09:57
  • @richard: it should be set to `utf8_unicode_ci` to allow foreign languages but not sure which charset that character belongs to. You might want to give a try to any of the `latin` charset too. – Sarfraz May 03 '10 at 10:39
4

I had the same issue and found this to work:

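function replace_rsquote($haystack, $replacewith) {
    // U+2019 is the three bytes E2 80 99 in UTF-8. Match all three:
    // byte 226 (0xE2) alone also starts other characters such as “ ” and €.
    // str_replace also handles every occurrence, not just the first one.
    return str_replace("\xE2\x80\x99", $replacewith, $haystack);
}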

Example:

echo replace_rsquote("Nick’s","'"); //Nick's
David Kinkead
2

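To find out what character it is, run it through the ord function. Note that ord returns the value of the first byte only, not an ASCII code or a Unicode code point. In a file saved as UTF-8 the quote is the three bytes E2 80 99, so you get 226 (0xE2):

echo ord('’'); // 226 when this file is saved as UTF-8

Now that you know the bytes, you can search for them explicitly and replace them with whatever you need:

$string = str_replace("\xE2\x80\x99", "'", $string);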
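
Because ord only looks at the first byte of a string, a hex dump is a quick way to see the whole sequence. The sketch below assumes the quote is stored as UTF-8; hex_bytes is a hypothetical helper name, not a built-in.

```php
<?php
// Hypothetical helper: returns the bytes of a string as spaced hex pairs.
function hex_bytes($text) {
    return implode(' ', str_split(bin2hex($text), 2));
}

$utf8Quote = "\xE2\x80\x99"; // U+2019 as stored in UTF-8

echo ord($utf8Quote), "\n";       // 226, the first byte only
echo hex_bytes($utf8Quote), "\n"; // e2 80 99
```

Running hex_bytes on the value that actually comes back from the database shows whether it holds e2 80 99 (UTF-8), 92 (Windows-1252), or something else.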
Casey Chu
1

To replace it:

If your script file is encoded in the same encoding as the data you are trying to do the replacement in, it should work the way you posted it. If you're working with UTF-8 data, make sure the script is encoded in UTF-8 and it's not your editor silently transliterating the character when you paste it.

If it won't work, try escaping it as described below and see what code it returns.

To escape it:

If your source file is encoded in UTF-8, this should work:

$string = htmlentities($string, ENT_QUOTES, "UTF-8");

the default character set of html... is iso-8859-1. Anything differing from that must be explicitly stated.

For more complex character conversion issues, always check out the User Contributed Notes to functions like htmlentities(), there are often real gems to be found there.

In General:

Bobince is right in his comment, systemic character set problems should be sorted systematically so they don't bite you in the ass - if only by defining which character set is used on every step of the way:

  • How the script file is encoded;
  • How the document is served;
  • How the data is stored in the database;
  • How the database connection is encoded.
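
The four points above could be pinned down roughly like this. It is only a sketch: it assumes the mysqli extension is available (the question mentions that mysql_set_charset is undefined on the asker's setup), and open_utf8_connection with its parameters is a hypothetical helper, not part of the original code.

```php
<?php
// 1. Script file: save this file as UTF-8 (an editor setting, not code).

// 2. Document: declare the charset in the HTTP header, not only in <meta>.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// 3. Storage: the tables are assumed to be utf8 already, as in the question.

// 4. Connection: hypothetical helper; the arguments are placeholders.
function open_utf8_connection($host, $user, $password, $database) {
    $db = new mysqli($host, $user, $password, $database);
    // Ask MySQL to exchange data with this client as UTF-8.
    $db->set_charset('utf8');
    return $db;
}
```

Without step 4 the server may convert utf8 columns to the connection's default character set on the way out, so the bytes PHP receives can differ from what is stored.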
Pekka
  • The file is saved UTF-8. HTML meta tag is UTF-8. Database UTF-8. Database connection... Call to undefined function. That's unfortunate, I believe that might be the problem. – richard May 02 '10 at 12:29
1

If you are using non-ASCII characters in your PHP code, you need to make sure that you’re using the same character encoding as in the data you are processing. Your attempt probably fails because you are using a different character encoding in your PHP script than in $string.

Additionally, if you’re using a multibyte character encoding such as UTF-8, you should also use the multibyte aware string functions.
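
A small illustration of the difference between the byte-oriented and the multibyte-aware string functions, assuming the mbstring extension is loaded and the text is UTF-8. The sample string is made up.

```php
<?php
$text = "Nick\xE2\x80\x99s"; // "Nick’s" in UTF-8

echo strlen($text), "\n";             // 8 (bytes)
echo mb_strlen($text, 'UTF-8'), "\n"; // 6 (characters)

// Byte-oriented substr can cut through the middle of the quote...
$broken = substr($text, 0, 5);
// ...while mb_substr counts whole characters.
$whole = mb_substr($text, 0, 5, 'UTF-8');

var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($whole, 'UTF-8'));  // bool(true)
```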

Gumbo
1

Gumbo said it right:
- save your script as a UTF-8 file
- and use http://php.net/mbstring (as Sarfraz pointed out in his last example)

arena-ru
0

Why not run the string through htmlspecialchars() and output it to see what it turns that character into, so you know what to use as your replace expression?

user97410
0

This character you have is the Right Single Quotation Mark.

To replace it with a pattern you'll want to do something like this

$string = preg_replace( "/\\x{2019}/u", 'replacement', $string );

But that really only addresses the symptom. The problem is that you don't have consistent use of character encodings throughout your application, as others have noted.
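
One way to see the inconsistency from PHP: with the /u modifier, preg_replace returns null when the subject is not valid UTF-8. The sketch below uses made-up sample strings and assumes the problematic data is Windows-1252, which is a guess about the asker's setup rather than something the question confirms.

```php
<?php
$pattern = '/\x{2019}/u';

$utf8Text   = "Nick\xE2\x80\x99s"; // quote as UTF-8
$cp1252Text = "Nick\x92s";         // quote as one Windows-1252 byte

$fixed  = preg_replace($pattern, "'", $utf8Text);   // "Nick's"
$failed = preg_replace($pattern, "'", $cp1252Text); // null

// Check for the failure explicitly instead of echoing an empty value.
if ($failed === null && preg_last_error() === PREG_BAD_UTF8_ERROR) {
    echo "subject is not valid UTF-8\n";
}
```

A null result echoed into a page prints nothing, so a failed replacement can look like a missing value.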

Peter Bailey
  • With your replace pattern, the row containing the Right Single Quotation Mark returns empty. I wouldn't know what to change now. MySQL charset is UTF-8, collation utf8_unicode_ci and html meta tag utf-8. – richard May 04 '10 at 09:24
  • I'm not really sure what you mean by "returns empty". Can you be more explicit? – Peter Bailey May 04 '10 at 14:28
  • I am querying 12 rows, 1 row contains that comma. 11 rows are being returned by PHP with this replace expression. – richard May 05 '10 at 17:12
  • PHP doesn't return rows. A SQL server does. And even then, `preg_replace()` operates on a single string - a column perhaps - not a "row". You are still confusing me. Did you try moving this regular expression into a SQL query? – Peter Bailey May 05 '10 at 18:38
  • preg_replace is a heavy function for a static character – without the need for complex regex pattern matching, there is no reason for the comparatively slow preg_replace. str_replace would be much faster. – kingjeffrey May 08 '10 at 06:05
  • @kingjeffrey - that's totally a micro optimization. Accurate? Sometimes (preg_replace can be faster in unexpected ways). Relevant? No, I don't think so. – Peter Bailey May 08 '10 at 15:59
  • @Peter Bailey "that's totally a micro optimization"... str_replace() performs 50% faster than preg_replace() for simple string replacement. I threw this benchmark together: http://test.kingdesk.com/preg-replace-v-str-replace/. For complex applications (such as the wp-Typography plugin: http://wordpress.org/extend/plugins/wp-typography/), it can save seconds off of complex parsing operations. – kingjeffrey May 08 '10 at 23:55
0

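Don't use any regex functions (preg_replace or mb_ereg_replace). They are way too heavy for this. A plain str_replace on the character's full UTF-8 byte sequence is enough. Note that chr(226) alone is only the first of its three bytes, and '\u2019' in a single-quoted PHP string is not an escape sequence:

str_replace("\xE2\x80\x99", "'", $string);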

If your needle is a multibyte character, you may have better luck with this bespoke function:

<?php 
function mb_str_replace($needle, $replacement, $haystack) {
    $needle_len = mb_strlen($needle);
    $replacement_len = mb_strlen($replacement);
    $pos = mb_strpos($haystack, $needle);
    while ($pos !== false)
    {
        $haystack = mb_substr($haystack, 0, $pos) . $replacement
                . mb_substr($haystack, $pos + $needle_len);
        $pos = mb_strpos($haystack, $needle, $pos + $replacement_len);
    }
    return $haystack; 
} 
?>

credit for this last function: http://www.php.net/manual/en/ref.mbstring.php#86120

kingjeffrey
0

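You can get the character's byte value with ord and then replace that byte with your desired character. This only works when the script and $string are both in a single-byte encoding such as Windows-1252, where the quote is byte 146. In UTF-8, ord returns 226, the first of three bytes, and replacing that byte alone corrupts the string:

$asciicode = ord('’'); // 146 if this file is saved as Windows-1252
$stringfixed = str_replace(chr($asciicode), '\'', $string);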
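
If the data really is Windows-1252 while the rest of the application is UTF-8, one option is to convert it first and then replace using the UTF-8 bytes. This is a sketch under that assumption, using mb_convert_encoding from the mbstring extension; the sample string is made up.

```php
<?php
$cp1252Text = "Nick\x92s"; // quote as the single byte 0x92 (146)

// Convert to UTF-8 first; 0x92 becomes the three bytes E2 80 99.
$utf8Text = mb_convert_encoding($cp1252Text, 'UTF-8', 'Windows-1252');

// Now a UTF-8 needle matches the whole character.
$plain = str_replace("\xE2\x80\x99", "'", $utf8Text);

echo bin2hex($utf8Text), "\n"; // 4e69636be2809973
echo $plain, "\n";             // Nick's
```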
Emanuel A.