Replacing ’ character in PHP

Question

I'm having a hard time trying to replace this weird right single quote character. I'm using str_replace like this:

str_replace("’", '\u1234', $string);

It looks like I cannot figure out what character the quote really is. Even when I copy paste it directly from PHPMyAdmin it still doesn't work. Do I have to escape it somehow?

The character: http://www.lukomon.com/Afbeelding%204.png

MySQL Charset: UTF-8 Unicode (utf8)
MySQL Collations: utf8_unicode_ci
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

EDIT: It turned out to be a Microsoft left single quote, which I could replace with the function from Phill Pafford's comment. Not sure which answer I should mark now.

Why do you want to escape it? How does it interfere with anything? — zneak, Apr 29 '10 at 22:13
Chances are, if `’` isn't behaving how you want, you'll be breaking *all* non-ASCII characters. Time to check one of the 2,000,000 SO questions about “why doesn't Unicode make it through my PHP?”. (Usually because of lack of UTF-8, `mysql_set_charset` or using `htmlentities` instead of proper `htmlspecialchars`.) — bobince, Apr 29 '10 at 22:17
Made a mistake there. I need to replace it, not escape. I am using htmlspecialchars but tried to replace the character before htmlspecialchars and afterwards. No effect. mysql_set_charset is an undefined function, the database is in utf8 though. — richard, Apr 29 '10 at 23:57
Just to check, you're worried about _'_ and not _`_ right? (apostrophe - by enter key, vs backtick - above tab key) — jcolebrand, May 03 '10 at 17:58
http://stackoverflow.com/questions/1262038/how-to-replace-microsoft-encoded-quotes-in-php might help — Phill Pafford, May 05 '10 at 15:49
@richard FWIW, my answer suggests finding the Unicode Character at code point U+2019. In Windows-1252 that character is encoded as 92 (hex) or 146 (decimal) which is essentially the same as Phill's solution. Only replacing by Unicode Code Point is more flexible between encodings. — Peter Bailey, May 08 '10 at 19:00
@richard - You should mark whichever answer is the highest voted (as that is what the system will do anyway, except they will just get fewer points) — Mitch Dempsey, May 08 '10 at 23:48

Sarfraz · Accepted Answer · 2010-05-02T11:30:16.773

8

This happened to me too. A couple of things:

Use htmlentities function for your text

$my_text = htmlentities($string, ENT_QUOTES, 'UTF-8');

More info about the htmlentities function.

Use proper document type, this did the trick for me.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Use utf-8 encoding type in your page:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Here is the final prototype for your page:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>    
<body>

<?php     
    // your code related to database        
    $my_text = htmlentities($string, ENT_QUOTES, 'UTF-8');    
?>

</body>
</html>


If you want to replace it however, try the mb_ereg_replace function.

Example:

mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

$my_text = mb_ereg_replace("’","'", $string);

edited May 02 '10 at 11:30

answered May 02 '10 at 09:48

Sarfraz


Thanks for your answer. I was indeed missing a doctype or any other HTML in fact. Once I added it and your htmlentities rule the `’` became a `�`. Once I switched Firefox to Western ISO charset it changed back to `’`. str_replace still doesn't work and unfortunately mb_ereg_replace does not work either. – richard May 02 '10 at 12:18
Try this: `str_replace("’", "'", $string);` and also try removing the `htmlentities` function and then see. – Sarfraz May 02 '10 at 12:41
With the html entities function the whole record just disappears. If I remove the function it shows the `�`. My database is UTF-8 but I can change it in PHPMyAdmin? – richard May 02 '10 at 15:33
@richard: yup, you can change the encoding in phpmyadmin, there is an option once you select a database. – Sarfraz May 02 '10 at 17:45
@Sarfraz: what should I change it to? I tried setting it to utf8_unicode_ci but that didn't change anything in the output. Thanks. – richard May 03 '10 at 09:57
@richard: it should be set to `utf8_unicode_ci` to allow foreign languages but not sure which charset that character belongs to. You might want to give a try to any of the `latin` charset too. – Sarfraz May 03 '10 at 10:39

David Kinkead · Answer 2 · 2012-10-15T16:05:28.293

4

I had the same issue and found this to work:

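function replace_rsquote($haystack, $replacewith) {
    // U+2019 is encoded in UTF-8 as the three bytes E2 80 99.
    // Searching for all three avoids matching other characters that also start with byte 226.
    $pos = strpos($haystack, "\xE2\x80\x99");
    if ($pos !== false) {
        // Replaces only the first occurrence (3 bytes long).
        return substr_replace($haystack, $replacewith, $pos, 3);
    }
    return $haystack;
}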

Example:

echo replace_rsquote("Nick’s","'"); //Nick's
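
A variation on the same idea, as a rough sketch: `str_replace` compares raw bytes, so spelling out the three UTF-8 bytes of U+2019 (E2 80 99) replaces every occurrence and does not depend on how the script file itself is saved. The name `replace_all_rsquotes` is made up for this illustration, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: assumes $text is UTF-8 encoded.
function replace_all_rsquotes(string $text, string $replacement = "'"): string
{
    // "\xE2\x80\x99" is the UTF-8 byte sequence for U+2019 (right single quotation mark).
    return str_replace("\xE2\x80\x99", $replacement, $text);
}

echo replace_all_rsquotes("Nick\xE2\x80\x99s and Anna\xE2\x80\x99s"), "\n";
```

If the data is not UTF-8 (for example Windows-1252), the byte sequence will not match and the string comes back unchanged.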

edited Oct 15 '12 at 16:05

answered Oct 05 '12 at 17:38

David Kinkead


score 2 · Answer 3 · answered May 02 '10 at 10:15

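To find out what the character is, run it through the ord function. Note that ord returns the value of the first byte of the string, not a Unicode code point; for a UTF-8 encoded ’ that is 226 (0xE2), the first of its three bytes: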

echo ord('’'); // 226

Now that you know what it is, you can do this:

str_replace('’', chr(226), $string);
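
To see every byte rather than just the first, `bin2hex` can help. This is only a sketch: the character is written with byte escapes on the assumption that the data is UTF-8, so the result does not depend on the encoding of the script file.

```php
<?php
// Assumed input: the UTF-8 encoding of U+2019, written as explicit bytes.
$char = "\xE2\x80\x99";

echo strlen($char), "\n";   // number of bytes in the string
echo bin2hex($char), "\n";  // all bytes as hexadecimal
echo ord($char), "\n";      // value of the first byte only
```

For UTF-8 input this prints 3, e28099 and 226. A single byte 92 (hex) instead would point to Windows-1252 data.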

answered May 02 '10 at 10:15

Casey Chu

This just replaces the character with a copy of itself. – kingjeffrey May 08 '10 at 06:11
Good point, but the original poster's code does that too, so I figured I'd do the same. – Casey Chu May 08 '10 at 07:24

Pekka · Answer 4 · 2010-05-02T09:58:04.030

To replace it:

If your script file is encoded in the same encoding as the data you are trying to do the replacement in, it should work the way you posted it. If you're working with UTF-8 data, make sure the script is encoded in UTF-8 and it's not your editor silently transliterating the character when you paste it.

If it won't work, try escaping it as described below and see what code it returns.

To escape it:

If your source file is encoded in UTF-8, this should work:

$string = htmlentities($string, ENT_QUOTES, "UTF-8");

the default character set of html... is iso-8859-1. Anything differing from that must be explicitly stated.

For more complex character conversion issues, always check out the User Contributed Notes to functions like htmlentities(), there are often real gems to be found there.

In General:

Bobince is right in his comment, systemic character set problems should be sorted systematically so they don't bite you in the ass - if only by defining which character set is used on every step of the way:

How the script file is encoded;
How the document is served;
How the data is stored in the database;
How the database connection is encoded.

The file is saved UTF-8. HTML meta tag is UTF-8. Database UTF-8. Database connection... Call to undefined function. That's unfortunate, I believe that might be the problem. — richard, May 02 '10 at 12:29

score 1 · Answer 5 · answered May 02 '10 at 09:50

If you are using non-ASCII characters in your PHP code, you need to make sure that you’re using the same character encoding as in the data you are processing. Your attempt probably fails because you are using a different character encoding in your PHP script than in $string.

Additionally, if you’re using a multibyte character encoding such as UTF-8, you should also use the multibyte aware string functions.
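
A small sketch of that mismatch, under the assumption that the data arrived as Windows-1252 while the needle is UTF-8 (the sample string is made up). Converting the data with `iconv` first lets a plain `str_replace` match:

```php
<?php
// Needle: U+2019 as UTF-8 bytes.
$needle = "\xE2\x80\x99";

// Hypothetical data in Windows-1252, where the same quote is the single byte 0x92.
$data = "Nick\x92s";

// No match here: the byte sequences differ, so the string is returned unchanged.
$unchanged = str_replace($needle, "'", $data);

// Convert the data to UTF-8 first, then replace.
$utf8  = iconv('Windows-1252', 'UTF-8', $data);
$fixed = str_replace($needle, "'", $utf8);

var_dump($unchanged === $data);
echo $fixed, "\n";
```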

score 1 · Answer 6 · answered May 04 '10 at 13:47

Gumbo said it right:
- save your script as a UTF-8 file
- and use http://php.net/mbstring (as Sarfraz pointed out in his last example)

answered May 04 '10 at 13:47

arena-ru


score 0 · Answer 7 · answered Apr 29 '10 at 22:18

Why not run the string through htmlspecialchars() and output it to see what it turns that character into, so you know what to use as your replace expression?

answered Apr 29 '10 at 22:18

user97410


I tried that but nothing happens. That comma stays as it is :( – richard Apr 29 '10 at 23:58

score 0 · Answer 8 · answered May 03 '10 at 17:54

This character you have is the Right Single Quotation Mark.

To replace it with a pattern you'll want to do something like this

$string = preg_replace( "/\\x{2019}/u", 'replacement', $string );

But that really only addresses the symptom. The problem is that you don't have consistent use of character encodings throughout your application, as others have noted.
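
If several typographic quotes need the same treatment, the code-point approach also works without a regex. The sketch below assumes PHP 7.0 or later (for the `"\u{...}"` escape, which produces UTF-8 bytes) and UTF-8 input; `ascii_quotes` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: maps common "smart" quotes to their ASCII counterparts.
function ascii_quotes(string $text): string
{
    return strtr($text, [
        "\u{2018}" => "'",  // left single quotation mark
        "\u{2019}" => "'",  // right single quotation mark
        "\u{201C}" => '"',  // left double quotation mark
        "\u{201D}" => '"',  // right double quotation mark
    ]);
}

echo ascii_quotes("\u{201C}Nick\u{2019}s\u{201D}"), "\n";
```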

answered May 03 '10 at 17:54

Peter Bailey


With your replace pattern, the row containing the Right Single Quotation Mark returns empty. I wouldn't know what to change now. MySQL charset is UTF-8, collation utf8_unicode_ci and html meta tag utf-8. – richard May 04 '10 at 09:24
I'm not really sure what you mean by "returns empty". Can you be more explicit? – Peter Bailey May 04 '10 at 14:28
I am querying 12 rows, 1 row contains that comma. 11 rows are being returned by PHP with this replace expression. – richard May 05 '10 at 17:12
PHP doesn't return rows. A SQL server does. And even then, `preg_replace()` operates on a single string - a column perhaps - not a "row". You are still confusing me. Did you try moving this regular expression into a SQL query? – Peter Bailey May 05 '10 at 18:38
preg_replace is a heavy function for a static character – without needing a complex regex pattern match, there is no reason for the comparatively slow preg_replace. str_replace would be much faster. – kingjeffrey May 08 '10 at 06:05
@kingjeffrey - that's totally a micro optimization. Accurate? Sometimes (preg_replace can be faster in unexpected ways). Relevant? No, I don't think so. – Peter Bailey May 08 '10 at 15:59
@Peter Bailey "that's totally a micro optimization"... str_replace() performs 50% faster than preg_replace() for simple string replacement. I threw this benchmark together: http://test.kingdesk.com/preg-replace-v-str-replace/. For complex applications (such as the wp-Typography plugin: http://wordpress.org/extend/plugins/wp-typography/), it can save seconds off of complex parsing operations. – kingjeffrey May 08 '10 at 23:55

score 0 · Answer 9 · answered May 08 '10 at 06:21

Don't use any regex functions (preg_replace or mb_ereg_replace). They are way too heavy for this.

str_replace(chr(226),'\u2019' , $string);
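
A caveat about the line above, sketched on the assumption of PHP 7.0 or later: inside single quotes, `'\u2019'` is six literal characters, whereas the double-quoted escape `"\u{2019}"` produces the three UTF-8 bytes of the quote character. Note also that `chr(226)` is only the first of those three bytes.

```php
<?php
$literal = '\u2019';    // single quotes: no escape processing, stays as typed
$escaped = "\u{2019}";  // PHP 7.0+ Unicode escape, emitted as UTF-8

echo strlen($literal), "\n";   // length of the literal text
echo bin2hex($escaped), "\n";  // bytes of the real character
```

This prints 6 and e28099.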

If your needle is a multibyte character, you may have better luck with this bespoke function:

<?php 
function mb_str_replace($needle, $replacement, $haystack) {
    $needle_len = mb_strlen($needle);
    $replacement_len = mb_strlen($replacement);
    $pos = mb_strpos($haystack, $needle);
    while ($pos !== false)
    {
        $haystack = mb_substr($haystack, 0, $pos) . $replacement
                . mb_substr($haystack, $pos + $needle_len);
        $pos = mb_strpos($haystack, $needle, $pos + $replacement_len);
    }
    return $haystack; 
} 
?>

credit for this last function: http://www.php.net/manual/en/ref.mbstring.php#86120

score 0 · Answer 10 · answered Jul 31 '17 at 16:18

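You can get the character's byte value with ord and then replace it with your desired character. This only works when the quote is a single byte, as in Windows-1252 data (with the script file saved in that encoding too):

$asciicode = ord('’'); // 146 in Windows-1252; in a UTF-8 file this returns 226, the first of three bytes
$stringfixed = str_replace(chr($asciicode), '\'', $string);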

answered Jul 31 '17 at 16:18

Emanuel A.
