
I am unable to use str_replace to replace a single 'smart' quote (’), but only when it arrives in a $_POST request.

The exact problem is that my client is copying and pasting from a browser page in which the quotes are rendered from &rsquo;. When he pastes the text into the form, it updates a database entry, and as long as the curly quote is in the database the entire site breaks. I didn't create his site, so tracking down the root cause is a pain, but I did narrow it down to copying and pasting curly quotes. My first simple solution was therefore to replace them as soon as they come in over POST.

An example can be seen here:

http://wheatbeakinc.com/quote.php

This is the exact source code:

<div style="font-size:30px;">

<?php

if(isset($_POST["text"])){
    
    $foo = str_replace("’","'","tes’t");
    
    $chr_map = array(
   // Windows codepage 1252
   "\xC2\x82" => "'", // U+0082⇒U+201A single low-9 quotation mark
   "\xC2\x84" => '"', // U+0084⇒U+201E double low-9 quotation mark
   "\xC2\x8B" => "'", // U+008B⇒U+2039 single left-pointing angle quotation mark
   "\xC2\x91" => "'", // U+0091⇒U+2018 left single quotation mark
   "\xC2\x92" => "'", // U+0092⇒U+2019 right single quotation mark
   "\xC2\x93" => '"', // U+0093⇒U+201C left double quotation mark
   "\xC2\x94" => '"', // U+0094⇒U+201D right double quotation mark
   "\xC2\x9B" => "'", // U+009B⇒U+203A single right-pointing angle quotation mark

   // Regular Unicode     // U+0022 quotation mark (")
                          // U+0027 apostrophe     (')
   "\xC2\xAB"     => '"', // U+00AB left-pointing double angle quotation mark
   "\xC2\xBB"     => '"', // U+00BB right-pointing double angle quotation mark
   "\xE2\x80\x98" => "'", // U+2018 left single quotation mark
   "\xE2\x80\x99" => "'", // U+2019 right single quotation mark
   "\xE2\x80\x9A" => "'", // U+201A single low-9 quotation mark
   "\xE2\x80\x9B" => "'", // U+201B single high-reversed-9 quotation mark
   "\xE2\x80\x9C" => '"', // U+201C left double quotation mark
   "\xE2\x80\x9D" => '"', // U+201D right double quotation mark
   "\xE2\x80\x9E" => '"', // U+201E double low-9 quotation mark
   "\xE2\x80\x9F" => '"', // U+201F double high-reversed-9 quotation mark
   "\xE2\x80\xB9" => "'", // U+2039 single left-pointing angle quotation mark
   "\xE2\x80\xBA" => "'", // U+203A single right-pointing angle quotation mark
);
$chr = array_keys  ($chr_map); // but: for efficiency you should
$rpl = array_values($chr_map); // pre-calculate these two arrays
$bar = str_replace($chr, $rpl, html_entity_decode($_POST["text"], ENT_QUOTES, "UTF-8"));
        
        echo "foo: " . $foo . " - <em>shows straight quote (for me)</em><br /><br >";
        echo "bar: " . $bar . " - <em>still shows curly quote (for me)</em><br /><br >";    
        
}

?>


Copy this into the input: tes&rsquo;t

<form action="" method="post">

<input type="text" name="text" />
<br>
<br>
<input type="submit" value="Submit" />

</form>

</div>

If I fill in the exact same string (tes’t) in the form and hit submit, I get the following result:

foo: tes't

bar: tes’t

Even though the strings are identical, the one passed through POST is not replaced. Does anyone know why this is happening?

This is not a duplicate of the other question, and that solution does not work.

WheatBeak

1 Answer

Upon testing (and I had my doubts about it being an encoding issue; I accidentally deleted my comment about that), I was able to find out why your code is failing.

It's because your file's encoding may be set to UTF-8 without BOM.

If that is the case, change it to be with BOM (byte order mark) and it will work as expected.
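To see why the encoding matters here, the sketch below compares the bytes of the same curly quote in two encodings. It assumes (this is not confirmed in the question) that the form value arrived as Windows-1252 rather than UTF-8; the variable names are hypothetical. Byte escapes are used so the result does not depend on how the file itself is saved.

```php
<?php
// Right single quotation mark (U+2019) in two encodings, written as byte
// escapes so this example behaves the same however the file is saved.
$utf8Quote   = "\xE2\x80\x99"; // UTF-8: three bytes
$cp1252Quote = "\x92";         // Windows-1252 ("ANSI" on Western Windows): one byte

// Hypothetical submitted values: same visible text, different bytes.
$postedUtf8   = "tes" . $utf8Quote . "t";
$postedCp1252 = "tes" . $cp1252Quote . "t";

echo bin2hex($postedUtf8), "\n";   // 746573e2809974
echo bin2hex($postedCp1252), "\n"; // 7465739274

// str_replace() compares raw bytes, so a UTF-8 needle only matches UTF-8 input.
$fixedUtf8   = str_replace($utf8Quote, "'", $postedUtf8);
$fixedCp1252 = str_replace($utf8Quote, "'", $postedCp1252);

var_dump($fixedUtf8 === "tes't");         // bool(true): the quote was replaced
var_dump($fixedCp1252 === $postedCp1252); // bool(true): nothing was replaced
```

Running bin2hex() on the real $_POST["text"] value in the same way should show which byte sequence actually reaches the script.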


Note:

Saving the file with ANSI encoding also replaced the curly quote with a regular quote, so you have a choice: ANSI, or UTF-8 with BOM.

You can use an editor such as Notepad++ for this.

From the dropdown menu, you would choose:

  • Encoding, Convert to UTF-8 with BOM, then save.
  • Or, Encoding, Convert to ANSI, then save.
  • The choice is yours.

Important sidenote: Do not choose "Encode in...", because that will not convert your file once you save it. You must choose "Convert to".

There are other code editors out there that you can use which will give you the same result.
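If you want to confirm that the conversion took effect, a minimal sketch like the following checks whether some bytes begin with the UTF-8 byte order mark (EF BB BF). The function name hasUtf8Bom is a hypothetical helper, not something from your code.

```php
<?php
// Hypothetical helper: true when the bytes start with the UTF-8 BOM (EF BB BF).
function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Demonstration with in-memory strings standing in for file contents.
var_dump(hasUtf8Bom("\xEF\xBB\xBF<?php echo 1;")); // bool(true)
var_dump(hasUtf8Bom("<?php echo 1;"));             // bool(false)
```

For a real file you could pass in the result of file_get_contents() on the script's path.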

Funk Forty Niner
  • Actually, it appears to be the opposite of that: if I change it to UTF-8 with BOM, then it works. Thanks. – WheatBeak Jan 15 '16 at 18:37
  • @WheatBeak hehe ok, I'll change it. I might have confused even myself (but that is indeed what I meant). I'm glad it worked out for you, cheers. Edit: changed, and you're welcome. – Funk Forty Niner Jan 15 '16 at 18:38