
I'm dealing with a replacement character stored in a MySQL database. It's fine if it stays there, but I'm trying to edit the record that contains it. My form displays the character as a diamond shape with a question mark in it (�). When I submit the form, I compare the submitted data with the data in the database to see whether it has changed. The problem is that submitting the form turns the replacement character into &#65533;, which is its HTML entity equivalent, so the comparison fails and the code thinks the string has changed (which it has, but not really). I've tried two approaches and both fail. Converting the replacement character from the database into its HTML entity before comparing starts turning other, seemingly normal characters into the replacement character's entity as well. Converting the HTML entity back into the replacement character simply does not work. And yes, I have tried html_entity_decode() and htmlspecialchars_decode().

My question is: how can I keep the replacement character from turning into an HTML entity?

Err
  • Does it turn into the HTML entity when it's in the PHP, or when it's been inserted into the database? – Waleed Khan Apr 09 '12 at 05:23
  • So I changed my webpage encoding to UTF-8 to make it match to the database encoding. I still can't decode the html entity. I've tried `mb_decode_numericentity($str,array(0xEF, 0xBF, 0xBD),'UTF-8');` – Err Apr 09 '12 at 21:30

2 Answers


Please verify the encoding declared in your HTML, for example:

<meta http-equiv="Content-Type" content="text/html; charset=<your_charset>">

and the encoding of your database, for example in MySQL:

DEFAULT CHARACTER SET <your_charset> COLLATE <your_collate>

The two must match.

Valeriy Gorbatikov
  • Why must it be equal when I'm simply trying to decode the HTML entity back into the character in PHP for a comparison? I haven't even got to the point of inserting the data into the database. Just to clarify, I don't care about being able to see the character correctly, just that the PHP can use it. Page charset = iso-8859-1, db charset = utf8 – Err Apr 09 '12 at 05:36
  • *"Why must it be equal when I'm simply trying to decode the html entity back into the character in php for a comparison?"* Because inequalities are going to cause other problems later on, you'll need to make sure to convert from one representation to another everywhere, and because these days you need a **really** good reason to use anything other than Unicode everywhere. – DCoder Apr 09 '12 at 07:03

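For some reason, the web browser is submitting the � REPLACEMENT CHARACTER (U+FFFD) as its decimal numeric HTML entity: &#65533;. Are you perhaps already outputting it that way to the browser? Another likely cause is the page encoding: if the form is submitted as iso-8859-1, which cannot represent U+FFFD, browsers typically replace such characters with numeric entities.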

However, if you expect the input to contain HTML entities, you need to decode them if you don't want to store them in your database as HTML. To decode numeric entities within an incoming UTF-8 encoded string $str:

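// Conversion map: start code point, end code point, offset, mask
$convmap = array(0, 0x10FFFF, 0, 0xFFFFFF);
// Decode numeric entities such as &#65533; into UTF-8 characters
$output = mb_decode_numericentity($str, $convmap, 'UTF-8');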

This code does perform the conversion you're looking for (Demo). However, you should first find out why a numeric HTML entity is being submitted at all.
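To tie the decoding step back to the comparison from the question, here is a minimal sketch. The helper name `decodeNumericEntities()` and the sample values are hypothetical and not from the original post; it assumes PHP 7 or later with the mbstring extension and a stored value that is already UTF-8.

```php
<?php
// Hypothetical helper (name is an assumption): decode numeric entities in a
// submitted value so it can be compared with the stored UTF-8 string.
function decodeNumericEntities(string $submitted): string
{
    // Same conversion map as in the answer: start, end, offset, mask.
    $convmap = array(0, 0x10FFFF, 0, 0xFFFFFF);
    return mb_decode_numericentity($submitted, $convmap, 'UTF-8');
}

// Sample values, made up for illustration.
$stored    = "caf\u{FFFD}";   // as read from the database (assumed UTF-8)
$submitted = 'caf&#65533;';   // as posted by the browser

// Compare only after decoding, so the entity no longer counts as a change.
$changed = decodeNumericEntities($submitted) !== $stored;
var_dump($changed);
```

Decoding the submitted value, rather than encoding the stored one, leaves the database content untouched and only normalizes what the browser sent.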

Since you prefer Unicode, I suggest you use UTF-8 for the web page:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

and for the form:

<form action="" method="post" accept-charset="utf-8">
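The same UTF-8 setup can also be reinforced on the PHP side. The following is only a rough sketch under assumptions: `isValidUtf8()` is a hypothetical helper, not part of the original answer, and it requires the mbstring extension. It declares the charset in the HTTP header, which takes precedence over the `<meta>` tag, and checks that an incoming value really is valid UTF-8 before it is compared or stored.

```php
<?php
// Declare UTF-8 in the HTTP response header as well, if headers can still be sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper (name is an assumption): true if the value is valid UTF-8.
function isValidUtf8(string $value): bool
{
    return mb_check_encoding($value, 'UTF-8');
}
```

A form handler could call `isValidUtf8()` on each posted field and reject or log values that fail the check, which helps reveal a page that is still being submitted in another encoding.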

Good luck.

hakre