I'm currently working on a regex to replace empty HTML elements. However, the strings in the database contain hidden characters. For example, I copied this string from the database:
<h3> </h3>
When I loop over it and convert each character into an integer with ord(), I get the following output:
< => 60
h => 104
3 => 51
> => 62
=> 32
< => 60
/ => 47
h => 104
3 => 51
> => 62
However, when I read it from the database and put it into a variable directly, I get the following output:
< => 60
h => 104
3 => 51
> => 62
� => 194
� => 160
< => 60
/ => 47
h => 104
3 => 51
> => 62
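For reference, here is a minimal sketch that reproduces this second dump without the database. It assumes the value coming back over the utf8 connection contains the UTF-8 encoding of a non-breaking space (U+00A0), which is the two bytes 0xC2 0xA0, i.e. 194 and 160. The hard-coded string and the variable names are placeholders, not my real query code.

```php
<?php
// Assumption: the column value is UTF-8, where a non-breaking space
// (U+00A0) is stored as the two bytes 0xC2 0xA0.
$fromDb = "<h3>\xC2\xA0</h3>";

// strlen() and ord() operate on bytes, not characters, so one
// multi-byte character shows up as several separate entries.
$codes = [];
for ($i = 0; $i < strlen($fromDb); $i++) {
    $codes[] = ord($fromDb[$i]);
}

echo implode(' ', $codes), PHP_EOL;
// prints: 60 104 51 62 194 160 60 47 104 51 62
```

If that assumption holds, 194 and 160 would be the two bytes of a single character rather than two characters, which would also explain why each one prints as � on its own.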
I know 160 is a non-breaking space, so that part could be correct. What I don't get is why there is an extra char 194 (which is Â according to Google).
How can I get rid of the Â? The non-breaking space is understandable, but I don't understand where the Â comes from.
UPDATE:
The data in the database is stored as utf8_general_ci. I set the charset in the PDO connection to utf8.
UPDATE2:
I'm curious why I get an Â (char 194) to begin with. Between <h3> and </h3> in the database there's only one character according to my cursor.
I want to remove <h3>[ONLY SPACES]</h3>, but because the string contains a seemingly random char 194, I cannot match it correctly with a regex, since 194 isn't a space.
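To make the problem concrete, here is a rough sketch of the kind of match that fails. The pattern is a simplified, hypothetical stand-in for my real regex, and the second test string is hard-coded with the bytes 194 160 from the dump instead of being read from the database.

```php
<?php
// Hypothetical pattern: an <h3> element containing only whitespace.
// Without the /u modifier the subject is matched byte by byte.
$pattern = '/<h3>\s*<\/h3>/';

$plain  = "<h3> </h3>";          // ordinary space (byte 32)
$fromDb = "<h3>\xC2\xA0</h3>";   // bytes 194 160, as in the second dump

$plainMatches  = preg_match($pattern, $plain);   // 1: the space matches \s
$fromDbMatches = preg_match($pattern, $fromDb);  // 0: byte 194 is not whitespace

var_dump($plainMatches, $fromDbMatches);
```

With PHP's default locale, the first string matches and the second does not, which is exactly the behaviour I'm seeing with the real data.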