I'm making a cURL request to a third-party website that returns a text file. I need to run a few string replacements on it to replace certain characters with their HTML entity equivalents, e.g. I need to replace í with &iacute;.
Using str_replace/preg_replace_callback on the response directly didn't produce any matches (whether searching for í directly or using its hex code \x00\xED), so I ran utf8_encode() on the response before carrying out the replacement. But utf8_encode replaces all the í characters with Ã.
Why is this happening, and what's the correct approach to carrying out UTF-8 replacements on an arbitrary piece of text using PHP?
Edit: some further research reveals:
utf8_decode("í") == í;
utf8_encode("í") == ÃÂ;
utf8_encode("\xc3\xad") == ÃÂ;
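A minimal sketch that may help reproduce the behaviour, under the assumption (not confirmed for the real response) that the text is already UTF-8, so í arrives as the two bytes C3 AD. The sample string is hypothetical, and mb_convert_encoding() from the mbstring extension stands in for utf8_encode(), which performs the same ISO-8859-1 to UTF-8 conversion but is deprecated as of PHP 8.2. Dumping the bytes with bin2hex() sidesteps whatever encoding the terminal or browser applies when displaying them.

```php
<?php
// Hypothetical sample; assumes the response is already UTF-8 (í = C3 AD).
$response = "caf\xC3\xAD";

// Same conversion utf8_encode() performs (ISO-8859-1 -> UTF-8),
// applied here to bytes that are already UTF-8.
$double = mb_convert_encoding($response, 'UTF-8', 'ISO-8859-1');

// Inspect the raw bytes rather than the rendered characters.
echo bin2hex($response), "\n"; // 636166c3ad
echo bin2hex($double), "\n";   // 636166c383c2ad

// Replace the UTF-8 byte sequence for í on the untouched response.
$fixed = str_replace("\xC3\xAD", '&iacute;', $response);
echo $fixed, "\n"; // caf&iacute;
```

If the first hex dump of the real response shows c3ad where í should be, the text is already UTF-8 and a search for \x00\xED or a further utf8_encode() call would not be expected to help; if it shows a lone ed, the response is more likely ISO-8859-1.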