The function json_encode requires a valid UTF-8 string. I have a string that may be in a different encoding. I need to ignore or substitute all invalid characters to be able to convert to JSON.
- It should be something very simple and robust.
- The error is in a module for manual checking, so mojibake is fine.
- The code responsible for fixing encoding is in a different module. (It was broken, though.) I don’t want to duplicate responsibility.
The hexadecimal representation of an example of an invalid string: 496e76616c6964206d61726b2096
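To illustrate the failure, here is a minimal sketch, assuming PHP 5.5 or later (where json_encode returns false on invalid input). The variable name $json is only for illustration:

```php
<?php
// "Invalid mark " followed by the lone byte 0x96, which is not valid UTF-8.
$raw_str = hex2bin('496e76616c6964206d61726b2096');

// Without any special flags, json_encode refuses the whole string.
$json = json_encode($raw_str);

var_dump($json);                                  // false on PHP >= 5.5
var_dump(json_last_error() === JSON_ERROR_UTF8);  // the error is reported as malformed UTF-8
```

So a single bad byte is enough to lose the entire value, which is why I need some sanitizing step first.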
My current solution:
$raw_str = hex2bin('496e76616c6964206d61726b2096');
$sane_str = @\iconv('UTF-8', 'UTF-8//IGNORE', $raw_str);
The three problems with my code:
- The iconv call looks a little too heavy.
- Many programmers don't like @.
- The iconv call may ignore too much: the whole string.
Any better idea?
There is a similar question, Ensuring valid UTF-8 in PHP, but I don't care about conversion.