
I'm working with an old system that sends me some JSON. I was stuck on a bug for a while where parsing it with json_decode( $json_string, true ); returned null.

I found the solution here, which was to do this:

json_decode( preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $json_string), true );

I don't get why that worked, though. Googling those characters tells me that they are "unreadable" characters, but I can't figure out what purpose they serve or why they are there.

It also makes me doubtful every time I decode a JSON string from this system: I have to print_r it to make sure the parsed result isn't null (the JSON doesn't always contain the bad characters).

On top of that, some of the comments where I found the solution warn that one should be careful about removing those characters.

I'm retrieving the JSON string like this: file_get_contents( 'https://example.org/the-endpoint' );


So this leads to three questions:

  1. What errors could I stumble into if I always decoded JSON like this?
     function improved_json_decode( $json_string ){
       return json_decode( preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $json_string), true );
     }

  2. Why does this happen in the first place? Do elves add the characters, or what is going on?

  3. Should I (and can I) check whether a JSON string is "parsable"?

El_Vanja
Zeth
  • Basically, if it's valid JSON, it will work nicely using the standard PHP functions. You can test for errors in the conversion using `json_last_error()` and `json_last_error_msg()` to see if the conversion failed. – RiggsFolly May 10 '21 at 13:20
  • 3. This is exactly what [`json_decode()`](https://www.php.net/manual/en/function.json-decode) does for a living. It returns `null` in two situations: when the input string is exactly `null` (this probably does not happen in your case) and when the input string cannot be parsed for some reason. – axiac May 10 '21 at 13:20
  • As we are not really sure what issue led you to try the preg_replace fix, it is difficult to say more. Would you let us know what the issue was? – RiggsFolly May 10 '21 at 13:21
  • My one thought was that the JSON might have a BOM at the front for some reason. – Chris Haas May 10 '21 at 13:22
  • 2) is likely a question for whoever generates the JSON that the endpoint returns. – El_Vanja May 10 '21 at 13:26
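A minimal sketch of checking whether a string actually parsed, rather than testing for null (a null result is ambiguous, because the JSON text `null` also decodes to null). The helper names `decode_or_report` and `decode_or_throw` are hypothetical, and the second variant assumes PHP 7.3+ for `JSON_THROW_ON_ERROR`.

```php
<?php
// Hypothetical helper: decode JSON and report why it failed,
// instead of silently getting null back.
function decode_or_report(string $json)
{
    $data = json_decode($json, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        // json_last_error_msg() returns a human-readable reason,
        // e.g. "Syntax error" or "Malformed UTF-8 characters, possibly incorrectly encoded".
        throw new RuntimeException('JSON decode failed: ' . json_last_error_msg());
    }
    return $data; // may legitimately be null if the input was the JSON literal null
}

// Hypothetical PHP 7.3+ alternative: let json_decode() throw a JsonException itself.
function decode_or_throw(string $json)
{
    return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
}
```

With either helper, a malformed response from the endpoint surfaces as an exception at the point of decoding, so there is no need to print_r every result.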

1 Answer


ISSUE

The string contains "invisible" characters that make it invalid JSON. Most likely the JSON string was not stored properly in the first place, so those characters come along when it is retrieved.

Usually these characters are ASCII control characters, codes 0-31, for example a LINE FEED (LF). Inside a JSON string value they must be escaped (a line feed has to be written as \n), so a raw one makes json_decode() fail. This can happen, for example, if a user copies text directly from a rich text editor like Word.
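To illustrate where a control character actually breaks parsing: a line feed between tokens is ordinary JSON whitespace, but the same byte left raw inside a string value is not allowed. The sample strings below are made up for illustration.

```php
<?php
// A line feed *between* tokens is plain whitespace, so this decodes.
$between = "{\"note\":\n\"ok\"}";
var_dump(json_decode($between, true) === ['note' => 'ok']);           // bool(true)

// The same byte left raw *inside* a string value is invalid JSON.
$inside = "{\"note\":\"line1\nline2\"}";
var_dump(json_decode($inside, true));                                   // NULL
echo json_last_error_msg(), PHP_EOL;                                    // the reason for the failure

// Escaped as the two characters \ and n (single quotes keep them literal),
// it is valid and decodes to a string containing a real line feed.
$escaped = '{"note":"line1\nline2"}';
var_dump(json_decode($escaped, true) === ['note' => "line1\nline2"]);  // bool(true)
```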

SOLUTION

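This is what the preg_replace does: it strips every byte in those two ranges, i.e. the ASCII control characters and all bytes above 127. Note that bytes 128-255 are not ASCII at all; UTF-8 uses them to encode characters such as é or ü, which is why the comments warn you to be careful.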

preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $json_string)

In this case \x00-\x1F is 0-31 and \x80-\xFF is 128-255.
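A quick check showing that the pattern also deletes the bytes of UTF-8 characters, so the decode "succeeds" with altered data. The sample string is made up for illustration.

```php
<?php
// "é" is encoded in UTF-8 as the two bytes 0xC3 0xA9, both inside \x80-\xFF.
$json = '{"name":"Jos' . "\xC3\xA9" . '"}';

// Same pattern as in the question; without the /u modifier it works on raw bytes.
$cleaned = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $json);

echo $cleaned, PHP_EOL;               // {"name":"Jos"}
var_dump(json_decode($json, true));    // the original decodes fine, with "José" intact
```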

If you are dealing with data that already exists (stored in any form), the preg_replace approach is valid as long as you know what you are stripping and why.
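When you don't know exactly what is in the data, a narrower cleanup may be safer than stripping whole byte ranges. This is a sketch, assuming PHP 7.3+, with a hypothetical function name `decode_legacy_json`. It only handles two common culprits: a leading UTF-8 byte order mark (the BOM suggested in the comments) and invalid UTF-8 bytes, which the `JSON_INVALID_UTF8_SUBSTITUTE` flag replaces with U+FFFD instead of failing the whole decode.

```php
<?php
// Hypothetical helper for the legacy endpoint; assumes PHP 7.3+.
function decode_legacy_json(string $json)
{
    // A leading UTF-8 byte order mark (EF BB BF) makes json_decode() report a syntax error.
    if (strncmp($json, "\xEF\xBB\xBF", 3) === 0) {
        $json = substr($json, 3);
    }

    // Invalid UTF-8 bytes (e.g. Latin-1 text) become U+FFFD; anything still
    // malformed (such as raw control characters in strings) throws a JsonException.
    return json_decode(
        $json,
        true,
        512,
        JSON_INVALID_UTF8_SUBSTITUTE | JSON_THROW_ON_ERROR
    );
}
```

Unlike the blanket regex, valid UTF-8 text passes through unchanged, and anything the helper cannot repair fails loudly instead of returning null.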

If you want to see the invisible characters in the string, you can use:

print_r(unpack("C*", $your_json_string));

This prints your string byte by byte as decimal values (the array keys start at 1), so control characters and non-ASCII bytes become visible.
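If the unpack() output is long, a small helper can list only the suspicious bytes with their offsets. A sketch with a hypothetical name, `find_suspicious_bytes`; it flags the same two ranges as the regex, so a flagged byte in 0x80-0xFF may still be a legitimate part of a UTF-8 character, and a flagged line feed between tokens is harmless.

```php
<?php
// Hypothetical helper: report the 0-based offset and hex value of every byte
// in 0x00-0x1F or 0x80-0xFF (the ranges stripped by the regex in the question).
function find_suspicious_bytes(string $s): array
{
    $found = [];
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($s[$i]);
        if ($byte < 0x20 || $byte >= 0x80) {
            $found[$i] = sprintf('0x%02X', $byte); // offset => hex value
        }
    }
    return $found;
}

// Usage: a raw line feed inside a string value shows up at offset 7.
print_r(find_suspicious_bytes("{\"a\":\"x\ny\"}"));
```

A leading `0xEF, 0xBB, 0xBF` at offsets 0-2 would point to a BOM.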

A BETTER APPROACH

If you do have control over how the data is input and stored, sanitize the string before storing it (storing unsanitized data is unsafe practice).

PHP's filter functions offer a tidy way to do this, for example stripping every character with a code below 32:

filter_var($your_string, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW);
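For comparison with the regex, a small check of what the filter keeps. `FILTER_FLAG_STRIP_LOW` removes bytes below 32 (including line feeds and tabs, so it also changes legitimate multi-line text) but leaves bytes 128-255, and therefore UTF-8 characters, untouched unless `FILTER_FLAG_STRIP_HIGH` is added. The sample string is made up for illustration.

```php
<?php
// Sample input: "José" in UTF-8, a stray 0x01 control byte, a line feed, then "ok".
$input = "Jos\xC3\xA9\x01\nok";

// Strips every byte below 32 (0x01 and the line feed) but keeps the UTF-8 "é".
$low = filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW);
var_dump($low === "Jos\xC3\xA9ok");   // bool(true)

// Adding FILTER_FLAG_STRIP_HIGH also drops bytes >= 128, like the regex does.
$both = filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW | FILTER_FLAG_STRIP_HIGH);
var_dump($both === "Josok");          // bool(true)
```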
kissumisha