
I have a string that has been URL-encoded once:

$encodedJson = "%5B%7B%0A%09%22base%22%3A%20%7B%0A%09%09%22url%22%3A%20%22abc.com%22%2C%0A%09%09%22referrer%22%3A%20%22xyz.com%22%0A%09%7D%0A%7D%2C%20%7B%0A%09%22client%22%3A%20%7B%0A%09%09%22Pixel%22%3A%20false%2C%0A%09%09%22screen%22%3A%20%221680x1050%22%0A%09%7D%0A%7D%5D"

If I decode it with the following call, I get the decoded JSON as an array:

$decodedJsonArray = json_decode(rawurldecode($encodedJson), true);

Then print_r($decodedJsonArray); gives me the desired output:

Array
(
    [0] => Array
        (
            [base] => Array
                (
                    [url] => abc.com
                    [referrer] => xyz.com
                )

        )

    [1] => Array
        (
            [client] => Array
                (
                    [Pixel] => 
                    [screen] => 1680x1050
                )

        )

)

Now suppose I have a string that has been URL-encoded multiple times:

$encodedJson = "%25255B%25257B%25250A%252509%252522base%252522%25253A%252520%25257B%25250A%252509%252509%252522url%252522%25253A%252520%252522abc.com%252522%25252C%25250A%252509%252509%252522referrer%252522%25253A%252520%252522xyz.com%252522%25250A%252509%25257D%25250A%25257D%25252C%252520%25257B%25250A%252509%252522client%252522%25253A%252520%25257B%25250A%252509%252509%252522Pixel%252522%25253A%252520false%25252C%25250A%252509%252509%252522screen%252522%25253A%252520%2525221680x1050%252522%25250A%252509%25257D%25250A%25257D%25255D"

This string is URL-encoded three times. I want to get the same JSON array as before, so I am trying to write a function like the following:
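To make "three times" concrete, here is a small sketch using a shortened, made-up JSON snippet rather than the full payload. It assumes each layer was produced with rawurlencode() (the once-encoded string uses %20 for spaces, which is what rawurlencode() emits; urlencode() would emit +). Each pass re-encodes the "%" from the previous pass as "%25", and exactly three rawurldecode() calls undo it:

```php
<?php
// Made-up sample JSON; each rawurlencode() pass turns every "%" into "%25".
$json   = '[{"base":{"url":"abc.com"}}]';
$once   = rawurlencode($json);
$twice  = rawurlencode($once);
$thrice = rawurlencode($twice);

echo substr($once, 0, 3), "\n";   // %5B
echo substr($twice, 0, 5), "\n";  // %255B
echo substr($thrice, 0, 7), "\n"; // %25255B

// Undoing it takes exactly as many rawurldecode() calls as there were layers.
var_dump(rawurldecode(rawurldecode(rawurldecode($thrice))) === $json); // bool(true)
```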

function recursiveJsonDecode($encodedJson) {
    if (isJson($encodedJson)) {
        return $encodedJson;
    } else {
        $decodedJsonArray = json_decode(rawurldecode($encodedJson), true);
        return $decodedJsonArray;
    }
}

But it's not working. Any help would be much appreciated.

AbraCadaver
sumion
  • What exactly is not working in the second example? Show us the output you get. – Filnor Nov 29 '17 at 15:42
  • I didn't know PHP had a function called `isJson` – Andreas Nov 29 '17 at 15:43
  • A very important feature of a recursive function is that it calls itself. – Don't Panic Nov 29 '17 at 15:43
  • @DontPanic a function that doesn't is called a deaf-mute recursive function – Andreas Nov 29 '17 at 15:45
  • This is just a theoretical exercise, I assume? Because in reality you rather shouldn’t be dealing with data that was encoded more than once to begin with ... this would rather be a reason to _reject_ the data IMHO, and tell whoever is sending it to you to get _their_ stuff in order ... – CBroe Nov 29 '17 at 15:47
  • @Andreas, I have a `isJson` function in the same file, which works. `function isJson($string) { json_decode($string); return (json_last_error() == JSON_ERROR_NONE); }` – sumion Nov 29 '17 at 15:53
  • @CBroe yes, it's just an assumption. Normally it's encoded only once. But in case it's encoded multiple times ... – sumion Nov 29 '17 at 16:39

4 Answers


Shipping & Post Office Supplies | USPS.com - Postal Store

Ordering shipping supplies is cute because it's the only time you can ever receive a box full of nothing but boxes!

When you receive your boxes in the mail, what do you do with them, though? I remove only the outermost packaging and set my boxes on a shelf; maybe I'll use them to send things out later. Someone who writes a recursive JSON decoder might do something different, though: they might open every one of those boxes and be sad to find out they received nothing!

"I opened every single box and I never found my order's contents!" the recursive JSON decoder laments.


Don't decode it just because you can

There's no reliable way to determine whether a string is JSON-encoded or not. Because of this, it's not the consumer's job to guess whether to parse it; the consumer has to know in advance.

Take, for example, the JSON string "5". Is it the string '5' encoded once?

json_encode("5");
// => '"5"'

or is it a double-encoded integer 5?

json_encode(json_encode(5));
// => '"5"'

If you're looking at only the JSON-encoded result, there's no way to tell, but 5 (int) and "5" (string) are as different as [5] or {value: 5}: they're completely different types. The JSON consumer must know how many times the value has been encoded. That's not complicated, as you should avoid double-encoding in the first place.


When we decode JSON, we only do it once

json_decode('"5"');
// => "5"

Your recursive function would effectively do this:

json_decode(json_decode('"5"'));
// => 5

Only one of those is the correct answer. This is the flaw in every isJson function built around error-checking a decode operation: people trick themselves into thinking that just because a string can be decoded, it was JSON in the first place.

Returning to our USPS example, it would mean you stop opening boxes only once you encounter something that cannot be opened: I just keep opening boxes, and once I discover they're all empty, I'm stuck wondering where my order's contents are.

This idea that you can somehow detect when to stop decoding is broken from the start. In this example, watch what happens when a simple form submission is processed by a recursive JSON decoder...

If I fill in a form with the name "[]" and you run a recursive JSON decoder on the submitted form data, you end up with:

$formData == [ "name" => [] ] // name is an array, wups!

Whereas a non-recursive JSON decoder keeps the name as a string:

$formData == [ "name" => "[]" ] // name is a string, as the user typed

Just because you can parse it doesn't mean you should.
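A hypothetical "greedy" decoder makes both the "5" example and the form example concrete. This is a sketch of the pattern being warned against, not a recommended helper: greedyJsonDecode() is an illustrative name that keeps calling json_decode() as long as the previous result is still a string that decodes without error.

```php
<?php
// Hypothetical greedy decoder (the anti-pattern): keep decoding while the
// current value is a string and json_decode() reports no error.
function greedyJsonDecode(string $value)
{
    $current = $value;
    while (is_string($current)) {
        $next = json_decode($current, true);
        if (json_last_error() !== JSON_ERROR_NONE) {
            break; // no longer decodable, so stop here
        }
        $current = $next;
    }
    return $current;
}

var_dump(greedyJsonDecode('"5"')); // int(5), not the string "5"

$submitted = ['name' => '[]']; // exactly what the user typed
$greedy = array_map('greedyJsonDecode', $submitted);

var_dump($submitted['name']); // string(2) "[]"
var_dump($greedy['name']);    // an empty array, not the string the user typed
```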


Once a string has been double- or triple-encoded (URL-encoded, JSON-encoded, or whatever-encoded), the only way to reverse it is to decode it exactly the same number of times.
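When the number of layers is known (for example, from the producer's documentation), decoding is just a fixed loop followed by a single json_decode(). A minimal sketch, assuming the layers were added with rawurlencode() and PHP 7.3+ for JSON_THROW_ON_ERROR; decodeKnownLayers() is a hypothetical helper name and the sample data is rebuilt from the question's output:

```php
<?php
// Sketch: undo a known number of encoding layers, then parse exactly once.
// Assumption: $layers comes from the producer's contract, not from
// inspecting the data.
function decodeKnownLayers(string $payload, int $layers)
{
    for ($i = 0; $i < $layers; $i++) {
        $payload = rawurldecode($payload);
    }
    // JSON_THROW_ON_ERROR makes malformed input raise JsonException
    // instead of silently returning null.
    return json_decode($payload, true, 512, JSON_THROW_ON_ERROR);
}

// Usage: data shaped like the question's, encoded three times.
$data = [
    ['base' => ['url' => 'abc.com', 'referrer' => 'xyz.com']],
    ['client' => ['Pixel' => false, 'screen' => '1680x1050']],
];
$payload = json_encode($data);
for ($i = 0; $i < 3; $i++) {
    $payload = rawurlencode($payload);
}

print_r(decodeKnownLayers($payload, 3));
```

Passing the wrong layer count (say 2) leaves percent-encoded text behind, and json_decode() throws instead of guessing.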

Mulan
  • Then why are all of his code samples in php, and there's a php tag on his question? – Ethan Nov 29 '17 at 15:57
  • The question is pretty vague. I took it to be about url encoding and not specifically about JSON. – apokryfos Nov 29 '17 at 15:58
  • The question is not vague imo – the person is asking how to recursively decode JSON and shows a function with their attempt to do so – Mulan Nov 29 '17 at 16:00
  • @naomik I think the first line of your answer is wrong, see this SO post for how to detect JSON data: https://stackoverflow.com/q/6041741/3088508 – Ethan Nov 29 '17 at 16:06
  • David, just because a string *can* be JSON-decoded does not mean it's JSON – Mulan Nov 29 '17 at 16:16
  • @David, also note every solution there depends on error checking, which is not a reliable signal for whether a string should be parsed or not – read the comments from other users, too – unsurprisingly, there is no accepted answer either – I'm ready for you to remove your down-vote whenever you can set aside your pride – Mulan Nov 29 '17 at 16:38

You can treat URL decoding as a fixed-point operation:

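function fixedPointDecode($string) {
    // Keep decoding until urldecode() stops changing the string.
    $decoded = urldecode($string);
    while ($decoded !== $string) { // strict comparison, no numeric-string juggling
        $string = $decoded;
        $decoded = urldecode($string);
    }
    return $decoded;
}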

The idea is that once urldecode no longer changes the string, it is fully decoded.

Then you can do:

 json_decode(fixedPointDecode($string));

Note: I have not found any URL-encoded string that fails to converge to a fixed point, but I'm curious whether anyone else has.
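A quick usage sketch for fixedPointDecode(), repeated here with a strict !== comparison so the block runs on its own. It shows the function recovering triple-encoded JSON, and also how it rewrites a raw value such as `frt%235` that merely looks encoded. The sample data is made up for illustration.

```php
<?php
function fixedPointDecode(string $string): string
{
    // Decode until urldecode() no longer changes the string.
    $decoded = urldecode($string);
    while ($decoded !== $string) {
        $string = $decoded;
        $decoded = urldecode($string);
    }
    return $decoded;
}

$data   = ['screen' => '1680x1050'];
$triple = rawurlencode(rawurlencode(rawurlencode(json_encode($data))));
var_dump(json_decode(fixedPointDecode($triple), true) === $data); // bool(true)

// A literal value that only looks encoded is altered as well:
var_dump(fixedPointDecode('frt%235')); // string(5) "frt#5"
```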

apokryfos
  • The only problem with this is sometimes you will encounter a valid value that also *appears* to be encoded. Say I have a device ID of "frt%235" – that's the *actual* device ID – if I let a program decide when to stop decoding, it would change that to `frt#5`, which is invalid in this case – Mulan Nov 29 '17 at 16:42
  • If you change the while to `while (json_decode($decoded) == null)` then this might work but in that case the function will cause an infinite loop if there's no valid json string hidden in the input. I recommend you solve the problem at its source by ensuring the string is only `urlencode`d once – apokryfos Nov 29 '17 at 16:47
  • but that's the point, the consumer can't control the contents of the data – it's broken to rely upon null/error-checks as a signal for whether a string should be parsed – Mulan Nov 29 '17 at 16:49
  • Well, that's partially true, but this is also what you get for choosing buggy providers. Getting a string that's urlencoded multiple times should merit a bug report to the provider and the provider should not claim that it's working as intended. In no world should double encoding the entire string make any sense. – apokryfos Nov 29 '17 at 17:01
  • I completely agree - so in the OP’s case if it’s double-encoded (for whatever dumb reason), just double decode – if it’s triple encoded for an even dumber reason, just triple decode – recursion and *detection* for a stopping point is not necessary *or* possible – Mulan Nov 29 '17 at 17:04
  • @naomik If you know how many times it's encoded in advance then that's a much better strategy than a fixed point operation, definitely. – apokryfos Nov 29 '17 at 17:23

json_decode returns null if its input is not valid JSON, as the PHP manual says:

NULL is returned if the json cannot be decoded or if the encoded data is deeper than the recursion limit.

So just test it:

while(($decodedJsonArray = json_decode($encodedJson, true)) === null) {
    $encodedJson = rawurldecode($encodedJson);
}

print_r($decodedJsonArray);

To use your isJson function:

while(!isJson($encodedJson)) {
    $encodedJson = rawurldecode($encodedJson);
}
$decodedJsonArray = json_decode($encodedJson, true);

print_r($decodedJsonArray);
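One caveat with the `=== null` loop: the valid JSON text `null` also decodes to null, and rawurldecode('null') returns 'null' unchanged, so that input would loop forever (as would any input that never becomes valid JSON). A sketch of a bounded variant, assuming a hypothetical helper name decodeUntilJson() and an arbitrary cap of 10 passes. It still guesses when to stop, so the objections in the first answer still apply.

```php
<?php
// Sketch: "decode until json_decode() succeeds", checking json_last_error()
// instead of comparing to null, and capping the number of passes.
function decodeUntilJson(string $encoded, int $maxPasses = 10)
{
    for ($pass = 0; $pass <= $maxPasses; $pass++) {
        $decoded = json_decode($encoded, true);
        if (json_last_error() === JSON_ERROR_NONE) {
            return $decoded; // may legitimately be null for the text "null"
        }
        $encoded = rawurldecode($encoded); // strip one layer and retry
    }
    throw new RuntimeException("No valid JSON after $maxPasses decoding passes");
}

$triple = rawurlencode(rawurlencode(rawurlencode('{"screen":"1680x1050"}')));
print_r(decodeUntilJson($triple));
```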
Ethan
AbraCadaver
  • Wow, that's a lot cleaner than my recursive function. Have an upvote! – Ethan Nov 29 '17 at 16:07
  • To go with some random downvote :( – AbraCadaver Nov 29 '17 at 16:09
  • Hi, as I commented to David's answer, your function is also working if I have a string like `$encodedJson = "%25etcetc";` But it's not working when I parse the string from a file i.e. `$encodedJson = file_get_contents("test.txt");` or `$encodedJson = file_get_contents("test.json");` . Any idea why? – sumion Nov 29 '17 at 16:27
  • AbraCadaver, you just ordered 1 case of [Priority Mail Shoe Box](https://store.usps.com/store/product/shipping-supplies/priority-mail-shoe-box-P_0_SHOEBOX) from USPS.com – how many boxes will you open when you receive your order? – Mulan Nov 29 '17 at 17:19

Calling rawurldecode(rawurldecode(rawurldecode($encodedJson))) reveals that your string is actually rawurlencoded 3 times, not json_encoded 3 times, so I made the recursive function rawurldecode it on every iteration until json_decode succeeds:

$encodedJson = "%25255B%25257B%25250A%252509%252522base%252522%25253A%252520%25257B%25250A%252509%252509%252522url%252522%25253A%252520%252522abc.com%252522%25252C%25250A%252509%252509%252522referrer%252522%25253A%252520%252522xyz.com%252522%25250A%252509%25257D%25250A%25257D%25252C%252520%25257B%25250A%252509%252522client%252522%25253A%252520%25257B%25250A%252509%252509%252522Pixel%252522%25253A%252520false%25252C%25250A%252509%252509%252522screen%252522%25253A%252520%2525221680x1050%252522%25250A%252509%25257D%25250A%25257D%25255D";

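function recursiveJsonDecode($inJson) {
    // Decode to associative arrays, like the question's json_decode(..., true)
    $outputArr = json_decode($inJson, true);
    if (json_last_error() === JSON_ERROR_NONE) {
        return $outputArr;
    } else {
        // Not valid JSON yet: strip one layer of URL encoding and try again
        // (recurses without end if the input never becomes valid JSON)
        return recursiveJsonDecode(rawurldecode($inJson));
    }
}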

print_r(recursiveJsonDecode($encodedJson));


Ethan
  • this cannot work when applied to any generic datum – Mulan Nov 29 '17 at 15:56
  • The way the OP's input data is, I'd say that it would work, if it was screwed up in the same way as his input array was (`json_encode`d first, then `rawurlencode`d). See here: https://eval.in/910031 – Ethan Nov 29 '17 at 16:05
  • Hi, your function is working if I have a string like `$encodedJson = "%25etcetc";` But it's not working when I parse the string from a file i.e. `$encodedJson = file_get_contents("test.txt");` or `$encodedJson = file_get_contents("test.json");` . Any idea why? – sumion Nov 29 '17 at 16:21
  • @IqbalNazir It's working for me when I get `$encodedJson` from an array, but because I don't actually know what's in your `test.txt` file, I'm just assuming it's what you posted in your question. See this eval.in: https://eval.in/910041 Please post what's in your `test.txt` file to a service like www.pastebin.com if my assumption is incorrect. – Ethan Nov 29 '17 at 16:27
  • Thanks man. It's actually working. I had " (double-quote) symbols in my text file. After removing them, it's working. – sumion Nov 29 '17 at 16:32
  • @IqbalNazir you think it's "working" now only because you can't see how easily it breaks as soon as the data changes – Mulan Nov 29 '17 at 16:51
  • David, you just ordered 1 case of [Priority Mail Shoe Box](https://store.usps.com/store/product/shipping-supplies/priority-mail-shoe-box-P_0_SHOEBOX) from USPS.com – how many boxes will you open when you receive your order? – Mulan Nov 29 '17 at 17:18