4

I'm pulling a JSON feed that is invalid JSON. The keys aren't quoted at all, and the strings use single quotes. I've tried a few things, like explode() and str_replace(), to get the string looking a little bit more like valid JSON, but with a nested associative structure inside, it generally gets screwed up.

Here's an example:

id:43015,name:'John Doe',level:15,systems:[{t:6,glr:1242,n:'server',s:185,c:9}],classs:0,subclass:5

Are there any JSON parsers for PHP out there that can handle invalid JSON like this?

Edit: I'm trying to use json_decode() on this string. It returns NULL.
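
A minimal sketch of how the failure can be inspected, assuming PHP 5.5 or later (for json_last_error_msg()). json_decode() returns NULL when it cannot parse its input, and the error functions report the reason. The variable names are only illustrative.

```php
<?php
// The feed from the question: unquoted keys, single-quoted strings, no outer braces.
$feed = "id:43015,name:'John Doe',level:15,systems:[{t:6,glr:1242,n:'server',s:185,c:9}],classs:0,subclass:5";

$result = json_decode($feed);

// json_decode() signals a parse failure by returning NULL.
var_dump($result); // NULL

// The error functions describe the most recent json_decode() call.
if (json_last_error() !== JSON_ERROR_NONE) {
    echo 'Decode failed: ', json_last_error_msg(), "\n";
}
```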

hookedonwinter
  • I don't believe numbers need quotes in JSON – Scott M. Oct 15 '09 at 21:31
  • But the "keys" do, don't they? Like id:43015 should be "id":43015, right? – hookedonwinter Oct 15 '09 at 21:37
  • Yes, the problem is that the key names like "id" are not quoted – Julien Roncaglia Oct 15 '09 at 21:39
  • Additionally single quotes around strings are not allowed in JSON – Julien Roncaglia Oct 15 '09 at 21:39
  • You are right. The only solution I see is to patch one of the available parsers. – Andrejs Cainikovs Oct 15 '09 at 21:39
  • Looks pretty valid to me. I ran it through an online parser, which was able to parse the string (http://json.parser.online.fr/) – Ben Rowe Oct 15 '09 at 21:40
  • JSON != JavaScript; see http://json.org/. It's only a subset, and it seems that the OP's web service is serving real JavaScript. – Julien Roncaglia Oct 15 '09 at 21:42
  • @Ben You'll notice that your tool says "Malformed JSON" – Justin Johnson Oct 15 '09 at 21:42
  • Just to clarify, I'm pulling this JSON string from an outside source, and am attempting to parse it with PHP, not JavaScript. Hope that helps. – hookedonwinter Oct 15 '09 at 21:45
  • It's worth saying that if you're pulling invalid JSON from a third party, then it is the third party's developers who have screwed up, not you -- there really isn't any excuse for anyone to be generating invalid JSON when it's such an easy format to get right; valid JSON can be encoded in a single line of code in pretty much every development platform out there. If they're sending invalid JSON then it means that not only have they written their own encoder, but they've got it wrong. So can you really rely on them to have got the data right and not to have any security issues? – Simba Aug 17 '15 at 11:11

7 Answers

11
  1. All the quotes should be double quotes " and not single quotes '.
  2. All the keys should be quoted.
  3. The whole element should be an object.
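    function my_json_decode($s) {
        // Escape existing double quotes, then turn single quotes into double quotes.
        $s = str_replace(
            array('"',  "'"),
            array('\"', '"'),
            $s
        );
        // Quote every bare word that is followed by a colon. This also matches
        // inside string values, so a value containing "word:" gets rewritten too.
        $s = preg_replace('/(\w+):/i', '"\1":', $s);
        // Wrap the result in braces so that it decodes as a single object.
        return json_decode(sprintf('{%s}', $s));
    }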
Marko
  • Try setting a value to a url or something with a colon in it. This will not work. (ie id:43015,name:'http:John Doe',lev ...) – KXL Oct 05 '14 at 23:04
6

This regex will do the trick

$json = preg_replace('/([{,])(\s*)([A-Za-z0-9_\-]+?)\s*:/','$1"$3":',$json);
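
A short sketch of how this pattern behaves on a shortened version of the feed from the question. Because it only matches a key that follows { or a comma, the first key is quoted only once the outer braces are in place. The single-quoted strings are left as they are, so a further step is still needed before json_decode() accepts the result. The variable names are only illustrative.

```php
<?php
$feed    = "id:43015,name:'John Doe',level:15";
$pattern = '/([{,])(\s*)([A-Za-z0-9_\-]+?)\s*:/';

// Without outer braces the first key has no "{" or "," in front of it.
$withoutBraces = preg_replace($pattern, '$1"$3":', $feed);
echo $withoutBraces, "\n";
// id:43015,"name":'John Doe',"level":15

// With the braces added first, every key is quoted.
$withBraces = preg_replace($pattern, '$1"$3":', '{' . $feed . '}');
echo $withBraces, "\n";
// {"id":43015,"name":'John Doe',"level":15}
```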
Guy
5

From my experience, Marko's answer doesn't work anymore. For newer PHP versions, use this instead:

$a = "{id:43015,name:'John Doe',level:15,systems:[{t:6,glr:1242,n:'server',s:185,c:988}],classs:0,subclass:5}";
$a = preg_replace('/(,|\{)[ \t\n]*(\w+)[ ]*:[ ]*/','$1"$2":',$a);
$a = preg_replace('/":\'?([^\[\]\{\}]*?)\'?[ \n\t]*(,"|\}$|\]$|\}\]|\]\}|\}|\])/','":"$1"$2',$a);
print_r($a);
  • Support for Arrays: $a = preg_replace('/(,|\{)[ \t\n]*(\w+)[ ]*:[ ]*/','$1"$2":',$a); $a = preg_replace('/(,|\[)[ \t\n]*\'?\"?(\w+)\'?\"?/','$1"$2"',$a); $a = preg_replace('/":\'?\"?([^\[\]\{\}]*?)\'?\"?[ \n\t]*(,"|\}$|\]$|\}\]|\]\}|\}|\])/','":"$1"$2',$a); – Marcos Fernandez Ramos May 22 '13 at 17:34
2

I know this question is old, but I hope this helps someone.

I had a similar problem, in that I wanted to accept JSON as a user input, but didn't want to require tedious "quotes" around every key. Furthermore, I didn't want to require quotes around the values either, but still parse valid numbers.

The simplest way seemed to be writing a custom parser.

I came up with this, which parses to nested associative / indexed arrays:

function loose_json_decode($json) {
    $rgxjson = '%((?:\{[^\{\}\[\]]*\})|(?:\[[^\{\}\[\]]*\]))%';
    $rgxstr = '%("(?:[^"\\\\]*|\\\\\\\\|\\\\"|\\\\)*"|\'(?:[^\'\\\\]*|\\\\\\\\|\\\\\'|\\\\)*\')%';
    $rgxnum = '%^\s*([+-]?(\d+(\.\d*)?|\d*\.\d+)(e[+-]?\d+)?|0x[0-9a-f]+)\s*$%i';
    $rgxchr1 = '%^'.chr(1).'\\d+'.chr(1).'$%';
    $rgxchr2 = '%^'.chr(2).'\\d+'.chr(2).'$%';
    $chrs = array(chr(2),chr(1));
    $escs = array(chr(2).chr(2),chr(2).chr(1));
    $nodes = array();
    $strings = array();

    # escape use of chr(1)
    $json = str_replace($chrs,$escs,$json);

    # parse out existing strings
    $pieces = preg_split($rgxstr,$json,-1,PREG_SPLIT_DELIM_CAPTURE);
    for($i=1;$i<count($pieces);$i+=2) {
        $strings []= str_replace($escs,$chrs,str_replace(array('\\\\','\\\'','\\"'),array('\\','\'','"'),substr($pieces[$i],1,-1)));
        $pieces[$i] = chr(2) . (count($strings)-1) . chr(2);
    }
    $json = implode($pieces);

    # parse json
    while(1) {
        $pieces = preg_split($rgxjson,$json,-1,PREG_SPLIT_DELIM_CAPTURE);
        for($i=1;$i<count($pieces);$i+=2) {
            $nodes []= $pieces[$i];
            $pieces[$i] = chr(1) . (count($nodes)-1) . chr(1);
        }
        $json = implode($pieces);
        if(!preg_match($rgxjson,$json)) break;
    }

    # build associative array
    for($i=0,$l=count($nodes);$i<$l;$i++) {
        $obj = explode(',',substr($nodes[$i],1,-1));
        $arr = $nodes[$i][0] == '[';

        if($arr) {
            for($j=0;$j<count($obj);$j++) {
                if(preg_match($rgxchr1,$obj[$j])) $obj[$j] = $nodes[+substr($obj[$j],1,-1)];
                else if(preg_match($rgxchr2,$obj[$j])) $obj[$j] = $strings[+substr($obj[$j],1,-1)];
                else if(preg_match($rgxnum,$obj[$j])) $obj[$j] = +trim($obj[$j]);
                else $obj[$j] = trim(str_replace($escs,$chrs,$obj[$j]));
            }
            $nodes[$i] = $obj;
        } else {
            $data = array();
            for($j=0;$j<count($obj);$j++) {
                $kv = explode(':',$obj[$j],2);
                if(preg_match($rgxchr1,$kv[0])) $kv[0] = $nodes[+substr($kv[0],1,-1)];
                else if(preg_match($rgxchr2,$kv[0])) $kv[0] = $strings[+substr($kv[0],1,-1)];
                else if(preg_match($rgxnum,$kv[0])) $kv[0] = +trim($kv[0]);
                else $kv[0] = trim(str_replace($escs,$chrs,$kv[0]));
                if(preg_match($rgxchr1,$kv[1])) $kv[1] = $nodes[+substr($kv[1],1,-1)];
                else if(preg_match($rgxchr2,$kv[1])) $kv[1] = $strings[+substr($kv[1],1,-1)];
                else if(preg_match($rgxnum,$kv[1])) $kv[1] = +trim($kv[1]);
                else $kv[1] = trim(str_replace($escs,$chrs,$kv[1]));
                $data[$kv[0]] = $kv[1];
            }
            $nodes[$i] = $data;
        }
    }

    return $nodes[count($nodes)-1];
}

Note that it does not catch errors or bad formatting...

For your situation, it looks like you'd want to add {}'s around it (as json_decode also requires):

$data = loose_json_decode('{' . $json . '}');

which for me yields:

array(6) {
  ["id"]=>
  int(43015)
  ["name"]=>
  string(8) "John Doe"
  ["level"]=>
  int(15)
  ["systems"]=>
  array(1) {
    [0]=>
    array(5) {
      ["t"]=>
      int(6)
      ["glr"]=>
      int(1242)
      ["n"]=>
      string(6) "server"
      ["s"]=>
      int(185)
      ["c"]=>
      int(9)
    }
  }
  ["classs"]=>
  int(0)
  ["subclass"]=>
  int(5)
}
Codesmith
1
$json = preg_replace('/([{,])(\s*)([A-Za-z0-9_\-]+?)\s*:/','$1"$3":',$json);// adding->(")
$json = str_replace("'",'"', $json);// replacing->(')

This solution seems to be enough for most common purposes.
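
One case the two replacements above do not cover is punctuation inside string values: an apostrophe, or a comma followed by a word and a colon, gets rewritten as if it were structure. A minimal sketch of a string-aware variant follows, assuming PHP 7 or later. relaxed_json_to_strict is a hypothetical name, not a library function, and the sketch does not translate JavaScript-only escapes such as \x41.

```php
<?php
/**
 * Hypothetical helper: rewrite a JavaScript-style object literal into
 * strict JSON. Quoted strings are matched as whole tokens, so colons,
 * commas and key-like words inside them are never touched.
 */
function relaxed_json_to_strict(string $s): string
{
    $pattern = '/"(?:[^"\\\\]|\\\\.)*"'                    // double-quoted string
             . '|\'((?:[^\'\\\\]|\\\\.)*)\''                // single-quoted string, body in group 1
             . '|(?<![\w$])([A-Za-z_$][\w$]*)(?=\s*:)/s';   // bare key, name in group 2

    $strict = preg_replace_callback($pattern, function (array $m): string {
        if (isset($m[2])) {
            return '"' . $m[2] . '"';               // bare key: add the quotes
        }
        if ($m[0][0] === "'") {
            // Re-quote the body, handling each escape pair as a unit.
            $body = preg_replace_callback('/\\\\.|"/s', function (array $e): string {
                if ($e[0] === "\\'") {
                    return "'";                     // \' is not a legal JSON escape
                }
                if ($e[0] === '"') {
                    return '\\"';                   // a bare " must be escaped in JSON
                }
                return $e[0];                       // keep other escapes unchanged
            }, $m[1] ?? '');
            return '"' . $body . '"';
        }
        return $m[0];                               // double-quoted string: leave alone
    }, $s);

    // The feed in the question has no outer braces; add them when missing.
    $strict = trim($strict);
    if ($strict === '' || ($strict[0] !== '{' && $strict[0] !== '[')) {
        $strict = '{' . $strict . '}';
    }
    return $strict;
}

// Made-up sample input: a URL-like value and an apostrophe inside a string.
$feed = "id:43015,name:'http://example.com/a:b',note:'it\\'s \"ok\"',systems:[{t:6,n:'server'}]";
$data = json_decode(relaxed_json_to_strict($feed), true);

echo $data['name'], "\n"; // http://example.com/a:b
echo $data['note'], "\n"; // it's "ok"
```

Matching the strings and the keys in a single pass is what keeps the key pattern from firing inside a value. Two separate replacements cannot tell the two contexts apart.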

Vadim Cool
0

I'd say your best bet is to download the source of a JSON decoder (they're not huge) and fiddle with it, especially if you know what's wrong with the JSON you're trying to decode.

The example you provided needs { } around it, too, which may help.

staticsan
0

This is my solution for removing trailing, leading, and repeated commas. It can be combined with the other answers that replace single quotes and add quotes around JSON keys. I realize this isn't relevant to the OP, since it deals with other types of invalid JSON, but I hope it helps someone who finds this question through a Google search.

function replace_unquoted_text ($json, $f)
{
  $matches = array();
  preg_match_all('/(")(?:(?=(\\\\?))\2.)*?\1/', $json, $matches, PREG_OFFSET_CAPTURE);
  //echo '<pre>' . json_encode($matches[0]) . '</pre>';
  $matchIndexes = [0];
  foreach ($matches[0] as $match)
  {
    array_push($matchIndexes, $match[1]);
    array_push($matchIndexes, strlen($match[0]) + $match[1]);
  }
  array_push($matchIndexes, strlen($json));
  $components = [];
  for ($n = 0; $n < count($matchIndexes); $n += 2)
  {
    $startIDX = $matchIndexes[$n];
    $finalExclIDX = $matchIndexes[$n + 1];
    //echo $startIDX . ' -> ' . $finalExclIDX . '<br>';
    $len = $finalExclIDX - $startIDX;
    if ($len === 0) continue;
    $prevIDX = ($n === 0) ? 0 : $matchIndexes[$n - 1];
    // Copy the quoted string that precedes this segment unchanged...
    array_push($components, substr($json, $prevIDX, $startIDX - $prevIDX));
    // ...then append the unquoted segment after passing it through $f.
    array_push($components, $f(substr($json, $startIDX, $len)));
  }
  //echo '<pre>' . json_encode($components) . '</pre>';
  return implode("", $components);
}
function json_decode_lazy ($jsonSnip) {
    return json_decode(fix_lazy_json($jsonSnip));
}

function fix_lazy_json ($json) {
    return replace_unquoted_text($json, 'fix_lazy_snip');
}
function fix_lazy_snip ($jsonSnip) {
    return remove_multi_commas_snip(remove_leading_commas_snip(remove_trailing_commas_snip($jsonSnip)));
}

function remove_leading_commas ($json) {
    return replace_unquoted_text($json, 'remove_leading_commas_snip');
}
function remove_leading_commas_snip ($jsonSnip) {
  return preg_replace('/([{[]\s*)(,\s*)*/', '$1', $jsonSnip);
}

function remove_trailing_commas ($json) {
    return replace_unquoted_text($json, 'remove_trailing_commas_snip');
}
function remove_trailing_commas_snip ($jsonSnip) {
  return preg_replace('/(,\s*)*,(\s*[}\]])/', '$2', $jsonSnip);
}

function remove_multi_commas ($json) {
    return replace_unquoted_text($json, 'remove_multi_commas_snip');
}
function remove_multi_commas_snip ($jsonSnip) {
  return preg_replace('/(,\s*)+,/', ',', $jsonSnip);
}

json_decode_lazy('[,,{,,,"a":17,,, "b":13,,,,},,,]'); // decodes '[{"a":17, "b":13}]'

See on repl.it.

trinalbadger587