
I am stuck in an impossible situation. I have JSON from outer space (there is no way they are going to change it). Here is the JSON:

{
    user:'180111',
    title:'I\'m sure "E pluribus unum" means \'Out of Many, One.\' \n\nhttp://en.wikipedia.org/wiki/E_pluribus_unum.\n\n\'',
    date:'2007/01/10 19:48:38',
    "id":"3322121",
    "previd":112211,
    "body":"\'You\' can \"read\" more here [url=http:\/\/en.wikipedia.org\/?search=E_pluribus_unum]E pluribus unum[\/url]'s. Cheers \\*/ :\/",
    "from":"112221",
    "username":"mikethunder",
    "creationdate":"2007\/01\/10 14:04:49"
}

"It is nowhere near valid JSON," I said. Their response was "Emmm! But JavaScript can read it without complaint":

<html>
<script type="text/javascript">
    var obj = {"PUT JSON FROM UP THERE HERE"};

    document.write(obj.title);
    document.write("<br />");
    document.write(obj.creationdate + " " + obj.date);
    document.write("<br />");
    document.write(obj.body);
    document.write("<br />");
</script>
<body>
</body>
</html>

Problem

I am supposed to read and parse this string in .NET 4, and it broke 3 of the 14 libraries listed in the C# section of json.org (I didn't try the rest). To make the problem go away, I wrote the following function to fix the single and double quotes.

public static string JSONBeautify(string InStr){
    bool inSingleQuote = false;
    bool inDoubleQuote = false;
    bool escaped = false;

    StringBuilder sb = new StringBuilder(InStr);
    sb = sb.Replace("`", "<°)))><"); // replace all instances of "grave accent" to "fish" so we can use that mark later. 
                                        // Hopefully there is no "fish" in our JSON
    for (int i = 0; i < sb.Length; i++) {
        switch (sb[i]) {

            case '\\':
                escaped = !escaped;     // A backslash escapes the next character; a second backslash cancels the first
                break;
            case '\'':
                if (!inSingleQuote && !inDoubleQuote) {
                    sb[i] = '"';            // Change opening single-quote string marker to a double quote
                    inSingleQuote = true;
                } else if (inSingleQuote && !escaped) {
                    sb[i] = '"';            // Change closing single-quote string marker to a double quote
                    inSingleQuote = false;
                } else if (escaped) {
                    escaped = false;
                }
                break;
            case '"':
                if (!inSingleQuote && !inDoubleQuote) {
                    inDoubleQuote = true;   // This is an opening double-quote string marker
                } else if (inSingleQuote && !escaped) {
                    sb[i] = '`';            // Change unescaped double quote to a grave accent
                } else if (inDoubleQuote && !escaped) {
                    inDoubleQuote = false; // This is a closing double quote string marker
                } else if (escaped) {
                    escaped = false;
                }
                break;
            default:
                escaped = false;
                break;
        }
    }
    return sb.ToString()
        .Replace("\\/", "/")        // Unescape every escaped slash (\/); hopefully there are no smileys in the string
        .Replace("`", "\\\"")       // Change every grave accent to an escaped double quote (\")
        .Replace("<°)))><", "`")   // Change every fish back to a grave accent
        .Replace("\\'","'");        // Change every escaped single quote to a plain single quote
}

Now JSONLint only complains about attribute names, and I can use both the JSON.NET and SimpleJSON libraries to parse the JSON above.

Question

I am sure my code is not the best way of fixing this JSON. Is there any scenario in which my code might break? Is there a better way of doing this?

AaA
  • That JSON is so wrong on so many levels. However we can fix it. – Mouser Feb 07 '15 at 14:51
    I totally agree with you, but as they are from outer space, they don't speak our language and making them understand it is wrong is ... well impossible. – AaA Feb 07 '15 at 14:53

3 Answers


You need to run this through JavaScript. Fire up a JavaScript parser in .NET, give it the string as input, and use JavaScript's native JSON.stringify to convert it:

obj = {
    "user":'180111',
    "title":'I\'m sure "E pluribus unum" means \'Out of Many, One.\' \n\nhttp://en.wikipedia.org/wiki/E_pluribus_unum.\n\n',
    "date":'2007/01/10 19:48:38',
    "id":"3322121",
    "previd":"112211",
    "body":"\'You\' can \"read\" more here [url=http:\/\/en.wikipedia.org\/?search=E_pluribus_unum]E pluribus unum[\/url]'s. Cheers \\*/ :\/",
    "from":"112221",
    "username":"mikethunder",
    "creationdate":"2007\/01\/10 14:04:49"
}

console.log(JSON.stringify(obj));
document.write(JSON.stringify(obj));

Please remember that the string (or rather object) you've got isn't valid JSON and can't be parsed with a JSON library. It needs to be converted to valid JSON first. However it's valid JavaScript.
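
To make the difference concrete, here is a minimal sketch in Node.js. It assumes a shortened copy of the object from the question, and the names `alienText` and `alienToJson` are made up for illustration. It evaluates the text as a JavaScript expression through Node's built-in `vm` module and then serializes the result. Note that `vm` is not a security boundary, so this is only reasonable for input you are willing to execute as code.

```javascript
const vm = require('node:vm');

// Shortened copy of the text from the question; String.raw keeps every backslash as-is.
const alienText = String.raw`{
    user:'180111',
    title:'I\'m sure "E pluribus unum" means \'Out of Many, One.\'',
    "previd":112211,
    "creationdate":"2007\/01\/10 14:04:49"
}`;

// Hypothetical helper: evaluate the text as a JavaScript expression, then emit real JSON.
// The parentheses force the braces to be read as an object literal rather than a block.
function alienToJson(text) {
  const value = vm.runInNewContext('(' + text + ')', {}, { timeout: 100 });
  return JSON.stringify(value);
}

// A strict JSON parser rejects the original text (unquoted keys, single-quoted strings).
let strictParseFailed = false;
try {
  JSON.parse(alienText);
} catch (e) {
  strictParseFailed = e instanceof SyntaxError;
}

const properJson = alienToJson(alienText);
console.log(strictParseFailed);
console.log(properJson);
```

If you only need the values, you can use the evaluated object directly and skip the round trip through JSON.stringify and JSON.parse.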

To complete this answer: you can use JavaScriptSerializer in .NET. For this solution you'll need the following namespaces:

  • System.Net (for WebClient)
  • System.Web.Script.Serialization (for JavaScriptSerializer)

    var webClient = new WebClient();
    string readHtml = webClient.DownloadString("uri to your source (extraterrestrial)");
    var a = new JavaScriptSerializer();
    
    Dictionary<string, object> results = a.Deserialize<Dictionary<string, object>>(readHtml);
    
Christian Sirolli
Mouser
  • Great answer. If you want to go all the way, include an example or a list of .Net json parsers (maybe even just using the simple WebBrowser?). Neat trick with the js snippets in the answer, me likey. – SimpleVar Feb 07 '15 at 15:01
  • It is a good idea to give the job to someone who knows how to do it, however any suggestion on how I run a Javascript parser in .net? Does `Javascript.NET` or `Jint` handle this java object properly? – AaA Feb 07 '15 at 15:08
  • @BobSort, take a look at the updated answer. This will parse the horrible JSONish object and spit out a nice *.Net* Dictionary list. I tried it with your source and it worked. – Mouser Feb 07 '15 at 15:44
  • Nice job with both part of your answer. specially I like the inline run snippet. by the way, your second part of code does not handle all type of JSON such as arrays `[...]` – Bistro Feb 07 '15 at 16:34
  • Well I can't use Dictionary, the JSON mentioned above is part of a bigger object array. However if I use object instead of Dictionary I can get results. – AaA Feb 07 '15 at 16:40
  • Well the dictionary is an example. Reading your comment I guess you're back on the right track. – Mouser Feb 07 '15 at 16:44

How about this:

 string AlienJSON = "your alien JSON";
 JavaScriptSerializer js = new JavaScriptSerializer();
 string ProperJSON = js.Serialize(js.DeserializeObject(AlienJSON));

Or just consume the object after deserializing it, instead of converting it back to a string and passing that to a JSON parser for extra headache.

As Mouser also mentioned, you need the System.Web.Script.Serialization namespace, which becomes available once you reference System.Web.Extensions.dll in your project. To do that, you need to change the Target framework in the project properties to .NET Framework 4.

EDIT

The trick to consuming the deserialized object is to use dynamic:

JavaScriptSerializer js = new JavaScriptSerializer();
dynamic obj = js.DeserializeObject(AlienJSON);

For the JSON in your question, simply use

string body = obj["body"];

or, if your JSON is an array,

if (obj is Array) {
    foreach (dynamic o in obj) {
        string body = o["body"];   // read from the current element, not from obj[0]
        // ... do something with it
    }
}
Bistro
  • How I can consume the object after deserialize? – AaA Feb 07 '15 at 16:43
  • Have you tried to put the JavaScript string inside a .Net string? It will not work. You need to load it externally. Hence the webclient. – Mouser Feb 08 '15 at 07:26

Here's a function I made that will fix broken JSON:

function fixJSON(json){
    function bulkRegex(str, _unused, callback){ // every call below passes false as the second argument, so the callbacks arrive third
        if(callback && typeof callback === 'function'){
            return callback(str);
        }else if(callback && Array.isArray(callback)){
            for(let i = 0; i < callback.length; i++){
                if(callback[i] && typeof callback[i] === 'function'){
                    str = callback[i](str);
                }else{break;}
            }
            return str;
        }
        return str;
    }
    if(json && json !== ''){
        if(typeof json !== 'string'){
            try{
                json = JSON.stringify(json);
            }catch(e){return false;}
        }
        if(typeof json === 'string'){
            json = bulkRegex(json, false, [
                str => str.replace(/[\n\t]/gm, ''),
                str => str.replace(/,\}/gm, '}'),
                str => str.replace(/,\]/gm, ']'),
                str => {
                    str = str.split(/(?=[,\}\]])/g);
                    str = str.map(s => {
                        if(s.includes(':') && s){
                            let strP = s.split(/:(.+)/, 2);
                            strP[0] = strP[0].trim();
                            if(strP[0]){
                                let firstP = strP[0].split(/([,\{\[])/g);
                                firstP[firstP.length-1] = bulkRegex(firstP[firstP.length-1], false, p => p.replace(/[^A-Za-z0-9\-_]/, ''));
                                strP[0] = firstP.join('');
                            }
                            let part = strP[1].trim();
                            if((part.startsWith('"') && part.endsWith('"')) || (part.startsWith('\'') && part.endsWith('\'')) || (part.startsWith('`') && part.endsWith('`'))){
                                part = part.substr(1, part.length - 2);
                            }
                            part = bulkRegex(part, false, [
                                p => p.replace(/(["])/gm, '\\$1'),
                                p => p.replace(/\\'/gm, '\''),
                                p => p.replace(/\\`/gm, '`'),
                            ]);
                            strP[1] = ('"'+part+'"').trim();
                            s = strP.join(':');
                        }
                        return s;
                    });
                    return str.join('');
                },
                str => str.replace(/(['"])?([a-zA-Z0-9\-_]+)(['"])?:/g, '"$2":'),
                str => {
                    str = str.split(/(?=[,\}\]])/g);
                    str = str.map(s => {
                        if(s.includes(':') && s){
                            let strP = s.split(/:(.+)/, 2);
                            strP[0] = strP[0].trim();
                            if(strP[1].includes('"') && strP[1].includes(':')){
                                let part = strP[1].trim();
                                if(part.startsWith('"') && part.endsWith('"')){
                                    part = part.substr(1, part.length - 2);
                                    part = bulkRegex(part, false, p => p.replace(/(?<!\\)"/gm, ''));
                                }
                                strP[1] = ('"'+part+'"').trim();
                            }
                            s = strP.join(':');
                        }
                        return s;
                    });
                    return str.join('');
                },
            ]);
            try{
                json = JSON.parse(json);
            }catch(e){return false;}
        }
        return json;
    }
    return false;
}
SwiftNinjaPro