17

An AJAX call is returning a response text that includes a JSON string. I need to:

  1. extract the JSON string
  2. modify it
  3. then reinsert it to update the original string

I am not too worried about steps 2 and 3, but I can't figure out how to do step 1. I was thinking about using a regular expression, but I don't know how as my JSON might have multiple levels with nested objects or arrays.

Christophe
  • You're not new here. What have you tried? What does your response look like? – Madara's Ghost May 13 '12 at 19:24
  • Also, RegEx is probably not the correct tool for the job. – Madara's Ghost May 13 '12 at 19:25
  • @Truth my only workaround so far is to include markers in the response text to show the beginning and the end of the JSON string. Nothing to be proud of or that would guide the answer. – Christophe May 13 '12 at 19:34
  • Simply getting the firstIndex of the "{" literal and the lastIndex of the "}" literal, then `str.substring(firstIndex, lastIndex + 1)`, suffices for most of the messages returned by an AI such as ChatGPT. – Ali Mert Çakar May 05 '23 at 21:53

4 Answers

26

You cannot use a regex to extract JSON from an arbitrary text. Since regexes are usually not powerful enough to validate JSON (unless you can use PCRE) they also cannot match it - if they could, they could also validate JSON.
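
To illustrate the problem, here is a small sketch with a made-up sample string. A non-greedy pattern stops at the first closing brace, and a greedy one runs to the last closing brace in the whole text, so neither isolates a nested object reliably:

```javascript
// Hypothetical sample: a nested object followed by unrelated text with a brace.
const sampleText = 'prefix {"a": {"b": 1}} suffix }';

function isValidJSON(s) {
    try { JSON.parse(s); return true; } catch (e) { return false; }
}

// Non-greedy: stops at the first '}', cutting the nested object short.
const lazyMatch = sampleText.match(/\{.*?\}/)[0];
// Greedy: runs to the last '}' in the text, swallowing the trailing junk.
const greedyMatch = sampleText.match(/\{.*\}/)[0];

console.log(lazyMatch, isValidJSON(lazyMatch));
console.log(greedyMatch, isValidJSON(greedyMatch));
```

Neither match parses here, which is why the approach below leans on JSON.parse() rather than on a pattern.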

However, if you know that the top-level element of your JSON is always an object or array, you can go by the following approach:

  1. Find the first opening ({ or [) and the last closing (} or ]) brace in your string.
  2. Try to parse that block of text (including the braces) using JSON.parse(). If it succeeds, return the parsed result.
  3. Otherwise, take the previous closing brace and try parsing that shorter block. If it succeeds, you are done.
  4. Repeat this until there is no closing brace left, or the next one comes before the current opening brace.
  5. Find the first opening brace after the one from step 1. If there is none, the string did not contain a JSON object/array and you can stop.
  6. Go to step 2.

Here is a function that extracts a JSON object and returns the parsed object together with its start and end positions in the string. If you really need top-level arrays, too, it should be easy to extend:

function extractJSON(str) {
    var firstOpen = str.indexOf('{'), firstClose, candidate;
    // Try each opening brace in turn, left to right.
    while (firstOpen !== -1) {
        // Start with the last closing brace and work backwards.
        firstClose = str.lastIndexOf('}');
        console.log('firstOpen: ' + firstOpen, 'firstClose: ' + firstClose);
        if (firstClose <= firstOpen) {
            return null;
        }
        do {
            candidate = str.substring(firstOpen, firstClose + 1);
            console.log('candidate: ' + candidate);
            try {
                var res = JSON.parse(candidate);
                console.log('...found');
                // Parsed value, start offset, and end offset (exclusive).
                return [res, firstOpen, firstClose + 1];
            }
            catch (e) {
                console.log('...failed');
            }
            // Move to the previous closing brace.
            firstClose = str.lastIndexOf('}', firstClose - 1);
        } while (firstClose > firstOpen);
        firstOpen = str.indexOf('{', firstOpen + 1);
    }
    // No parseable object was found.
    return null;
}

var obj = {'foo': 'bar', xxx: '} me[ow]'};
var str = 'blah blah { not {json but here is json: ' + JSON.stringify(obj) + ' and here we have stuff that is } really } not ] json }} at all';
var result = extractJSON(str);
console.log('extracted object:', result[0]);
console.log('expected object :', obj);
console.log('did it work     ?', JSON.stringify(result[0]) === JSON.stringify(obj) ? 'yes!' : 'no');
console.log('surrounding str :', str.substring(0, result[1]) + '<JSON>' + str.substring(result[2]));

Demo (executed in the nodejs environment, but should work in a browser, too): https://paste.aeum.net/show/81/
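
The function above handles only objects. If top-level arrays are needed as well, one possible extension is sketched here. It is a variation for illustration, not the code from the demo: the name extractJSONValue is made up, and it assumes that the first opening brace or bracket that yields a parseable block is the one you want.

```javascript
// Sketch: same try-and-shrink idea as extractJSON, but accepts either an
// object or an array at the top level. extractJSONValue is a hypothetical name.
function extractJSONValue(str) {
    var closers = { '{': '}', '[': ']' };
    for (var open = 0; open < str.length; open++) {
        var closer = closers[str[open]];
        if (!closer) {
            continue; // not an opening brace or bracket
        }
        // Try every matching closer, from the last one back to the opener.
        var close = str.lastIndexOf(closer);
        while (close > open) {
            try {
                return [JSON.parse(str.substring(open, close + 1)), open, close + 1];
            } catch (e) {
                // Not valid JSON; try the previous closer.
            }
            close = str.lastIndexOf(closer, close - 1);
        }
    }
    return null;
}

var arrayResult = extractJSONValue('noise [ here: [1, {"a": [2, 3]}] and ] more');
console.log(arrayResult);
```

As with the original function, every failed candidate costs a JSON.parse() call, so this is best suited to short response texts.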

ThiefMaster
  • Interesting... Your link is pointing to a page that says "Yes, a complete regex validation is possible"! – Christophe May 13 '12 at 22:28
  • Oh heh, didn't scroll past the accepted answer - but well, PCRE is pretty powerful. I don't think those features are available in JavaScript. – ThiefMaster May 13 '12 at 23:04
2

For others who are looking (as I was) for extracting JSON strings from text in general (even if they're not valid), you could take a look at this Gulp plugin https://www.npmjs.com/package/gulp-extract-json-like. It searches for all strings which appear to be formatted like JSON strings.

Create a folder and install packages.

mkdir project && cd project
npm install gulp gulp-extract-json-like

Create a file ./gulpfile.js and put the following content into it:

var gulp = require('gulp');
var extractJsonLike = require('gulp-extract-json-like');

gulp.task('default', function () {
  return gulp.src('file.txt')
    .pipe(extractJsonLike())
    .pipe(gulp.dest('dist'));
});

Create a file called ./file.txt which contains your text and run the following command.

gulp

The JSON strings it finds will be written to ./dist/file.txt.

saamo
0

If the JSON is returned as part of an AJAX response, why not use the browser's native JSON parsing (beware of gotchas)? Or jQuery JSON parsing?

If the JSON is totally mangled up with the text, that really reeks of a design issue IMHO - if you can change it, I would strongly recommend doing so (i.e. return a single JSON object as the response, with the text as a property of the object).
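
As a rough sketch of that suggestion, the response could look something like this. The property names text and params and the sample payload are hypothetical, chosen only for illustration:

```javascript
// Hypothetical response body: the free text travels as one property,
// the structured parameters as another.
var responseText = '{"text": "some script or message", "params": {"retries": 3, "tags": ["a", "b"]}}';

// The whole response is valid JSON, so no extraction step is needed.
var response = JSON.parse(responseText);

// Modify the parameters, then serialize the whole thing again.
response.params.retries = 5;
var updatedText = JSON.stringify(response);

console.log(response.text);
console.log(updatedText);
```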

If not, then using RegEx is going to be an absolute nightmare. JSON is naturally very flexible, and ensuring accurate parsing is going to be not only time-consuming, but just wasteful. I would probably put in content markers at the start/end and hope for the best. But you're going to be wide-open to validation errors etc.
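
A minimal sketch of the marker approach, assuming the response can be made to wrap the JSON in two unique delimiters. The marker strings and the helper name replaceMarkedJSON are made up for this example; any delimiters that cannot occur elsewhere in the text or inside the JSON would do.

```javascript
// Hypothetical markers; they must not appear anywhere else in the response.
var START_MARKER = '/*JSON-START*/';
var END_MARKER = '/*JSON-END*/';

// Finds the JSON between the markers, lets `modify` change the parsed value,
// and returns the text with the updated JSON put back in place.
// Returns null if the markers are missing; throws if the content is not JSON.
function replaceMarkedJSON(text, modify) {
    var start = text.indexOf(START_MARKER);
    var end = text.indexOf(END_MARKER, start + START_MARKER.length);
    if (start === -1 || end === -1) {
        return null;
    }
    var jsonStart = start + START_MARKER.length;
    var value = JSON.parse(text.substring(jsonStart, end));
    modify(value);
    return text.substring(0, jsonStart) + JSON.stringify(value) + text.substring(end);
}

var script = 'init(/*JSON-START*/{"size": 1, "nested": {"on": true}}/*JSON-END*/);';
var updated = replaceMarkedJSON(script, function (params) { params.size = 2; });
console.log(updated);
```

This covers all three steps from the question (extract, modify, reinsert), but it only works as long as the markers stay intact.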

Rob Cooper
  • Unfortunately I cannot change it. What I am getting in the response is actually a whole script that includes parameters in a JSON literal. – Christophe May 13 '12 at 22:17
  • I'm confused, since in your comment on the question, you have added markers to the start/end of the JSON string? How did you do that without being able to change the response? – Rob Cooper May 14 '12 at 12:31
  • Sorry, what I mean is that I can't prevent the JSON from being mixed with "text", text actually being a script. – Christophe May 14 '12 at 15:38
  • OK, well I see you've accepted an answer. If that's working for you then awesome, otherwise a sample response in the Q would give us a chance to create a working solution. – Rob Cooper May 15 '12 at 07:52
0

I did this my own wonky way. It's certainly not foolproof, but for making logs with single-line JSON objects in them easier to read, this worked for me. I'm not a JavaScript developer, so feel free to tell me why this is bad haha.

//PrettyPrint() will attempt to find JSON strings in the log message. If it finds them, it will replace the raw ugly JSON with pretty-printed JSON
function PrettyPrint() {
    var jsonStrings = [];
    var prettyLogElement = document.getElementById('PrettyLogDisplayOnly');
    try {
        var rawLogMessage = $("textarea[id^='LogMessage']").val();
        if (rawLogMessage == null) {
            throw "Failed to extract original log message.";
        }

        jsonStrings = ExtractJsonStrings(rawLogMessage);
        
        var modifiedLogMessage = "<pre>" + rawLogMessage + "</pre>";

        for (const jsonString of jsonStrings) {
            try {
                var jsonObject = JSON.parse(jsonString);
                var prettyPrintJsonString = JSON.stringify(jsonObject, null, 2);
                modifiedLogMessage = modifiedLogMessage.replace(jsonString, prettyPrintJsonString);
            }
            catch (err) {
                modifiedLogMessage += "Failed to pretty print: " + jsonString;
            }
        }
    }
    catch (err) {
        if (err == null) {
            err = "Failed to parse.";
        }
        else
        {
            err = "Failed to parse. Details: " + err;
        }
        
        //TODO: instead of showing the error here, show it as an error banner?
        rawLogMessage = "<br/>Failed to beautify JSON objects. Details: " + err + " Displaying raw log message.<br/>" +
            "<br/>-------------------------------------------------------------------------------------<br/><br/>"
            + rawLogMessage;

        prettyLogElement.innerHTML += rawLogMessage;
        return;
    }

    prettyLogElement.innerHTML = modifiedLogMessage;
}

function ExtractJsonStrings(rawLogMessage) {
    var jsonStrings = [];
    var locationOfCurrentCurly = -1;
    
    while (true) {
        var countOfOpenCurlyBraces = 0;
        var countOfClosedCurlyBraces = 0;

        var locationOfFirstUnescapedOpeningCurly = GetLocationOfNextUnescapedOpeningCurlyBrace(rawLogMessage, locationOfCurrentCurly + 1);
        if (locationOfFirstUnescapedOpeningCurly == -1) {
            break; //we found all the JSON strings
        }
        else
        {
            locationOfCurrentCurly = locationOfFirstUnescapedOpeningCurly;
            countOfOpenCurlyBraces++;
        }
        
        while (countOfOpenCurlyBraces != countOfClosedCurlyBraces)
        {
            if (countOfClosedCurlyBraces > countOfOpenCurlyBraces)
            {
                throw "Found more closing curly braces than opening curly braces.";
            }
            
            var startSearchAtIndex = locationOfCurrentCurly + 1;
            locationOfCurrentCurly = GetLocationOfNextUnescapedCurlyBrace(rawLogMessage, startSearchAtIndex);
            if (locationOfCurrentCurly == -1) {
                throw "Failed to find the 'next' curly brace.";
            }

            var curly = rawLogMessage.charAt(locationOfCurrentCurly);
            if (curly === '{') {
                countOfOpenCurlyBraces++;
            } else if (curly === '}') {
                countOfClosedCurlyBraces++;
            } else {
                throw "Unknown character found when curly brace expected.";
            }
        }
        
        var possiblyCorrectlyFormattedJsonString = rawLogMessage.substring(locationOfFirstUnescapedOpeningCurly, locationOfCurrentCurly + 1);
        jsonStrings.push(possiblyCorrectlyFormattedJsonString);
    }

    return jsonStrings;
}

//this will only find the next opening brace {
function GetLocationOfNextUnescapedOpeningCurlyBrace(rawLogMessage, startIndex) {
    var regexNextUnescapedOpeningCurly = /(?<!\\)({)/i;
    return RegexStringExtract(rawLogMessage, startIndex, regexNextUnescapedOpeningCurly);
}

//this will find the next opening OR closing brace { }
function GetLocationOfNextUnescapedCurlyBrace(rawLogMessage, startIndex) {
    var regexNextUnescapedCurly = /(?<!\\)({|})/i;
    return RegexStringExtract(rawLogMessage, startIndex, regexNextUnescapedCurly);
}

function RegexStringExtract(stringToSearch, startIndex, regex) {
    var substring = stringToSearch.substring(startIndex);
    var regexMatch = regex.exec(substring);
    if (regexMatch) {
        return startIndex + regexMatch.index;
    }
    else {
        return -1;
    }
}
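
One limitation of counting braces this way is that a brace inside a JSON string value (for example {"msg": "}"}) is counted like a structural one. A possible refinement, sketched here with a made-up helper name and not tested against real logs, tracks whether the scanner is inside a double-quoted string:

```javascript
// Sketch: returns the index just past the '}' that balances the '{' at
// `start`, ignoring braces inside double-quoted strings, or -1 if unbalanced.
// findBalancedEnd is a hypothetical helper name.
function findBalancedEnd(text, start) {
    var depth = 0;
    var inString = false;
    for (var i = start; i < text.length; i++) {
        var ch = text[i];
        if (inString) {
            if (ch === '\\') {
                i++; // skip the escaped character
            } else if (ch === '"') {
                inString = false;
            }
        } else if (ch === '"') {
            inString = true;
        } else if (ch === '{') {
            depth++;
        } else if (ch === '}') {
            depth--;
            if (depth === 0) {
                return i + 1;
            }
        }
    }
    return -1;
}

var logLine = 'INFO payload={"msg": "a } in a string", "n": {"x": 1}} done';
var begin = logLine.indexOf('{');
var finish = findBalancedEnd(logLine, begin);
console.log(logLine.substring(begin, finish));
```

The extracted substring still has to go through JSON.parse(), since balanced braces alone do not guarantee valid JSON.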
Jeremy