
I have a huge string dump that contains a mix of regular text and JSON. I want to separate/remove the JSON objects from the string dump and keep only the text.

Here is an example:

This is some text {'JSON':'Object'} Here's some more text {'JSON':'Object'} Yet more text {'JSON':'Object'} Again, some text.

My goal is to get a text dump that looks like this (basically the JSON is removed):

This is some text Here's some more text Yet more text Again, some text.

I need to do this all in PHP. The text dump is always random, and so is the JSON data structure (most of it is deeply nested). The dump may or may not start with JSON, and it may or may not contain more than one JSON object.

I have tried using json_decode on the string, but the result ends up as NULL.

EDIT: Amal's answer is really close to what I want (see the 2nd comment below):

$str = preg_replace('#\{.*?\}#s', '', $str);

However, it doesn't get rid of nested objects at all; e.g. data contained in brackets: [] or [{}]

Sorry, I'm not an expert in regex.

I realized that some of you may need a more concrete example of the string dump I'm dealing with; therefore I've created a gist (please note that this is not static data; the data in the dump will always be different; my example above just simplifies the string I'm working with): https://gist.github.com/anonymous/6855800

James Nine

3 Answers


I wanted you to post the code you used in your attempt with json_decode, but oh well...

You can use a recursive regex for nested braces in PHP:

$res = preg_replace('~\{(?:[^{}]|(?R))*\}~', '', $text);

regex101 demo (The part highlighted in blue will be removed).
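As a rough illustration of the recursive pattern above, here is a minimal sketch. The function name strip_braced and the whitespace clean-up step are assumptions for the example, not part of the original answer. Note that the pattern only balances curly braces: braces inside quoted strings can confuse it, and a top-level [ ... ] wrapper around objects is left behind as empty brackets.

```php
<?php
// Hypothetical helper name; wraps the recursive pattern from the answer.
function strip_braced(string $text): string
{
    // \{ ... \} where (?R) recurses into the whole pattern for nested braces.
    $res = preg_replace('~\{(?:[^{}]|(?R))*\}~', '', $text);
    if ($res === null) {
        // preg_replace returns null when PCRE reports an error.
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    // Assumption: collapsing the whitespace left behind is desired.
    return trim(preg_replace('~\s{2,}~', ' ', $res));
}

$text = "This is some text {'JSON':'Object'} Here's some more text "
      . "{'a':{'b':[1,2,{'c':3}]}} Yet more text.";

echo strip_braced($text), PHP_EOL;
// prints: This is some text Here's some more text Yet more text.
```

For very large dumps it is probably worth keeping the null check, since a silent null result would otherwise look like an empty string.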

Jerry
  • Jerry, the code is simply: $string = "<13,000 char string dump from gist>"; $result = json_decode($string, true); // That's it. – James Nine Oct 06 '13 at 16:32
  • @JamesNine You'd be getting an error 4 with that I think, which means 'Syntax error, malformed JSON'. I guess that the `json` commands can't be used after all :( – Jerry Oct 06 '13 at 16:53
  • Works great. What if I need to get only the json string and remove anything else? – lomse Jul 14 '14 at 16:02
  • @Lomse Then it would be easier to match on JSON strings. i.e. use `preg_match_all` instead of using `preg_replace`. 'Remove everything but...' is usually a flag that it can be done in a simpler way. – Jerry Jul 14 '14 at 16:05
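The preg_match_all variant suggested in the last comment might look like the sketch below. The function name extract_braced and the sample input are made up for illustration; the pattern is the same recursive one from the answer, so the same caveats about braces inside quoted strings apply.

```php
<?php
// Hypothetical helper: returns every balanced {...} block found in $text.
function extract_braced(string $text): array
{
    // $matches[0] holds the full-pattern matches, in order of appearance.
    preg_match_all('~\{(?:[^{}]|(?R))*\}~', $text, $matches);
    return $matches[0];
}

$text = 'Intro {"id":1,"tags":{"a":true}} middle {"id":2} end';

foreach (extract_braced($text) as $candidate) {
    // json_decode returns null if the candidate is not valid JSON.
    $decoded = json_decode($candidate, true);
    var_dump($decoded);
}
```

Each candidate is only a balanced-brace block, so decoding it is still needed to confirm it is valid JSON.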

Take a stack and start iterating over the string from the beginning:

for ($i = 0; $i < strlen($str); $i++) {
}

Whenever you find $str[$i] == '{' while the stack is empty, push it onto the stack and set the start variable to $i:

$start = $i;

From then on, whenever a { or [ occurs in the string, push it onto the stack. If a ] or } occurs and the top of the stack is not the matching [ or {, the candidate is not valid JSON. Otherwise pop the top of the stack, and keep doing so until the stack is empty.

At that point you get $end = $i;

The substring from $start to $end is one of the JSON strings. Push it into another array that collects all the JSON strings,

and keep processing until you reach the end of the string.
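The steps above can be sketched roughly as follows, adapted to the question's goal of dropping the JSON and keeping the text. The function name strip_json_regions is hypothetical, and the handling of mismatched or unclosed brackets (keep them as plain text) is an assumption, since the answer does not say what to do in that case. Brackets inside quoted strings are not treated specially.

```php
<?php
// Hypothetical helper: removes balanced {...} regions (which may contain
// nested {...} and [...]) and returns the remaining text.
function strip_json_regions(string $str): string
{
    $closers = ['}' => '{', ']' => '['];
    $stack = [];   // currently open brackets
    $out = '';     // text kept so far
    $start = 0;    // offset where the current candidate region began
    $len = strlen($str);

    for ($i = 0; $i < $len; $i++) {
        $ch = $str[$i];

        if (!$stack) {
            if ($ch === '{') {
                $start = $i;      // a candidate JSON region begins here
                $stack[] = $ch;
            } else {
                $out .= $ch;      // ordinary text outside any region
            }
            continue;
        }

        if ($ch === '{' || $ch === '[') {
            $stack[] = $ch;
        } elseif (isset($closers[$ch])) {
            if (end($stack) === $closers[$ch]) {
                array_pop($stack); // region ends when the stack empties
            } else {
                // Assumption: on a mismatch, keep the candidate as text.
                $out .= substr($str, $start, $i - $start + 1);
                $stack = [];
            }
        }
    }

    if ($stack) {
        // Assumption: an unclosed region at the end is kept as text.
        $out .= substr($str, $start);
    }
    return $out;
}

echo strip_json_regions("Some text {'a':[{'b':1}]} more text."), PHP_EOL;
```

Unlike the regex approach this needs no recursion support, and the whitespace around each removed region is left untouched.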

hek2mgl

Here is a working code snippet based on animesh seth's answer.

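if (strpos($msg, '{') !== false) {
    $chars = str_split($msg);
    // Extract the first brace-delimited JSON object from the message.
    $json = '';
    $in = 0; // current brace nesting depth
    foreach ($chars as $char) {
        if ($char == '{') {
            $in++;
        }
        if ($in) {
            $json .= $char;
        }
        if ($char == '}' && $in > 0) {
            $in--;
            if ($in == 0) {
                break; // object closed; stop so later objects are not appended
            }
        }
    }
    if ($json) {
        $json = json_decode($json); // null if the extracted text is not valid JSON
    }
    // do something with the json object.
}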
Brian F