
I'm trying to find simple key-value pairs, given as JSON objects, inside strings while using preg_replace_callback().

Unfortunately, the values can be of type string, number, boolean, null, array and, worst of all, object. My own attempts at solving this problem resulted either in an incomplete selection or in multiple JSON occurrences being selected as one.

Here are the things I tried:

String:
text text {"key":{"key":"value"}} text

Regex:
\{"(.+?)"\:(.+?)\}

Captured value (group 2):
{"key":"value"

Above: The lazy (.+?) stops at the first }, so the value is cut off before the inner object's closing bracket.
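To see the difference between the whole match and the captured value for this first attempt, a minimal sketch with preg_match() on the same string (the variable names are illustrative):

```php
<?php
$str = 'text text {"key":{"key":"value"}} text';

// The lazy (.+?) in group 2 stops at the first "}", which closes the INNER object.
preg_match('/\{"(.+?)"\:(.+?)\}/', $str, $m);

echo $m[0], "\n"; // whole match:   {"key":{"key":"value"}
echo $m[2], "\n"; // group 2 value: {"key":"value"
```

The whole match does end with a }, but it is the inner one; the outer object's bracket is never consumed.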

String:
text text {"key":{"key":"value"}} text

Regex:
\{"(.+?)"\:(.+)\}

Captured value (group 2):
{"key":"value"}

Above: This works for a single object, but when the string contains multiple JSON objects, the greedy (.+) runs on to the last } in the string, and I get:

{"key":"value"}} {"key":{"key":"value"}
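A small sketch reproducing this with the two-object string used in the next attempt; it shows that a greedy .+ backtracks only as far as the last } in the whole string:

```php
<?php
$str = 'text text {"key":{"key":"value"}} {"key":{"key":"value"}} text';

// Greedy (.+) first runs to the end of the string, then backtracks only until
// a "}" follows, i.e. to the LAST "}" of the second object.
preg_match('/\{"(.+?)"\:(.+)\}/', $str, $m);

echo $m[0], "\n"; // {"key":{"key":"value"}} {"key":{"key":"value"}}
echo $m[2], "\n"; // {"key":"value"}} {"key":{"key":"value"}
```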

Next attempt:

String:
text text {"key":{"key":"value"}} {"key":{"key":"value"}} text

Regex:
\{"(.+?)"\:(?:(\{(?:.+?)\})|(?:")?(.+?)(?:")?)\}

Captured value (group 2):
{"key":"value"}

Above: Again, that would theoretically work. But take, for example, the following string:

text text {"key":{"key":{"key":"value"}}} text

The result is:

{"key":{"key":"value"}

It is missing one closing bracket.
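The third pattern hard-codes exactly one nested {...} level, and the lazy .+? inside it still stops at the first }. A quick sketch on the three-level string shows where the bracket goes missing:

```php
<?php
$str = 'text text {"key":{"key":{"key":"value"}}} text';
$pattern = '/\{"(.+?)"\:(?:(\{(?:.+?)\})|(?:")?(.+?)(?:")?)\}/';

// Group 2 ends at the first "}", the pattern's final \} takes one more,
// and the third closing bracket is left outside the match.
preg_match($pattern, $str, $m);

echo $m[0], "\n"; // {"key":{"key":{"key":"value"}}
echo $m[2], "\n"; // {"key":{"key":"value"}
```

Each extra nesting level would need another hand-written layer, which is why a fixed pattern like this cannot cover arbitrary depth.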

  • Wouldn't it just be easier to parse the JSON and then check the key/value pairs? – John Conde Feb 26 '21 at 02:12
  • @JohnConde Sadly not, these JSON snippets appear in the middle of regular text. They're meant to be filled with dynamic variables. – shuunenkinenbi Feb 26 '21 at 02:23
  • This is one of those questions where abstracting away your logic to basics like `key` & `value` can hide important details as to what exactly you're trying to accomplish. Instead of just saying "I need to find a key value in JSON", maybe include what you want to do with it afterwards, and why the dynamic nature is important. Without extra details, my instinct is to suggest just matching any `{ ... }` string, try parsing it as JSON via `json_decode()`, and then handle the rest in PHP. – Aken Roberts Feb 26 '21 at 02:59
  • You can iterate through the keys and still do a regular expression match/replace on the values. If you parsed it first then iterated through the keys then you wouldn't have to fight with handling the braces and double-quotes. – Hayden Feb 26 '21 at 02:59
  • How can you consistently/reliably detect what's JSON in the broader string and what's not? More specifically, how exactly do you plan on parsing something like this if the "surrounding text" contains JSON characters (`{[]}`)? – esqew Feb 26 '21 at 04:17
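Aken Roberts's suggestion (find `{ ... }` candidates, then let `json_decode()` decide) can be sketched without a regex by counting braces. This is an illustrative sketch, not code from the thread: `findJsonObjects` is a hypothetical helper name, braces inside JSON string literals are skipped, and stray braces from the surrounding text simply produce candidates that `json_decode()` rejects. It restarts at every `{`, so the worst case is quadratic in the text length.

```php
<?php
// Hypothetical helper: returns each balanced {...} span that json_decode() accepts.
// Braces inside JSON string literals are ignored, so "value{1}" does not
// disturb the depth count.
function findJsonObjects(string $text): array
{
    $found = [];
    $len = strlen($text);

    for ($start = 0; $start < $len; $start++) {
        if ($text[$start] !== '{') {
            continue;
        }
        $depth = 0;
        $inString = false;

        for ($i = $start; $i < $len; $i++) {
            $c = $text[$i];
            if ($inString) {
                if ($c === '\\') {
                    $i++;                 // skip the escaped character
                } elseif ($c === '"') {
                    $inString = false;
                }
                continue;
            }
            if ($c === '"') {
                $inString = true;
            } elseif ($c === '{') {
                $depth++;
            } elseif ($c === '}' && --$depth === 0) {
                $candidate = substr($text, $start, $i - $start + 1);
                if (json_decode($candidate) !== null) {
                    $found[] = $candidate;
                    $start = $i;          // resume scanning after this object
                }
                break;                    // balanced span reached, valid or not
            }
        }
    }
    return $found;
}

$text = 'text {"key":{"key":"value{1}"}} and {broken} {"k":[1,2]} end';
print_r(findJsonObjects($text));
```

The returned strings can then be decoded and handled in PHP, as the comments suggest, instead of being taken apart with a regex.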

2 Answers


PCRE supports recursive matching for this kind of nested structure. Here is a demo:

<?php

$data = 'text text 
  {"key":{"key":"value{1}","key2":false}} 
  {"key":{"key":"value2"}} 
  {"key":{"key":{"key":"value3"}}} text';

$pattern = '(
    \{ # JSON object start
        ( 
            \s*
            "[^"]+"                  # key
            \s*:\s*                  # colon
            (
                                     # value
                (?: 
                    "[^"]+" |        # string
                    -?\d+(?:\.\d+)? |  # number, optionally negative
                    true |
                    false |
                    null
                ) | 
                (?R)                 # pattern recursion: a nested object
            )
            \s*
            ,?                       # comma
        )* 
    \} # JSON object end
)x';

// The callback must return the replacement string; returning $match[0]
// leaves $data unchanged while dumping each decoded object.
$result = preg_replace_callback(
    $pattern,
    function ($match) {
        var_dump(json_decode($match[0]));
        return $match[0];
    },
    $data
);
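The question says values can also be arrays, which the pattern above has no branch for. A sketch, assuming PHP's PCRE engine, that moves the grammar into a `(?(DEFINE)...)` block of named subpatterns so objects and arrays can call each other recursively via `(?&name)`. The group names and the callback (which replaces each object with a list of its top-level keys) are purely illustrative. It is still not a full JSON validator (for example, it does not check escape sequences), so `json_decode()` stays the final check.

```php
<?php
// Named groups inside (?(DEFINE)...) are only declarations; the pattern
// actually matches through the final (?&object) subroutine call.
$pattern = '/
    (?(DEFINE)
        (?<str>    " (?:[^"\\\\]|\\\\.)* " )
        (?<num>    -? (?:0|[1-9]\d*) (?:\.\d+)? (?:[eE][+-]?\d+)? )
        (?<value>  \s* (?: (?&str) | (?&num) | true | false | null | (?&object) | (?&array) ) \s* )
        (?<pair>   \s* (?&str) \s* : (?&value) )
        (?<object> \{ \s* (?: (?&pair) (?: , (?&pair) )* )? \s* \} )
        (?<array>  \[ (?: (?&value) (?: , (?&value) )* | \s* ) \] )
    )
    (?&object)
/x';

$data = 'text {"key":[1,{"key":"value}"}],"n":null} more {"k":-2.5e3} end';

$result = preg_replace_callback(
    $pattern,
    function (array $match): string {
        $decoded = json_decode($match[0], true);
        if (!is_array($decoded)) {
            return $match[0];             // regex matched, but not valid JSON
        }
        // Illustrative replacement: the object's top-level keys.
        return '[' . implode(',', array_keys($decoded)) . ']';
    },
    $data
);

echo $result, "\n"; // text [key,n] more [k] end
```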
ThW
  • Nice work! I was starting to look at the [Recursive patterns documentation](https://www.php.net/manual/en/regexp.reference.recursive.php), but hadn't yet learned how to use recursion in a regex context. Thanks for the example. – summea Feb 27 '21 at 02:46

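With the additional requirements of using preg_replace_callback() and not knowing the depth of the JSON objects ahead of time, perhaps this is another possible approach (more information on {1,} here). Note that {1,} applies only to the preceding \}, so the pattern consumes one or more closing braces in a row: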

<?php

// ref: https://stackoverflow.com/q/66379119/1167750
$str = 'text text {"key":{"key":"value1"}} {"key":{"key":"value2"}} {"key":{"key":{"key":"value3"}}} text';

function callback($array) {
    // Your function here...
    print_r($array);
    echo "Found:\n";
    echo "{$array[0]}\n";
}


preg_replace_callback('/\{"(.+?)"\:(.+?)\}{1,}/', 'callback', $str);

?>

Output (PHP 7.3.19):

$ php q18.php
Array
(
    [0] => {"key":{"key":"value1"}}
    [1] => key
    [2] => {"key":"value1"
)
Found:
{"key":{"key":"value1"}}
Array
(
    [0] => {"key":{"key":"value2"}}
    [1] => key
    [2] => {"key":"value2"
)
Found:
{"key":{"key":"value2"}}
Array
(
    [0] => {"key":{"key":{"key":"value3"}}}
    [1] => key
    [2] => {"key":{"key":"value3"
)
Found:
{"key":{"key":{"key":"value3"}}}
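A caveat worth testing before relying on this: because {1,} only repeats \}, the match ends at the first run of closing braces. With a nested object that is followed by another key (a hypothetical input, not one from the question), the match stops too early and is no longer valid JSON:

```php
<?php
$str = 'text {"a":{"b":1},"c":2} text';

// Group 2 stops at the first "}", \}{1,} takes just that one brace,
// and the outer object's remaining ,"c":2} is left unmatched.
preg_match_all('/\{"(.+?)"\:(.+?)\}{1,}/', $str, $matches);

print_r($matches[0]);                    // one match: {"a":{"b":1}
var_dump(json_decode($matches[0][0]));   // NULL
```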

Previous idea:

Would something like this be helpful for your use case(s)?

<?php

// ref: https://stackoverflow.com/q/66379119/1167750
$str = 'text text {"key":{"key":"value1"}} {"key":{"key":"value2"}} {"key":{"key":{"key":"value3"}}} text';

preg_match_all('/\{"(.+?)"\:(.+?)\}{1,3}/', $str, $matches);

print_r($matches);

echo "Found:\n";
print_r($matches[0]);

?>

Output (PHP 7.3.19):

$ php q18.php
Array
(
    [0] => Array
        (
            [0] => {"key":{"key":"value1"}}
            [1] => {"key":{"key":"value2"}}
            [2] => {"key":{"key":{"key":"value3"}}}
        )

    [1] => Array
        (
            [0] => key
            [1] => key
            [2] => key
        )

    [2] => Array
        (
            [0] => {"key":"value1"
            [1] => {"key":"value2"
            [2] => {"key":{"key":"value3"
        )

)
Found:
Array
(
    [0] => {"key":{"key":"value1"}}
    [1] => {"key":{"key":"value2"}}
    [2] => {"key":{"key":{"key":"value3"}}}
)

If you know the maximum nesting depth of these structures ahead of time, you can adjust the {1,3} part (which allows one to three closing braces in a row) accordingly, for example to {1,4} or {1,5}. More information on that part can be found in the documentation here.
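A quick sketch of that depth limit on a hypothetical four-level object: with {1,3} one closing brace is left behind, while {1,4} covers this depth (a fifth level would fail the same way):

```php
<?php
$str = 'text {"a":{"b":{"c":{"d":1}}}} text';

// {1,3} consumes at most three "}" in a row, so the fourth is left out.
preg_match('/\{"(.+?)"\:(.+?)\}{1,3}/', $str, $three);
// {1,4} is enough for exactly this depth.
preg_match('/\{"(.+?)"\:(.+?)\}{1,4}/', $str, $four);

echo $three[0], "\n"; // {"a":{"b":{"c":{"d":1}}}
echo $four[0], "\n";  // {"a":{"b":{"c":{"d":1}}}}
```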

summea
  • Sadly not. I should probably have noted in my initial post that it's supposed to be used with preg_replace_callback(). Also, the depth of the JSON objects varies and can't be determined beforehand. – shuunenkinenbi Feb 26 '21 at 02:50
  • Hi @shuunenkinenbi, Thanks for letting me know about those factors. I edited my post to hopefully address those factors. Does this get closer to the result you are looking for in your situation? – summea Feb 26 '21 at 04:08
  • Sadly not. The grouping of the brackets fails if one of the nested objects has more than one key-value pair. Example: {"key":{"key":{"key1":"value1","key2":"value2"}}}. Despite it not being an ideal solution, I've found a workaround using json_decode instead, as recommended by @AkenRoberts in another comment, paired with the expression taken from this answer: https://stackoverflow.com/a/21995025/8396285 – shuunenkinenbi Feb 26 '21 at 04:25