
I have a JSON string and I need to match all "text" keys as well as all "html" keys.

For example, the json could be like below:

[{
"layout":12,
"text":"Lorem",
"html":"<div>Ipsum</div>"
}]

Or it could be like below:

[{
"layout":12,
"settings":{
    "text":"Lorem",
    "atts":{
        "html":"<div>Ipsum</div>"
    }
}
}]

The JSON does not always have the same structure, so I have to match the keys and get their values using preg_match_all. I tried the following to get the value of the "text" key:

preg_match_all('|"text":"([^"]*)"|',$json,$match_txt,PREG_SET_ORDER);

The above works fine for matching a single key. When it comes to matching a second key ("html" in this case) it just doesn't work. I have tried the following:

preg_match_all('|"text|html":"([^"]*)"|',$json,$match_txt,PREG_SET_ORDER);

Can you please give me some hints why the OR operator (text|html) doesn't work? Strangely, the above (multi-pattern) regex works fine when I test it in an online tester but it doesn't work in my php files.

brianforan
otinanai
  • Why do you need to do this through regex? – Jim Wright Aug 11 '17 at 17:11
  • @JimWright because the json doesn't always have the same layout/structure and there's no way to control that. – otinanai Aug 11 '17 at 17:12
  • If it is a valid json then it can be parsed using `json_decode` – anubhava Aug 11 '17 at 17:16
  • Have you tried enclosing the `text|html` in parenthesis: `(text|html)`? – pchaigno Aug 11 '17 at 17:18
  • have to search it if you use `json_decode`, solution can be found here: https://stackoverflow.com/questions/19420715/check-if-specific-array-key-exists-in-multidimensional-array-php – brianforan Aug 11 '17 at 17:18
  • Even though its _not the same layout/structure_, if it's valid, can't you just search it like you are using regex, ie. `text` or `html` ? –  Aug 11 '17 at 17:18
  • You know, it's technically feasible to use regex for isolated JSON extraction. But there has to be a better reason than "regex was the first thing that came to mind". – mario Aug 11 '17 at 17:19
  • @pchaigno Yes I have tried it with no luck. – otinanai Aug 11 '17 at 17:20
  • @mario Regex is the only reliable solution in my case. – otinanai Aug 11 '17 at 17:21
  • If you insist on regex, change it to `|"(?:text|html)":"([^"]*)"|` –  Aug 11 '17 at 17:22
  • You should find out why. Not stop at observing that it didn't work. -- To answer your question: use non-capturing grouping for local alternatives. – mario Aug 11 '17 at 17:23
  • @sln The reason I don't want a 2nd regex for html is that I can't control the order of the values. Example: If I search for "text" and then for "html" I end up displaying the values of text and afterwards the values of html but the json may have one text, two html and the one more text. I need the order to be intact. – otinanai Aug 11 '17 at 17:24
  • @sln I tired your regex and it doesn't work. – otinanai Aug 11 '17 at 17:26
  • @mario I like your terminology but I was hoping for some snippet. – otinanai Aug 11 '17 at 17:27
  • I see. That wasn't a concern of mine but.. Beware though, it is possible that an html could be structure itself. Like `"html":{"a":"","text":"a link", "a":""}` –  Aug 11 '17 at 17:28
  • @otinanai Ever considered turning on error_reporting? Also, no offense, but you seem unversed with both the regex syntax and the lingo. Let me assure you, it's *not* the best approach to your perceived problem. – mario Aug 11 '17 at 17:28
  • @mario Error reporting is always on and I don't get any errors. Of course, I'm unversed with regex and its lingo otherwise I wouldn't post my question here. – otinanai Aug 11 '17 at 17:31
  • You should not be using regular expressions to parse JSON. – Rob W Aug 11 '17 at 17:31
  • I tried your `I tired your regex and it doesn't work` and it works for everybody else https://regex101.com/r/6ZgrSX/1 –  Aug 11 '17 at 17:33
  • Because, you can't say the regex doesn't work. The regex won't parse. And, that is the difference. If you don't know how to interface a regex with the language, that is another issue. –  Aug 11 '17 at 17:39
  • @sln It doesn't work in his environment because the parentheses are not the only issue. He also needs to change the delimiters. – pchaigno Aug 11 '17 at 17:46
  • @pchaigno - Who said anything about environment ? The regex works doesn't it ? What environment told him to use the pipe as a delimiter in the first place? He should know better then. –  Aug 11 '17 at 17:51

4 Answers

3

Fixing text|html

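You should put text|html in a group. Without one, the alternation splits the whole pattern in two, so it looks for either "text or html":"([^"]*)" instead of either key name followed by the value.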

|"(text|html)":"([^"]*)"|

Delimiters

This won't currently work with your delimiters though as you use the pipe (|) inside of the expression. You should change your delimiters to something else, here I've used /.

/"(text|html)":"([^"]*)"/
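A minimal sketch of the difference the delimiter makes, using the sample JSON from the question. The variable names are illustrative, and the @ operator is there only to silence the warning in this demonstration. With pipe delimiters PHP ends the pattern at the second |, so the call fails; with slashes both keys are matched in the order they appear.

```php
<?php
// Sample input from the question (flat layout).
$json = '[{"layout":12,"text":"Lorem","html":"<div>Ipsum</div>"}]';

// Pipe delimiters: the pattern ends at the second |, the rest is read as
// modifiers, and the call fails. @ hides the warning for this demo only.
$broken = @preg_match_all('|"(text|html)":"([^"]*)"|', $json, $m1, PREG_SET_ORDER);

// Slash delimiters: the | inside the group is plain alternation.
$fixed = preg_match_all('/"(text|html)":"([^"]*)"/', $json, $m2, PREG_SET_ORDER);

var_dump($broken); // bool(false)
var_dump($fixed);  // int(2)

// Group 1 is the key name, group 2 the value, in document order.
$pairs = [];
foreach ($m2 as $m) {
    $pairs[] = [$m[1], $m[2]];
    echo $m[1], ' => ', $m[2], "\n";
}
```

Because PREG_SET_ORDER returns one entry per match in the order found, interleaved "text" and "html" values keep their original sequence.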

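Keeping the pipe as your delimiter is not an option here. Escaping the pipe within the expression (\|) makes the regex engine treat it as a literal | character rather than as alternation, so the pattern below only matches the literal string "text|html" followed by a value:

|"(text\|html)":"([^"]*)"|

preg_quote() does not help either: it escapes every regex metacharacter (parentheses, brackets, *, ^ and so on), which turns the whole expression into a literal string. Use it only for literal text that you embed in a pattern, and pass the delimiter as its second argument:

$word = preg_quote('text', '/');
preg_match_all("/\"{$word}\":\"([^\"]*)\"/",$json,$match_txt,PREG_SET_ORDER);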

Parsing JSON

Although that regex will work, its matches need additional processing, and it makes more sense to decode the JSON and search it with a recursive function.

json_decode() decodes a JSON string into the corresponding PHP data types. In the example below I pass true as the second argument, which returns associative arrays where you would normally get objects.

findKeyData() calls itself recursively and works through the data until it finds the specified key, then returns the first value stored under it. If the key is not found, it returns null.

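function findKeyData($data, $key) {
    foreach ($data as $k => $v) {
        if (is_array($v)) {
            // Search nested arrays first and return the first value found there.
            $found = findKeyData($v, $key);
            if (! is_null($found)) {
                return $found;
            }
        }
        // Strict comparison: before PHP 8, 0 == 'text' is true for numeric keys.
        if ($k === $key) {
            return $v;
        }
    }
    return null;
}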

$json1 = json_decode('[{
"layout":12,
    "text":"Lorem",
    "html":"<div>Ipsum</div>"
    }]', true);
$json2 = json_decode('[{
"layout":12,
    "settings":{
    "text":"Lorem",
    "atts":{
        "html":"<div>Ipsum</div>"
    }
}
}]', true);

var_dump(findKeyData($json1, 'text')); // Lorem
var_dump(findKeyData($json1, 'html')); // <div>Ipsum</div>
var_dump(findKeyData($json2, 'text')); // Lorem
var_dump(findKeyData($json2, 'html')); // <div>Ipsum</div>
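findKeyData() stops at the first hit. If every "text" and "html" value is needed in its original order, as the question asks, a variation along these lines might work. This is only a sketch: collectKeyData() and the sample JSON are hypothetical and not part of the code above.

```php
<?php
// Hypothetical helper: gathers every value stored under one of $keys,
// in the order the keys appear in the decoded data.
function collectKeyData(array $data, array $keys): array {
    $found = [];
    foreach ($data as $k => $v) {
        if (in_array($k, $keys, true)) {
            // A wanted key: keep the key/value pair as-is.
            $found[] = [$k, $v];
        } elseif (is_array($v)) {
            // Any other array: descend and append whatever it contains.
            $found = array_merge($found, collectKeyData($v, $keys));
        }
    }
    return $found;
}

// Made-up input with interleaved keys at different depths.
$json = '[{"text":"A","settings":{"html":"<b>B</b>","atts":{"html":"<i>C</i>"}},"more":[{"text":"D"}]}]';
$result = collectKeyData(json_decode($json, true), ['text', 'html']);

foreach ($result as [$key, $value]) {
    echo "$key => $value\n";
}
```

Note the design choice: a wanted key whose value is itself an array is stored whole rather than searched further, which may or may not suit your data.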
Jim Wright
  • This answer fails to address why his regex doesn't work. He also needs to change the delimiters (`|`). Otherwise, I agree that parsing the JSON is probably a better idea in his case. – pchaigno Aug 11 '17 at 17:42
  • I've updated the answer to include more options on delimiters. – Jim Wright Aug 11 '17 at 17:50
1
preg_match_all('/"(?:text|html)":"([^"]*)"/',$json,$match_txt,PREG_SET_ORDER);

print $match_txt[0][0]." with group 1: ".$match_txt[0][1]."\n";
print $match_txt[1][0]." with group 1: ".$match_txt[1][1]."\n";

returns:

$ php -f test.php
"text":"Lorem" with group 1: Lorem
"html":"<div>Ipsum</div>" with group 1: <div>Ipsum</div>

The enclosing parentheses are needed: (?:text|html). Without them I couldn't get it to work on https://regex101.com. The ?: makes the group non-capturing, i.e. its content is not available in the results.

I also replaced the pipe (|) delimiter with forward slashes since you also have a pipe inside the regex. Escaping the pipe inside the regex (\|) is not an alternative: an escaped pipe matches a literal | character instead of acting as alternation.
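To illustrate the effect of ?: on the result arrays, here is a small sketch that runs both variants against the question's sample JSON. The variable names are illustrative only.

```php
<?php
$json = '[{"layout":12,"text":"Lorem","html":"<div>Ipsum</div>"}]';

// Non-capturing group: each match holds the full match and the value.
preg_match_all('/"(?:text|html)":"([^"]*)"/', $json, $nonCapturing, PREG_SET_ORDER);

// Capturing group: each match holds the full match, the key name and the value.
preg_match_all('/"(text|html)":"([^"]*)"/', $json, $capturing, PREG_SET_ORDER);

echo count($nonCapturing[0]), "\n"; // 2
echo count($capturing[0]), "\n";    // 3
echo $capturing[1][1], ' => ', $capturing[1][2], "\n"; // html => <div>Ipsum</div>
```

If you need to know which key each value belongs to, the capturing form is the more convenient one.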

pchaigno
  • That's well-explained now, and mostly what OP wants. (Except for capturing `html|regex` along.) – mario Aug 11 '17 at 17:33
1

I don't see any reason to use a regex to parse a valid json string:

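$data = json_decode($json, true);
// array_walk_recursive() takes its array by reference, so pass a variable.
array_walk_recursive($data, function ($v, $k) {
    // Strict check: before PHP 8, a numeric key 0 loosely equals 'text'.
    if ( in_array($k, ['text', 'html'], true) )
        echo "$k -> $v\n";
});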

Casimir et Hippolyte
0

You use the pipe | character as the delimiter; I think this breaks your regexp, because the | inside the pattern ends it early. Does it work using another delimiter such as #, with the alternation wrapped in a group?

preg_match_all('#"(?:text|html)":"([^"]*)"#',$json,$match_txt,PREG_SET_ORDER);

Philipp Gebert