
I am trying to run a regex on a JSON string to verify data is as expected before continuing with my script.

Here is an example of the JSON to run the regex on:

[{"id":"01001001","b":"1","c":"1","v":"1","t":"Some \"Text\""},{"id":"01001002","b":"1","c":"1","v":"2","t":"More Text"},{"id":"01001003","b":"1","c":"1","v":"3","t":"And Even More"}]

I have tested the following regex at phpliveregex.com, where it works:

\[(\{"id":"[0-9]{8}","b":"[0-9]{1,2}","c":"[0-9]{1,2}","v":"[0-9]{1,3}","t":"[^"\\]*(?:\\.[^"\\]*)*"\})(,\{"id":"[0-9]{8}","b":"[0-9]{1,2}","c":"[0-9]{1,2}","v":"[0-9]{1,3}","t":"[^"\\]*(?:\\.[^"\\]*)*"\})*\]

Here is how I put it together in PHP:

$sv = '01001001';
$ev = '01001003';
$url = 'http://api.amasterdesigns.com/?sv='.$sv.'&ev='.$ev;
$JSON = file_get_contents($url);
//return JSON only if properly formatted
if(preg_match('/\[(\{"id":"[0-9]{8}","b":"[0-9]{1,2}","c":"[0-9]{1,2}","v":"[0-9]{1,3}","t":"[^"\\]*(?:\\.[^"\\]*)*"\})(,\{"id":"[0-9]{8}","b":"[0-9]{1,2}","c":"[0-9]{1,2}","v":"[0-9]{1,3}","t":"[^"\\]*(?:\\.[^"\\]*)*"\})*\]/',$JSON)){
    return json_decode($JSON);
} else {
    return;
}

When I run this page, I receive this error:

Warning: preg_match(): Compilation failed: missing terminating ] for character class at offset 202 in path_to_file/my-file.php on line 1422

Line 1422 is line 6 of the above code snippet (the preg_match call). I believe the error is pointing to [^"\\] near the end of my regex, but I do have a terminating ] following the escaped \.

You can see the errors using PHP sandbox

amaster
  • Why are you trying to parse JSON with a regexp instead of using `json_decode()`? – Barmar May 25 '17 at 11:30
  • I want to ensure that the data is what I expect before it is decoded. – amaster May 25 '17 at 11:31
  • It will be much easier to decode the JSON and then validate the contents. – Barmar May 25 '17 at 11:33
  • More particularly, I am using this code in a WordPress plugin and wanted to add another layer of validation before outputting errors on the public side in case there is a bad response from the API where the JSON is retrieved. – amaster May 25 '17 at 11:33
  • It would still be much easier to parse the JSON, check `json_last_error`, and then traverse the parsed object/array to validate the data than cramming it all into one regex. – deceze May 25 '17 at 11:41
  • Starting with the fact that the order of keys in a JSON object is immaterial, so expecting a certain order for validation makes your code extremely fragile. – deceze May 25 '17 at 11:45
  • @deceze I am the one also controlling the API, so I know how the data will be sent over, I am just adding a step to make sure that it was not changed along the way. Maybe a little overboard here, I am also validating the data after it is decoded in another place in the script. – amaster May 25 '17 at 11:49
  • Unless you're cobbling together the JSON by hand (don't do that), but you're actually using `json_encode`, you have very little guarantee what order the keys in that object will be output in. `{"foo": "bar", "baz": 42}` and `{"baz": 42, "foo": "bar"}` are absolutely equivalent as far as JSON is concerned, and you have virtually no guarantee how exactly the result of a `json_encode` will look; even less in other languages. That's why JSON validation through regex is extremely likely to fail soon for perfectly valid code. – deceze May 25 '17 at 12:05
  • @wiktor I reviewed the possible duplicate answer before posting my question. The answer did not clearly define what the problem was as the accepted answer to this question did. – amaster May 25 '17 at 12:44
  • Another link added to the close reason. – Wiktor Stribiżew May 25 '17 at 12:46
  • @WiktorStribiżew I did not see that question because I did not know that escaping the backslash was the problem. I assumed that I had already escaped the backslash correctly. At least now I know. So be it if this remains a duplicate, even though I think the wording makes it unique and helpful to others searching for a similar problem. – amaster May 25 '17 at 12:53
  • Yes, it will help. It will redirect to those posts. Barmar did not explain much anyway. The essence is: a regex engine expects a literal backslash as an escaping symbol. The rest of the explanation is available at [PHP string literals](http://php.net/manual/en/language.types.string.php) where we see that even in single-quoted PHP string literals, you need to use ``\\`` to define a literal backslash. – Wiktor Stribiżew May 25 '17 at 12:56
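
The approach suggested in the comments above (decode, check `json_last_error`, then traverse the result) could look roughly like the sketch below. The function name `decode_and_validate` and the `$rules` table are hypothetical, not part of the original plugin; the per-field patterns simply mirror the ranges in the question's regex. Because it inspects the decoded data, it does not depend on key order.

```php
<?php
/**
 * Hypothetical helper: decode first, then validate the decoded structure.
 * Returns the decoded rows, or null if anything is not as expected.
 */
function decode_and_validate(string $json): ?array
{
    $rows = json_decode($json, true);
    if (json_last_error() !== JSON_ERROR_NONE || !is_array($rows) || $rows === []) {
        return null;
    }
    // The top level must be a JSON array (sequential keys), not an object.
    if (array_keys($rows) !== range(0, count($rows) - 1)) {
        return null;
    }

    // Per-field patterns mirroring the ranges used in the question's regex.
    $rules = [
        'id' => '/^[0-9]{8}$/D',
        'b'  => '/^[0-9]{1,2}$/D',
        'c'  => '/^[0-9]{1,2}$/D',
        'v'  => '/^[0-9]{1,3}$/D',
    ];

    foreach ($rows as $row) {
        // Each element needs exactly the five expected keys, in any order.
        if (!is_array($row) || count($row) !== 5 || !isset($row['t']) || !is_string($row['t'])) {
            return null;
        }
        foreach ($rules as $key => $pattern) {
            if (!isset($row[$key]) || !is_string($row[$key]) || !preg_match($pattern, $row[$key])) {
                return null;
            }
        }
    }
    return $rows;
}

// Second object has its keys in a different order on purpose.
$sample = '[{"id":"01001001","b":"1","c":"1","v":"1","t":"Some \"Text\""},{"v":"2","id":"01001002","b":"1","c":"1","t":"More Text"}]';
var_dump(decode_and_validate($sample) !== null);        // bool(true)
var_dump(decode_and_validate('[{"id":"1"}]') !== null); // bool(false)
```

None of the patterns here contain a backslash inside a character class, so the escaping problem from the question does not arise.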

3 Answers

This part:

[^"\\]

needs to be:

[^"\\\\]

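You need to double the backslashes again because the backslash is an escape character in both PHP string syntax and regular expression syntax. In the PHP string, \\ turns into a single \ before the pattern is sent to preg_match, and that lone backslash then escapes the ] instead of being treated as one of the characters in the character set. With \\\\, PHP passes \\ to the regex engine, which reads it as a literal backslash.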
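
A minimal sketch of the two layers of unescaping, using only the character class from the question. The variable names are illustrative and not from the original code. It prints what each single-quoted PHP literal actually hands to PCRE, then compiles both versions.

```php
<?php
// What PCRE receives from each single-quoted PHP literal.
$broken = '[^"\\]';    // PHP collapses \\ to \, so PCRE sees [^"\]
$fixed  = '[^"\\\\]';  // PHP collapses \\\\ to \\, so PCRE sees [^"\\]

echo $broken, PHP_EOL; // [^"\]
echo $fixed, PHP_EOL;  // [^"\\]

// In the broken pattern the lone backslash escapes the ], so the class never closes.
// The @ only suppresses the compilation warning so the return value can be inspected.
$brokenResult = @preg_match('/' . $broken . '/', 'abc');
var_dump($brokenResult); // bool(false): the pattern failed to compile

// The fixed class matches any character except a double quote or a backslash.
var_dump(preg_match('/^' . $fixed . '+$/', 'abc'));  // int(1)
var_dump(preg_match('/^' . $fixed . '+$/', 'a\\b')); // int(0): subject contains a backslash
```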

Barmar

The first check on v is missing an opening [:

"v":"0-9]{1,3}"  // should be:
"v":"[0-9]{1,3}"
rickdenhaan
  • I'm sorry, somehow I made an error when copying/pasting and formatting the code in my question. The opening `[` is in place. Please see the edit. – amaster May 25 '17 at 11:39
  • There is no *must* here. Closing bracket alone doesn't mean a special character in a regex. – revo May 25 '17 at 11:40
  • @revo no, but if you're trying to match 1-3 characters in the range of 0-9, PECL will need one there to work as you want. Otherwise it would look for the literal string "0-9" followed by 1-3 "]" characters. – rickdenhaan May 25 '17 at 11:47
  • `PECL` or `PCRE`? – revo May 25 '17 at 11:54
  • @revo got my acronyms mixed up. You're right, I meant PCRE. Sorry! – rickdenhaan May 25 '17 at 11:55
  • Yet provided error doesn't have anything to do with missing opening bracket. – revo May 25 '17 at 11:55
  • Yes, that's why @Barmar's answer is the accepted one. I'm not deleting this answer, because it did lead to an edit in the original question. Even though this was not the answer to the original problem. – rickdenhaan May 25 '17 at 11:58
  • ... and I didn't try to make you do so. – revo May 25 '17 at 12:04

Adding to @Barmar's answer: alternatively, wrap your regex in a nowdoc, where backslashes are not processed by PHP and reach the regex engine unchanged:

$re = <<< 'RE'
/\[(\{"id":"[0-9]{8}","b":"[0-9]{1,2}","c":"[0-9]{1,2}","v":"[0-9]{1,3}","t":"[^"\\]*(?:\\.[^"\\]*)*"\})(,\{"id":"[0-9]{8}","b":"[0-9]{1,2}","c":"[0-9]{1,2}","v":"[0-9]{1,3}","t":"[^"\\]*(?:\\.[^"\\]*)*"\})*\]/
RE;

if (preg_match($re, $JSON)) {
    print_r(json_decode($JSON));
}
revo