I am building a JSON validator from scratch, but I am stuck on the string part. My hope was to build a regex that matches the following sequence found on JSON.org:

JSON.org String Sequence

My regex so far is:

/^\"((?=\\)\\(\"|\/|\\|b|f|n|r|t|u[0-9a-f]{4}))*\"$/

It matches a backslash followed by one of the allowed characters, and it matches an empty string. But I'm not sure how to handle the Unicode part.

Is there a regex to match any Unicode character except " or \ or a control character? And will it match a newline or horizontal tab?

I ask the last question because the regex matches the string "\t", but not "	" (a literal tab character). Otherwise I will need to expand the regex to cover it, which is not a problem, but my guess is that the horizontal tab is a Unicode character.

Thanks to Jaeger Kor, I now have the following regex:

/^\"((?=\\)\\(\"|\/|\\|b|f|n|r|t|u[0-9a-f]{4})|[^\\"]*)*\"$/

It appears to be correct, but is there any way to check for control characters, or is this unnecessary given that they are listed among the non-printable characters on regular-expressions.info? The input to validate is always text from a textarea.

Update: the regex is as follows, in case anyone needs it:

/^("(((?=\\)\\(["\\\/bfnrt]|u[0-9a-fA-F]{4}))|[^"\\\0-\x1F\x7F]+)*")$/
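As a rough check of how the updated pattern treats tabs and other control characters, here is a minimal sketch ported to Python's `re` module (an assumption; the question does not name a language). It drops the redundant `(?=\\)` lookahead, writes `\0` as `\x00`, repeats single characters rather than `[...]+` runs, and uses `fullmatch()` instead of `^...$`, because in Python `$` also matches just before a trailing newline. The names `JSON_STRING` and `is_json_string` are made up for illustration.

```python
import re

# Hypothetical port of the updated string pattern to Python's `re`.
# An escape is a backslash followed by one of "\/bfnrt or by uXXXX;
# any other character must not be a quote, a backslash, a C0 control
# character (U+0000..U+001F) or DEL (U+007F).
JSON_STRING = re.compile(
    r'"(?:\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4})|[^"\\\x00-\x1F\x7F])*"'
)


def is_json_string(text: str) -> bool:
    """Return True if `text` is one complete JSON string literal."""
    return JSON_STRING.fullmatch(text) is not None


print(is_json_string('"\\t"'))          # escape sequence: backslash + t
print(is_json_string('"\t"'))           # raw tab character (U+0009)
print(is_json_string('"caf\u00e9"'))    # non-ASCII character, unescaped
print(is_json_string('"line\nbreak"'))  # raw newline (U+000A)
```

A literal tab typed into the textarea is U+0009, a control character, so the pattern rejects it; only the two-character escape `\t` is accepted. The same holds for a raw newline. Excluding `\x7F` is stricter than RFC 8259, which only forbids U+0000 through U+001F unescaped.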
Sietse
  • The above regular expression suffers from inefficiency and ambiguity, which can allow a malicious user to perform a Denial of Service ("DoS") attack. Here is a version that is free of the inefficiency: `/^("(((?=\\)\\(["\\\/bfnrt]|u[0-9a-fA-F]{4}))|[^"\\\x00-\x1F\x7F])*")$/` – Vladimír Gorej Jan 25 '23 at 13:10
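The difference between the two variants can be sketched in Python (an assumed language; the thread does not name one). `NESTED` keeps the `+` inside the repeated group, so a run of plain characters can be split between iterations in many ways and a failed match may backtrack through all of them. `FLAT` consumes one character per iteration, which leaves only one way to split a run. The names `ESCAPE`, `PLAIN`, `NESTED` and `FLAT` are hypothetical, and the redundant lookahead is omitted.

```python
import re

ESCAPE = r'\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4})'
PLAIN = r'[^"\\\x00-\x1F\x7F]'

# Nested quantifier: (PLAIN+)* is ambiguous about how a run is split.
NESTED = re.compile(f'"(?:{ESCAPE}|{PLAIN}+)*"')
# One character per iteration: no ambiguity.
FLAT = re.compile(f'"(?:{ESCAPE}|{PLAIN})*"')

# On short inputs both variants accept and reject the same strings.
short_samples = ['""', '"abc"', '"a\\nb"', '"abc', '"a\tb"']
for s in short_samples:
    assert (NESTED.fullmatch(s) is None) == (FLAT.fullmatch(s) is None)

# A long string with no closing quote is the problematic case.
# Only FLAT is run here; NESTED is deliberately not tried on it.
unterminated = '"' + 'a' * 10_000
print(FLAT.fullmatch(unterminated) is None)
```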

2 Answers

For your exact question, create a character class:

# Matches any character that isn't a \ or "
/[^\\"]/

You can then append * to match zero or more of them, or + to match one or more:

/[^\\"]*/

or

/[^\\"]+/
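To see what the class picks out, here is a small sketch in Python (an assumed language, since the answer shows only the bare patterns). The variable names are illustrative.

```python
import re

# One or more characters that are neither a backslash nor a double quote.
plain_run = re.compile(r'[^\\"]+')

# The sample text contains two escaped quotes: say \"hi\" now
text = r'say \"hi\" now'
runs = plain_run.findall(text)
print(runs)

# With * the class also matches the empty string; with + it does not.
empty_ok_star = re.fullmatch(r'[^\\"]*', '') is not None
empty_ok_plus = re.fullmatch(r'[^\\"]+', '') is not None
print(empty_ok_star, empty_ok_plus)
```

The class only covers the unescaped part of a string, so it still has to be combined with an alternative for backslash escapes, as the question's second regex does.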

There is also the pattern below, found at https://regex101.com/ under the library tab when searching for json:

/(?(DEFINE)
# Note that everything is atomic, JSON does not need backtracking if it's valid
# and this prevents catastrophic backtracking
(?<json>(?>\s*(?&object)\s*|\s*(?&array)\s*))
(?<object>(?>\{\s*(?>(?&pair)(?>\s*,\s*(?&pair))*)?\s*\}))
(?<pair>(?>(?&STRING)\s*:\s*(?&value)))
(?<array>(?>\[\s*(?>(?&value)(?>\s*,\s*(?&value))*)?\s*\]))
(?<value>(?>true|false|null|(?&STRING)|(?&NUMBER)|(?&object)|(?&array)))
(?<STRING>(?>"(?>\\(?>["\\\/bfnrt]|u[a-fA-F0-9]{4})|[^"\\\0-\x1F\x7F]+)*"))
(?<NUMBER>(?>-?(?>0|[1-9][0-9]*)(?>\.[0-9]+)?(?>[eE][+-]?[0-9]+)?))
)
\A(?&json)\z/x

This should match any valid JSON. You can also test it at the website above.

EDIT:

Link to the regex
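The full pattern relies on `(?(DEFINE)...)` and `(?&name)` subroutine calls, which PCRE-style engines support (for example PHP's preg functions) but Python's standard `re` module and JavaScript regex literals do not, so pasting it into those languages produces syntax errors. The individual rules can still be tried on their own. Below is a minimal sketch of the STRING and NUMBER rules ported to Python's `re`; this is an assumption about how one might split them, not part of the library entry. The atomic groups are replaced by plain non-capturing groups, and the inner `+` is dropped from STRING because it is no longer protected by an atomic group.

```python
import re

# STRING rule: quote, then escapes or allowed plain characters, then quote.
STRING = re.compile(
    r'"(?:\\(?:["\\/bfnrt]|u[a-fA-F0-9]{4})|[^"\\\x00-\x1F\x7F])*"'
)

# NUMBER rule: optional minus, integer part without leading zeros,
# optional fraction, optional exponent.
NUMBER = re.compile(r'-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?')


def is_string(text: str) -> bool:
    return STRING.fullmatch(text) is not None


def is_number(text: str) -> bool:
    return NUMBER.fullmatch(text) is not None


for candidate in ['0', '-12.5e+3', '01', '1.', '+1', '.5']:
    print(candidate, is_number(candidate))

for candidate in ['"ok"', '"\\u12AB"', '"\\u12G0"', '"unterminated']:
    print(candidate, is_string(candidate))
```

The recursive rules (object, array, value) cannot be expressed this way in `re`, so validating whole documents needs either a PCRE-compatible engine or a small hand-written parser around these two patterns.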

InSync
  • Thanks for your quick response. I added it to my first regular expression and it seems to be working fine. I don't know anything about control characters, but maybe I don't need to worry about them, as the input is from a textarea where they might not be accepted. The last regex you provided was a complete regex, but I want to know where the error is. But then again, I'll check whether it might be more useful! – Sietse Aug 22 '15 at 12:03
  • I have been playing with your latest regex, and when splitting them, they work great! Thanks! – Sietse Aug 22 '15 at 13:27
  • Thx! The search URL is: https://regex101.com/library?orderBy=MOST_POINTS&search=json – vbrajon Oct 09 '16 at 08:42
  • If you could post a code snippet that uses this regex, it would be helpful. I got a lot of syntax errors when I pasted that into my code. – jcollum Mar 31 '21 at 16:56

Use this; it also works with JSON arrays such as [{...},{...}]:

((\[[^\}]{3,})?\{s*[^\}\{]{3,}?:.*\}([^\{]+\])?)

Demo: https://regex101.com/r/aHAnJL/1

dazzafact