114

I am looking for a regex that allows me to validate JSON.

I am very new to regexes, and I know that parsing with regex is a bad idea, but can it be used to validate?

Shard
  • 32
    Why bother with a separate validation step? Most languages have JSON-libraries that can parse JSON, and if it can parse it, it was valid. If not, the library will tell you. – Epcylon Oct 02 '10 at 13:18
  • You need to parse text in order to validate it... – Ken Jan 02 '11 at 06:01
  • @mario - What's the point of the bounty here? Are you looking for more answers, or just some attention to your cause? `:)` – Kobi Jun 04 '11 at 08:29
  • @Kobi: It's primarily normal bounty attention whoring :> I hope to outcompete the invalid accepted answer at least. Also less nefarious: getting some community review without needing a separate question. And maybe someone can simplify it further, or convert it into a compacter `(?R)` version. – mario Jun 04 '11 at 10:51
  • 3
    @mario - I don't know... I'm all for abusing regex, and extremely sympathetic to your objection to the "regex must match regular" fallacy - but not on practical, work related questions. The best answer here is really Epcylon's comment... (maybe this discussion belongs in the chat?) – Kobi Jun 04 '11 at 13:14
  • @Kobi. Well, my answer is just a by-product of a benchmarking craze (lost my bet). And in this question context it's more of a can-it-be-done? topic. I have one actual use case nevertheless. I'm going to prepend the verification on PHPs `json_decode`, which despite the simplicity of JSON had around a dozen exploitabilities. Old PHP versions are still awfully widespread, so I'm using it as security addon. – mario Jun 04 '11 at 13:46
  • 3
    Another practical use case is *finding* JSON expressions within a larger string. If you simply want to ask "is this string here a JSON object", then yes, a JSON parsing library is probably a better tool. But it can't find JSON objects within a larger structure for you. – Mark Amery Dec 23 '14 at 17:32
  • @Epcylon that is sadly not true - because most json parser parse strings and eliminate duplicated nodes, which makes it a valid json, but doesnt tell you if it was in the first place – Dominik Lemberger Mar 23 '17 at 08:55
  • 1
    This isn't an answer, but you can use [this part of Crockford's JSON-js library](https://github.com/douglascrockford/JSON-js/blob/2a76286e00cdc1e98fbc9e9ec6589563a3a4c3bb/json2.js#L488). It uses 4 regexes and combines them in a clever way. – imgx64 Oct 10 '19 at 07:03
  • It does not match `"\/"` as a valid json string but it is a valid json string value. can you fix this?. for example an escaped url such as `"https:\/\/websit.com"` will not be matched by your string group. – Eboubaker Feb 04 '22 at 11:23

12 Answers

208

Yes, a complete regex validation is possible.

Most modern regex implementations allow for recursive regexpressions, which can verify a complete JSON serialized structure. The json.org specification makes it quite straightforward.

$pcre_regex = '
  /
  (?(DEFINE)
     (?<number>   -? (?= [1-9]|0(?!\d) ) \d+ (\.\d+)? ([eE] [+-]? \d+)? )    
     (?<boolean>   true | false | null )
     (?<string>    " ([^"\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
     (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
     (?<pair>      \s* (?&string) \s* : (?&json)  )
     (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
     (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
  )
  \A (?&json) \Z
  /six   
';

It works quite well in PHP with the PCRE functions. It should work unmodified in Perl, and can certainly be adapted for other languages. It also succeeds with the JSON test cases.

Simpler RFC4627 verification

A simpler approach is the minimal consistency check specified in RFC4627, section 6. It is, however, intended only as a security test and a basic precaution against invalid input:

  var my_JSON_object = !(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(
         text.replace(/"(\\.|[^"\\])*"/g, ''))) &&
     eval('(' + text + ')');
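The check above can be split into two readable steps. This is only a sketch, not the RFC's code: the function names `looksLikeJsonText` and `safeParse` are made up for illustration, and `JSON.parse` is swapped in for the `eval` call on the assumption that the runtime provides it.

```javascript
// Step 1 (from the RFC snippet): strip string literals, then make sure only
// characters used by JSON punctuation, numbers, whitespace and the literals
// true, false and null remain.
function looksLikeJsonText(text) {
  const withoutStrings = text.replace(/"(\\.|[^"\\])*"/g, '');
  return !/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(withoutStrings);
}

// Step 2 (assumption: JSON.parse replaces eval): only parse text that passed
// the character check, and report failure as undefined.
function safeParse(text) {
  if (!looksLikeJsonText(text)) {
    return undefined;
  }
  try {
    return JSON.parse(text);
  } catch (e) {
    return undefined;
  }
}

console.log(looksLikeJsonText('alert(1)'));   // false: "(" is not allowed
console.log(looksLikeJsonText('{"a": tru}')); // true: the check is not a validator
console.log(safeParse('{"a": tru}'));         // undefined: the parser rejects it
```

The second call shows why this is only a precaution: text made of allowed characters passes the check even though it is not valid JSON.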
rpatel
mario
  • 28
    +1 There is so much bad in the world from people who just don't get the regex syntax and misuse that as a reason to hate them :( – NikiC Jun 05 '11 at 15:43
  • 9
    @mario, not sure if you think I am in the _the-naysayers-department_, but I'm not. Note that your statement _"Most modern regex implementations allow for recursive regexpressions"_ is highly debatable. AFAIK, only Perl, PHP and .NET have the capability to define recursive patterns. I wouldn't call that "most". – Bart Kiers Jun 06 '11 at 20:49
  • 3
    @Bart: Yes, that's rightly debatable. Most ironically the Javascript regex engines cannot use such a recursive regex to verify JSON (or only with elaborate workarounds). So if regex == posix regex, it's not an option. It's nevertheless interesting that it's doable with the contemporary implementations; even with few practical use cases. (But true, libpcre is not the prevalent engine everywhere.) -- Also for the record: I was hoping for a synthetic reversal badge, but your not getting a few bandwagon upvotes impedes that. :/ – mario Jun 06 '11 at 21:02
  • 1
    Java, Python, JavaScript, Ruby all do not support recursive patterns, to name a few popular languages. So your _"Most modern regex implementations"_ isn't just debatable, it's wrong. And mimicking a fixed number of nesting with look-arounds isn't really recursive, if that's what you meant by _"elaborate workarounds"_. But now I get it, by attaching a bounty you're hoping my answer gets enough down-votes and yours enough up-votes just for a badge? I'm sorry to say, I pity you. I recommend you down-vote my answer as well in order to get your precious badge (if you haven't done so already). – Bart Kiers Jun 06 '11 at 21:13
  • 4
    Nope. I was after the Populist badge, for which I require 20 votes but still 10 votes on your answer. So on the contrary the downvotes on your question are not to my benefit for that. – mario Jun 07 '11 at 14:21
  • Using `\d` is dangerous. In many regexp implementations `\d` matches the Unicode definition of a digit that is not just `[0-9]` but instead includes alternates scripts. – dolmen Jan 10 '13 at 08:51
  • 2
    Well, looking further, this regexp has many other issues. It matches JSON data, but some non-JSON data matches too. For example, the single literal `false` matches while the top level JSON value must be either an array or an object. It has also many issues in character set allowed in strings or in spaces. – dolmen Jan 10 '13 at 11:03
  • @dolmen: True. The JSON RFC makes only array and objects explicit for the outer shell. I was looking at this from a PHP `json_decode` standpoint, where the three literal tokens, strings or numbers are also accepted. And obviously I did not care about the string validity; that would require at least the `/u` flag and some further constraints in `[^"\\\\]*`. As for `\d` that depends on the locale and PCRE version obviously. – mario Jan 10 '13 at 18:29
  • Related for the thematic, also mostly theoretical but regex feature comparison value: [JSON parser as a single Perl Regex](http://www.perlmonks.org/?node_id=995856) demonstrates how Perls regex code callbacks `(?{..}?)` can build an actual JSON parse tree, not just validate it. – mario Oct 19 '13 at 00:38
  • 1
    Is there a `C#` version of this? – Soham Dasgupta Jan 12 '16 at 11:44
  • 1
    This regex actually does not pass 3 test cases from test suite with invalid files from http://www.json.org/JSON_checker/. (fail1.json, fail25.json, fail27.json). Originally fail18.json was not passed too, but there where an error there. – Gino Pane Jul 25 '16 at 09:48
  • @GinoPane That's what →dolmen already noted. This regex was modeled after PHPs implementation - which accepts atoms like `true` and `false` or a `"plain string"` instead of an object/array as outer shell. Moreover it's a bit more JSOL than JSON, as it allows unescaped linebreaks/tabs. – mario Jul 25 '16 at 12:59
  • @mario, not exactly by now , cause according to RFC-7159 it would be valid JSON strings. Real problem was only with `fail25.json`, `fail27.json`, but I've fixed them. – Gino Pane Jul 25 '16 at 13:25
  • The Regex also works for json with duplicated nodes on the same level - which in json is wrong there can not be 2 "Head" Nodes on Top Level for example – Dominik Lemberger Mar 23 '17 at 08:57
  • The suggested regex fails when the JSON includes escape sequences, e.g. `{"libelle":"Cin\u00e9ma Gaumont Amiens"}`. https://regex101.com/r/kkMbN4/1 – Gajus Jul 09 '18 at 11:32
  • 1
    @Gajus: It fails because you copied the literal 4 backslashes in `\\\\ u [0-9a-f]+` over. For regex-only context, it's just 2 backslashes however. – mario Jul 09 '18 at 13:45
  • To use in PHP, add `trim()` to the pattern or it will be error unknow modifier... `preg_match(trim($pcre_regex), 'json string here');`. – vee Dec 04 '20 at 14:51
  • this doesn't seem reliable to me: https://3v4l.org/DpiAd – hanshenrik Mar 11 '22 at 22:03
  • `["FABRICATION",[],` This input will cause `catastrophic backtracking` error. snippt:https://regex101.com/r/Jj0bRX/1 There is a problem with the array part – Eboubaker Mar 15 '22 at 16:13
  • @DominikLemberger Duplicated property names are perfectly legal in JSON. From the [spec](https://www.ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf): "The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange." – Daniel Schilling May 25 '23 at 14:17
36

Yes, it's a common misconception that Regular Expressions can match only regular languages. In fact, the PCRE functions can match much more than regular languages, they can match even some non-context-free languages! Wikipedia's article on RegExps has a special section about it.

JSON can be recognized using PCRE in several ways! @mario showed one great solution using named subpatterns and subroutine references such as (?&json). Then he noted that there should be a solution using recursive patterns (?R). Here is an example of such a regexp written in PHP:

$regexString = '"([^"\\\\]*|\\\\["\\\\bfnrt\/]|\\\\u[0-9a-f]{4})*"';
$regexNumber = '-?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?';
$regexBoolean= 'true|false|null'; // these are actually copied from Mario's answer
$regex = '/\A('.$regexString.'|'.$regexNumber.'|'.$regexBoolean.'|';    //string, number, boolean
$regex.= '\[(?:(?1)(?:,(?1))*)?\s*\]|'; //arrays
$regex.= '\{(?:\s*'.$regexString.'\s*:(?1)(?:,\s*'.$regexString.'\s*:(?1))*)?\s*\}';    //objects
$regex.= ')\Z/is';

I'm using (?1) instead of (?R) because the latter refers to the entire pattern, and the pattern contains the \A and \Z anchors, which should not be repeated inside subpatterns. (?1) refers to the group delimited by the outermost parentheses (this is why the outermost ( ) does not start with ?:). So, the RegExp becomes 268 characters long :)

/\A("([^"\\]*|\\["\\bfnrt\/]|\\u[0-9a-f]{4})*"|-?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?|true|false|null|\[(?:(?1)(?:,(?1))*)?\s*\]|\{(?:\s*"([^"\\]*|\\["\\bfnrt\/]|\\u[0-9a-f]{4})*"\s*:(?1)(?:,\s*"([^"\\]*|\\["\\bfnrt\/]|\\u[0-9a-f]{4})*"\s*:(?1))*)?\s*\})\Z/is

Anyway, this should be treated as a "technology demonstration", not as a practical solution. In PHP I would validate the JSON string by calling the json_decode() function (just as @Epcylon noted). If I'm going to use that JSON once it has been validated, then this is the best method.

Hrant Khachatrian
  • 1
    Using `\d` is dangerous. In many regexp implementations `\d` matches the Unicode definition of a digit that is not just `[0-9]` but instead includes alternates scripts. – dolmen Jan 10 '13 at 08:50
  • @dolmen: you may be right, but you shouldn't edit that yourself into the question. Just adding it as a comment should suffice. – Dennis Haarbrink Jan 10 '13 at 09:02
  • I think `\d` does not match unicode numbers in PHP's implementation of PCRE. For example `٩` symbol (0x669 arabic-indic digit nine) will be matched using pattern `#\p{Nd}#u` but not `#\d#u` – Hrant Khachatrian Jan 10 '13 at 10:02
  • @hrant-khachatrian: it does not because you did not use the `/u` flag. JSON is encoded in UTF-8. For a proper regexp you should use that flag. – dolmen Jan 10 '13 at 14:13
  • Besides that, as this implementation is based on @mario's, it repeats the same flaws: at the top level only arrays and object are allowed. Not string, number, boolean or null. Fixing this requires a major refactoring. – dolmen Jan 10 '13 at 14:14
  • 1
    @dolmen I did use the `u` modifier, please look again at the patterns in my previous comment :) Strings, numbers and booleans ARE correctly matched at the top level. You can paste the long regexp here http://www.quanetic.com/Regex and try yourself – Hrant Khachatrian Jan 12 '13 at 13:46
16

Because of the recursive nature of JSON (nested {...}-s), regex is not suited to validate it. Sure, some regex flavours can recursively match patterns* (and can therefore match JSON), but the resulting patterns are horrible to look at, and should never ever be used in production code IMO!

* Beware though, many regex implementations do not support recursive patterns. Of the popular programming languages, these support recursive patterns: Perl, .NET, PHP and Ruby 1.9.2.

Bart Kiers
  • 4
    [Humorously relevant related question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags)... – Darien May 31 '11 at 22:59
  • 18
    @all down voters: _"regex is not suited to validate it"_ does not mean certain regex engines can't do it (at least, that is what I meant). Sure, some regex implementations _can_, but anyone in their right mind would simply use a JSON parser. Just like if someone asks how to build a complete house with only a hammer, I'd answer that a hammer isn't suited for the job, you'd need a complete toolkit and machinery. Sure, someone with enough endurance can do it with just the hammer. – Bart Kiers Jun 06 '11 at 20:56
  • 3
    This may be a valid warning, but it _does not answer the question_. Regex may not be the correct tool, but some people don't have a choice. We're locked into a vendor product that evaluates the output of a service to check its health, and the only option the vendor provides for custom health checking is a web form that accepts a regex. The vendor product that evaluates the service status is not under my team's control. For us, evaluating JSON with regex is now a requirement, therefore, an answer of "unsuitable" is not viable. (I still didn't downvote you.) – John Deters Jan 28 '19 at 21:34
14

Looking at the documentation for JSON, it seems that the regex can simply be three parts if the goal is just to check for fitness:

  • [First] The string starts and ends with either [] or {}

    • [{\[]{1}...[}\]]{1}
  • AND EITHER

    • [Second] The character is an allowed JSON control character (just one)

      • ...[,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]...
    • [Third] The set of characters contained in a ""

      • ...".*?"...

All together: [{\[]{1}([,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}\]]{1}

If the JSON string contains newline characters, then you should use the singleline switch on your regex flavor so that . matches newline. Please note that this will not fail on all bad JSON, but it will fail if the basic JSON structure is invalid, which is a straight-forward way to do a basic sanity validation before passing it to a parser.
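As a rough illustration of what this sanity check accepts and rejects, here is a small sketch. The `^` and `$` anchors, the `s` (dotAll) flag and the names `JSON_SANITY_RE` and `passesSanityCheck` are assumptions added for the demo; they are not part of the pattern above.

```javascript
// The combined pattern from above, anchored so the whole input must match.
// The "s" flag lets "." match newlines inside quoted sections.
const JSON_SANITY_RE = /^[{\[]{1}([,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}\]]{1}$/s;

// Hypothetical helper: trims surrounding whitespace, then applies the check.
function passesSanityCheck(text) {
  return JSON_SANITY_RE.test(text.trim());
}

console.log(passesSanityCheck('{"a": 1, "b": [true, null]}')); // true
console.log(passesSanityCheck('hello'));                       // false: no opening { or [
console.log(passesSanityCheck('{"a": bogus}'));                // false: "b", "o", "g" are not allowed
console.log(passesSanityCheck('{{"a": 1}'));                   // true: brace counts are not checked
```

The last call is the kind of false positive to expect: the pattern checks which characters appear, not whether brackets are balanced.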

Kenn Sebesta
cjbarth
  • 2
    The suggested regex has awful backtracking behavior on certain testcases. If you try running it on '{"a":false, "b":true,"c":100,"' this incomplete json, it halts. Example: https://regex101.com/r/Zzc6sz. A simple fix would be: [{\[]{1}([,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}\]]{1} – Toonijn Aug 02 '17 at 07:56
  • @Toonijn I've updated to reflect your comment. Thanks! – cjbarth Aug 03 '17 at 19:21
  • 2
    This slightly modified version of @cjbarth works perfect for my use case of finding all JSON like structures in text (globally applied to a HTML file in my case): `[{\[]{1}([,:{}\[\]0-9.\-+A-zr-u \n\r\t]|".*:?")+[}\]]{1}` – C2BB Dec 13 '20 at 17:30
  • In my environment, and at regexr, this is matching against ```{{"parentRelationField": "Project_Name__c", "employeeIdField": "Employee_Name__c"}``` - did you find a way to prevent it matching when the open and close braces are not matching in count? – Shanerk Mar 25 '22 at 21:03
  • @ShaneK, for something like that, you're better off with one of the other more complex solutions or using a simple function to count `{}`. – cjbarth Apr 01 '22 at 15:10
13

I tried @mario's answer, but it didn't work for me: I downloaded the test suite from JSON.org (archive), and 4 tests failed (fail1.json, fail18.json, fail25.json, fail27.json).

I investigated the errors and found that fail1.json is actually correct (according to the manual's note and RFC-7159, a valid string is also valid JSON). fail18.json is not a real failure either, because it contains perfectly correct, deeply nested JSON:

[[[[[[[[[[[[[[[[[[[["Too deep"]]]]]]]]]]]]]]]]]]]]

That leaves two files, fail25.json and fail27.json:

["  tab character   in  string  "]

and

["line
break"]

Both contain invalid characters (an unescaped tab and an unescaped line break inside a string). So I've updated the pattern like this (string subpattern updated):

$pcreRegex = '/
          (?(DEFINE)
             (?<number>   -? (?= [1-9]|0(?!\d) ) \d+ (\.\d+)? ([eE] [+-]? \d+)? )
             (?<boolean>   true | false | null )
             (?<string>    " ([^"\n\r\t\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
             (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
             (?<pair>      \s* (?&string) \s* : (?&json)  )
             (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
             (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
          )
          \A (?&json) \Z
          /six';

With this change, all the legitimate tests from json.org pass.
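To see the string rule in isolation, here is a hedged sketch of a standalone string-literal check. The names `JSON_STRING_RE` and `isJsonStringLiteral` are hypothetical, and the pattern is a variation rather than the subpattern above: it excludes every control character from U+0000 to U+001F (not just tab and line breaks) and matches one character per iteration instead of nesting `*` inside `*`. Note that a regex literal needs only two backslashes where the PHP string above needs four.

```javascript
// One JSON string literal, quotes included: ordinary characters, the simple
// escapes (including \/), or \u followed by four hex digits.
const JSON_STRING_RE = /^"(?:[^"\\\u0000-\u001f]|\\["\\\/bfnrt]|\\u[0-9a-fA-F]{4})*"$/;

// Hypothetical helper for the demo.
function isJsonStringLiteral(text) {
  return JSON_STRING_RE.test(text);
}

console.log(isJsonStringLiteral('"https:\\/\\/websit.com"')); // true: \/ is a legal escape
console.log(isJsonStringLiteral('"Cin\\u00e9ma"'));           // true: \u escape
console.log(isJsonStringLiteral('"tab\tinside"'));            // false: raw tab, as in fail25.json
console.log(isJsonStringLiteral('"line\nbreak"'));            // false: raw line break, as in fail27.json
```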

Gino Pane
  • 1
    This will match just JSON values(strings, booleans, and numbers) as well, which is not a JSON object/array. – kowsikbabu Feb 07 '20 at 14:21
  • 1
    It does not match `"\/"` as a valid json string but it is a valid json string value. can you fix this?. for example an escaped url such as `"https:\/\/websit.com"` will not be matched by your string group. – Eboubaker Feb 04 '22 at 11:23
3

I created a Ruby implementation of Mario's solution, which does work:

# encoding: utf-8

module Constants
  JSON_VALIDATOR_RE = /(
         # define subtypes and build up the json syntax, BNF-grammar-style
         # The {0} is a hack to simply define them as named groups here but not match on them yet
         # I added some atomic grouping to prevent catastrophic backtracking on invalid inputs
         (?<number>  -?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?){0}
         (?<boolean> true | false | null ){0}
         (?<string>  " (?>[^"\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " ){0}
         (?<array>   \[ (?> \g<json> (?: , \g<json> )* )? \s* \] ){0}
         (?<pair>    \s* \g<string> \s* : \g<json> ){0}
         (?<object>  \{ (?> \g<pair> (?: , \g<pair> )* )? \s* \} ){0}
         (?<json>    \s* (?> \g<number> | \g<boolean> | \g<string> | \g<array> | \g<object> ) \s* ){0}
       )
    \A \g<json> \Z
    /uix
end

########## inline test running
if __FILE__==$PROGRAM_NAME

  # support
  class String
    def unindent
      gsub(/^#{scan(/^(?!\n)\s*/).min_by{|l|l.length}}/u, "")
    end
  end

  require 'test/unit' unless defined? Test::Unit
  class JsonValidationTest < Test::Unit::TestCase
    include Constants

    def setup

    end

    def test_json_validator_simple_string
      assert_not_nil %s[ {"somedata": 5 }].match(JSON_VALIDATOR_RE)
    end

    def test_json_validator_deep_string
      long_json = <<-JSON.unindent
      {
          "glossary": {
              "title": "example glossary",
          "GlossDiv": {
                  "id": 1918723,
                  "boolean": true,
                  "title": "S",
            "GlossList": {
                      "GlossEntry": {
                          "ID": "SGML",
                "SortAs": "SGML",
                "GlossTerm": "Standard Generalized Markup Language",
                "Acronym": "SGML",
                "Abbrev": "ISO 8879:1986",
                "GlossDef": {
                              "para": "A meta-markup language, used to create markup languages such as DocBook.",
                  "GlossSeeAlso": ["GML", "XML"]
                          },
                "GlossSee": "markup"
                      }
                  }
              }
          }
      }
      JSON

      assert_not_nil long_json.match(JSON_VALIDATOR_RE)
    end

  end
end
Gajus
pmarreck
  • Using \d is dangerous. In many regexp implementations \d matches the Unicode definition of a digit that is not just [0-9] but instead includes alternates scripts. So unless Unicode support in Ruby is still broken, you have to fix the regexp in your code. – dolmen Jan 10 '13 at 09:04
  • As far as I know, Ruby uses PCRE in which \d does not match ALL unicode definitions of "digit." Or are you saying that it should? – pmarreck Feb 06 '15 at 20:26
  • Except that it does not. False positive: "\x00", [True]. False negative: "\u0000", "\n". Hangs on: "[{"":[{"":[{"":" (repeated 1000x). – nst Aug 29 '16 at 21:02
  • Not too hard to add as test cases and then tweak the code to pass. How to get it not to blow the stack with a depth of 1000+ is an entirely different matter, though... – pmarreck Jun 08 '17 at 21:36
1

A trailing comma in a JSON array caused my Perl 5.16 to hang, possibly because it kept backtracking. I had to add a backtrack-terminating directive:

(?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) )(*PRUNE) \s* )
                                                                                   ^^^^^^^^

This way, once it identifies a construct that is not 'optional' (* or ?), it shouldn't try backtracking over it to try to identify it as something else.

user117529
1

For "strings and numbers", I think that the partial regular expression for numbers:

-?(?:0|[1-9]\d*)(?:\.\d+)(?:[eE][+-]\d+)?

should be instead:

-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?

since the decimal part of the number and the sign of the exponent are both optional. It is also probably safer to escape the - symbol in [+-], since it has a special meaning between brackets.
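The difference between the two patterns is easy to check with a few samples. In this sketch the `^` and `$` anchors are added so the whole input must match, and the names `originalNumber` and `proposedNumber` are just labels for the demo.

```javascript
// First pattern from above: the fraction and the exponent sign are mandatory.
const originalNumber = /^-?(?:0|[1-9]\d*)(?:\.\d+)(?:[eE][+-]\d+)?$/;
// Proposed replacement: fraction and exponent sign are optional.
const proposedNumber = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?$/;

// Print each sample with the verdict of both patterns.
for (const sample of ['42', '-0', '3.14', '1e5', '-1.5E+10', '01', '1.']) {
  console.log(sample, originalNumber.test(sample), proposedNumber.test(sample));
}
```

Only `3.14` and `-1.5E+10` satisfy the first pattern, while the proposed one also accepts `42`, `-0` and `1e5`; both reject `01` and `1.`.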

Andrew Marshall
Mikaeru
  • Using `\d` is dangerous. In many regexp implementations `\d` matches the Unicode definition of a digit that is not just `[0-9]` but instead includes alternates scripts. – dolmen Jan 10 '13 at 09:02
  • It looks a bit strange, that -0 is a valid number but RFC 4627 allows it and your regular expression conforms to it. – ceving May 03 '13 at 11:28
1

A regex that validates a simple JSON object, not a JSON array.

It validates key(string):value(string,integer,[{key:value},{key:value}],{key:value})

^\{(\s|\n\s)*(("\w*"):(\s)*("\w*"|\d*|(\{(\s|\n\s)*(("\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))((,(\s|\n\s)*"\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))*(\s|\n\s)*\}){1}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d*|(\{(\s|\n\s)*(("\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))((,(\s|\n\s)*"\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))*(\s|\n\s)*\}){1}))*(\s|\n)*\}$

Sample data validated by this regex:

{
"key":"string",
"key": 56,
"key":{
        "attr":"integer",
        "attr": 12
        },
"key":{
        "key":[
            {
                "attr": 4,
                "attr": "string"
            }
        ]
     }
}
0

As written above, if the language you use ships with a JSON library, use it to try decoding the string and catch the exception/error if it fails! If it does not (I just had such a case with FreeMarker), the following regex can at least provide some very basic validation (it's written for PHP/PCRE so that more users can test and use it). It's not as foolproof as the accepted solution, but also not as scary =):

~^\{\s*\".*\}$|^\[\n?\{\s*\".*\}\n?\]$~s

short explanation:

// we have two possibilities in case the string is JSON
// 1. the string passed is "just" a JSON object, e.g. {"item": [], "anotheritem": "content"}
// this can be matched by the following regex which makes sure there is at least a {" at the
// beginning of the string and a } at the end of the string, whatever is inbetween is not checked!

^\{\s*\".*\}$

// OR (character "|" in the regex pattern)
// 2. the string passed is a JSON array, e.g. [{"item": "value"}, {"item": "value"}]
// which would be matched by the second part of the pattern above

^\[\n?\{\s*\".*\}\n?\]$

// the s modifier is used to make "." also match newline characters (can happen in prettified JSON)

If I missed something that would break this unintentionally, I'm grateful for comments!

exside
-2

Here is my regexp for validating a JSON string:

^\"([^\"\\]*|\\(["\\\/bfnrt]{1}|u[a-f0-9]{4}))*\"$

It was written using the original syntax diagram.

Sergey Kamardin
-3

I realize that this is from over 6 years ago. However, I think there is a solution that nobody here has mentioned that is way easier than regexing:

function isAJSON(string) {
    try {
        // JSON.parse throws when the text is not valid JSON.
        JSON.parse(string);
    } catch (e) {
        // Any exception means the input could not be parsed.
        return false;
    }
    return true;
}
Jamie