I have the following JSON string:

{"FirstName":"John","LastName":"Smith"}

When I apply the following regex, it correctly returns the key-value pair groups:

(?<keyValuePair>(?<key>"\w+"):(?<value>".*?[^\\]"+?))+?

I get the matches:

1. "FirstName":"John"
    1.1 key:"FirstName"
    1.2 value:"John"
2. "LastName":"Smith"
    2.1 key:"LastName"
    2.2 value:"Smith"
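
For readers who want to reproduce these matches outside .NET, here is a minimal sketch in Python. Python spells named groups `(?P<name>...)` instead of `(?<name>...)`; the `pairs` helper and the `ROBUST_PAIR` pattern are illustrative names that are not part of the question. The alternative pattern is an assumption about what a stricter string match could look like, added because `".*?[^\\]"` needs at least one character between the quotes and so misreads empty values.

```python
import re

# The question's pattern, with .NET's (?<name>...) rewritten as (?P<name>...).
ORIGINAL_PAIR = re.compile(
    r'(?P<keyValuePair>(?P<key>"\w+"):(?P<value>".*?[^\\]"+?))+?'
)

# Hypothetical alternative: a quote, then any run of characters that are
# neither a quote nor a backslash (or a backslash escape), then a quote.
STRING = r'"(?:[^"\\]|\\.)*"'
ROBUST_PAIR = re.compile(rf'(?P<key>{STRING})\s*:\s*(?P<value>{STRING})')


def pairs(pattern, text):
    """Return (key, value) tuples, quotes included, for every match."""
    return [(m.group("key"), m.group("value")) for m in pattern.finditer(text)]


sample = '{"FirstName":"John","LastName":"Smith"}'
print(pairs(ORIGINAL_PAIR, sample))
print(pairs(ROBUST_PAIR, sample))

# An empty value is handled differently by the two patterns.
tricky = '{"A":"","B":"x"}'
print(pairs(ORIGINAL_PAIR, tricky))
print(pairs(ROBUST_PAIR, tricky))
```

Both patterns agree on the sample string; they differ only on inputs such as empty values.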

Now I want a group for the object, i.e. to find all objects. On the same JSON string, I apply the following regex:

(?<object>{(?<properties>.*?)})

I get the matches:

1. {"FirstName":"John","LastName":"Smith"}
    1.1 object : {"FirstName":"John","LastName":"Smith"}
    1.2 properties : "FirstName":"John","LastName":"Smith"

What I want is to get the groups of the first regex as sub-groups of properties in the second regex.

So the expected result should be:

1. {"FirstName":"John","LastName":"Smith"}
    1.1 object : {"FirstName":"John","LastName":"Smith"}
    1.2 properties : "FirstName":"John","LastName":"Smith"
        1.2.1 "FirstName":"John"
            1.2.1.1 key : "FirstName"
            1.2.1.2 value : "John"
        1.2.2 "LastName":"Smith"
            1.2.2.1 key : "LastName"
            1.2.2.2 value : "Smith"
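
One way to get this nesting without a single combined pattern is two passes: match each object first, then run the key-value pattern over its `properties` text. The sketch below does that in Python, so named groups are written `(?P<name>...)`; `nested_matches` is a hypothetical helper name. It assumes flat objects whose values never contain `}`. In .NET, a repeated group also records every iteration in `Group.Captures`, which may allow a single-pattern version; that variant is not shown here.

```python
import re

# Both patterns are the question's, translated to Python's named-group syntax.
OBJECT = re.compile(r'(?P<object>\{(?P<properties>.*?)\})')
PAIR = re.compile(r'(?P<keyValuePair>(?P<key>"\w+"):(?P<value>".*?[^\\]"+?))+?')


def nested_matches(text):
    """Return one dict per object, with its key/value pairs nested inside."""
    results = []
    for obj in OBJECT.finditer(text):
        properties = obj.group("properties")
        results.append({
            "object": obj.group("object"),
            "properties": properties,
            # Second pass: search only inside this object's properties.
            "pairs": [
                {
                    "keyValuePair": p.group("keyValuePair"),
                    "key": p.group("key"),
                    "value": p.group("value"),
                }
                for p in PAIR.finditer(properties)
            ],
        })
    return results


for obj in nested_matches('{"FirstName":"John","LastName":"Smith"}'):
    print("object     :", obj["object"])
    print("properties :", obj["properties"])
    for pair in obj["pairs"]:
        print("   ", pair["keyValuePair"])
        print("        key   :", pair["key"])
        print("        value :", pair["value"])
```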

Could someone help me create a regex that produces the result above?

This would not count as a duplicate

I have tried many things over the past 3 hours and my mind is spinning.

Christoph Fink
Chris Serrao
  • Is there a reason you are using RegEx and not e.g. JSON.net? – Christoph Fink Jul 25 '14 at 06:26
  • Parsing JSON via Regex is not a good idea, I think. Why not just use `Newtonsoft JSON` and its `JObject`? –  Jul 25 '14 at 06:26
  • I have an application that gets data from the database and processes it in .Net. The query is provided by the user at runtime, so I don't know the schema. They created a custom parser that took around 4.5 secs to process that JSON. Newtonsoft took 6 secs. I want to bring the time down as low as possible, and I want to see if I can achieve that using Regex. – Chris Serrao Jul 25 '14 at 06:29
  • I assume you mean 6 seconds to parse a large chunk of JSON. The above sample is tiny. – codenheim Jul 25 '14 at 06:50
  • The JSON is very large. The network service that returns the data has an object with 197 string properties for the query I'm executing, and a total of 6000 objects. – Chris Serrao Jul 25 '14 at 07:12

1 Answer

"I have tried many things over the past 3 hours and my mind is spinning."

Not to be snide, not at all, but in 3 hours you could have written a recursive descent parser for JSON, or in about 30 minutes you could have installed JSON.NET, read the docs/samples, and moved on to other things. Why not try that now? There is no future in parsing JSON with regex: JSON is a context-free language whose values can nest recursively to arbitrary depth, while a classical regex is equivalent to a finite automaton (DFA/NFA) and cannot recognize a context-free grammar (CFG). It's sort of like parsing HTML with regex (OK, I couldn't resist).

Unless you have a very limited type of JSON and are absolutely against adding a third-party library, I wouldn't bother. Chalk it up to a learning experience.
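
To give a sense of how small a recursive descent parser can be for a limited kind of JSON, here is a minimal sketch in Python rather than .NET. It handles only objects, arrays, and strings; numbers, booleans, and `null` are deliberately rejected. `SubsetParser` and `ParseError` are hypothetical names, and the code makes no claim about speed. For full JSON, a library such as JSON.NET (or `json.loads` in Python's standard library) remains the simpler route.

```python
class ParseError(ValueError):
    """Raised when the input is outside the supported JSON subset."""


class SubsetParser:
    """Recursive descent parser for objects, arrays, and strings only."""

    ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b",
               "f": "\f", "n": "\n", "r": "\r", "t": "\t"}

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def parse(self):
        value = self.value()
        if self.peek() != "":
            raise ParseError(f"trailing data at {self.pos}")
        return value

    def peek(self):
        # Skip whitespace, then return the next character ("" at end of input).
        while self.pos < len(self.text) and self.text[self.pos] in " \t\r\n":
            self.pos += 1
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise ParseError(f"expected {ch!r} at {self.pos}")
        self.pos += 1

    def value(self):
        # One grammar rule per method; nesting is handled by mutual recursion.
        ch = self.peek()
        if ch == "{":
            return self.sequence("{", "}", dict)
        if ch == "[":
            return self.sequence("[", "]", list)
        if ch == '"':
            return self.string()
        raise ParseError(f"unsupported value at {self.pos}")

    def sequence(self, open_ch, close_ch, kind):
        # Shared loop for objects (key:value members) and arrays (values).
        self.expect(open_ch)
        result = kind()
        if self.peek() == close_ch:
            self.pos += 1
            return result
        while True:
            if kind is dict:
                key = self.string()
                self.expect(":")
                result[key] = self.value()
            else:
                result.append(self.value())
            if self.peek() == ",":
                self.pos += 1
                continue
            self.expect(close_ch)
            return result

    def string(self):
        self.expect('"')
        chars = []
        while self.pos < len(self.text):
            ch = self.text[self.pos]
            self.pos += 1
            if ch == '"':
                return "".join(chars)
            if ch != "\\":
                chars.append(ch)
                continue
            esc = self.text[self.pos:self.pos + 1]
            self.pos += 1
            if esc == "u":
                # Surrogate pairs are not combined in this sketch.
                chars.append(chr(int(self.text[self.pos:self.pos + 4], 16)))
                self.pos += 4
            elif esc in self.ESCAPES:
                chars.append(self.ESCAPES[esc])
            else:
                raise ParseError(f"bad escape at {self.pos}")
        raise ParseError("unterminated string")


rows = SubsetParser('[{"FirstName":"John","LastName":"Smith"}]').parse()
print(rows)
```

Each grammar rule (value, object or array, string) maps to one method, and the methods call each other, which is what lets the parser follow nesting that a single finite-automaton pattern cannot.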

Community
codenheim
  • I have tried Newtonsoft, and we do have a custom JSON parser which performs better than Newtonsoft. However, as I have mentioned, they take about 6 and 4 secs respectively, so I am looking for faster options and have turned to regex. – Chris Serrao Jul 25 '14 at 06:33
  • Hmm.. the question is: will the regex be faster than a well-written custom parser? I have my doubts. JSON looks simple, but recursion could be deadly. –  Jul 25 '14 at 06:35
  • Is your potential JSON input very limited? As in, can you reasonably describe the possible inputs in one page? Even so, I'd still consider a hand-written recursive descent parser. It could be that the slow part is the lookup / mapping to the object properties, not the actual parsing. – codenheim Jul 25 '14 at 06:35
  • The JSON will simply return an array of objects with simple string properties and no nesting. – Chris Serrao Jul 25 '14 at 06:37
  • Hmm.. you wrote "so I don't know the schema", but it looks like you do know the schema, so I think a well-written custom parser will be more efficient than regex. (I think it is enough to parse the JSON char-by-char in a single pass.) Regex won't do it faster. –  Jul 25 '14 at 06:41
  • I'm puzzled by the 6 second claim for the above JSON. I would expect 50-100 ms. Sounds like something is wrong, or the JSON you are referring to is large. – codenheim Jul 25 '14 at 06:52
  • The JSON is very large. The network service that returns the data has an object with 197 string properties for the query I'm executing, and a total of 6000 objects. – Chris Serrao Jul 25 '14 at 07:12
  • I think that is a perfect reason not to try to parse with regex. How do you consume the data? Do you only need a few fields, or do you use the whole object? You may consider a technique I used on large XML documents. I once wrote an XML scanner that didn't actually deserialize the XML, but queried it with XPath-style queries for only small fragments. It performed better than deserializing. JSON is similar in structure to XML, though not in syntax. I'd also be interested to see how a C++ JSON parser would handle it. – codenheim Jul 25 '14 at 07:24