17

I'm trying to see if it's possible to lookup individual keys out of a JSON string in Javascript and return it's Value with Regex. Sort of like building a JSON search tool.

Imagine the following JSON

"{
    "Name": "Humpty",
    "Age": "18",
    "Siblings" : ["Dracula", "Snow White", "Merlin"],
    "Posts": [
        {
            "Title": "How I fell",
            "Comments": [
                { 
                    "User":"Fairy God Mother",
                    "Comment": "Ha, can't say I didn't see it coming"
                }
            ]
        }
    ]
}"

I want to be able to search through the JSON string and only pull out individual properties.

lets assume it's a function already, it would look something like.

function getPropFromJSON(prop, JSONString){
    // Obviously this regex will only match Keys that have
    // String Values.
    var exp = new RegExp("\""+prop+"\"\:[^\,\}]*");
    return JSONString.match(exp)[0].replace("\""+prop+"\":","");    
}

It would return the substring of the Value for the Key.

e.g.

getPropFromJSON("Comments")

> "[
    { 
        "User":"Fairy God Mother",
        "Comment": "Ha, can't say I didn't see it coming"
    }
]"

If your wondering why I want to do this instead of using JSON.parse(), I'm building a JSON document store around localStorage. localStorage only supports key/value pairs, so I'm storing a JSON string of the entire Document in a unique Key. I want to be able to run a query on the documents, ideally without the overhead of JSON.parsing() the entire Collection of Documents then recursing over the Keys/nested Keys to find a match.

I'm not the best at regex so I don't know how to do this, or if it's even possible with regex alone. This is only an experiment to find out if it's possible. Any other ideas as a solution would be appreciated.

AshHeskes
  • 2,304
  • 1
  • 21
  • 27
  • 9
    This is JSON, why don't you just address it as JSON? – fge Jan 05 '12 at 21:33
  • 3
    @fge if you read his post you'd see why. Although it isn't really a valid reason to use regex. Remember regex is a REGULAR language.. and shouldn't be used for this – AlanFoster Jan 05 '12 at 21:34
  • 1
    I doubt it will be worth to implement your own parsing for this. It may perform better if you only need a single value out of a huge document, but if you do multiple searches, keeping the json in memory should be faster. – kapex Jan 05 '12 at 21:45
  • @kapep I was planning on implementing a `cache` anyway, but only for retrieved `Documents`. I can't imagine that storing all the documents in memory would be faster. I still need to `iterate` over the `Keys` in every `Document` in the entire `Collection`. Which is a much bigger performance hit than `JSON.parse()`. – AshHeskes Jan 05 '12 at 22:03
  • It doesn't make sense to try and implement something like this. You could use a state machine to parse the JSON, but trying to use a regex will never completely give you what you want, because JSON values can be arrays, objects, strings, numerics, functions etc. – Jonathan Rich Jan 05 '12 at 22:14
  • @kapep I'm also planning to use this in mobile Apps, so I want to keep memory usage at a minimum. – AshHeskes Jan 05 '12 at 22:14
  • @JonathanRich Not really, I can handle the matched `Value` once found by running it though, ironically `JSON.parse()`. Just to be clear I'm not trying to avoid using `JSON.parse()`, internally it doesn't do much but run a few `Regex`'s anyway. I'm trying to improve performance in general, for the use case. – AshHeskes Jan 05 '12 at 22:25
  • @JonathanRich You can't have functions in JSON. – Paul Jul 05 '13 at 23:04

4 Answers4

31

I would strongly discourage you from doing this. JSON is not a regular language as clearly stated here: https://cstheory.stackexchange.com/questions/3987/is-json-a-regular-language

To quote from the above post:

For example, consider an array of arrays of arrays:

[ [ [ 1, 2], [2, 3] ] , [ [ 3, 4], [ 4, 5] ] ] 

Clearly you couldn't parse that with true regular expressions.

I'd recommend converting your JSON to an object (JSON.parse) & implementing a find function to traverse the structure.

Other than that, you can take a look at guts of Douglas Crockford's json2.js parse method. Perhaps an altered version would allow you to search through the JSON string & just return the particular object you were looking for without converting the entire structure to an object. This is only useful if you never retrieve any other data from your JSON. If you do, you might as well have converted the whole thing to begin with.

EDIT

Just to further show how Regex breaks down, here's a regex that attempts to parse JSON

If you plug it into http://regexpal.com/ with "Dot Matches All" checked. You'll find that it can match some elements nicely like:

Regex

"Comments"[ :]+((?=\[)\[[^]]*\]|(?=\{)\{[^\}]*\}|\"[^"]*\") 

JSON Matched

"Comments": [
                { 
                    "User":"Fairy God Mother",
                    "Comment": "Ha, can't say I didn't see it coming"
                }
            ]

Regex

"Name"[ :]+((?=\[)\[[^]]*\]|(?=\{)\{[^\}]*\}|\"[^"]*\")

JSON Matched

"Name": "Humpty"

However as soon as you start querying for the higher structures like "Posts", which has nested arrays, you'll find that you cannot correctly return the structure since the regex does not have context of which "]" is the designated end of the structure.

Regex

"Posts"[ :]+((?=\[)\[[^]]*\]|(?=\{)\{[^\}]*\}|\"[^"]*\")

JSON Matched

"Posts": [
  {
      "Title": "How I fell",
      "Comments": [
          { 
              "User":"Fairy God Mother",
              "Comment": "Ha, can't say I didn't see it coming"
          }
      ]
Community
  • 1
  • 1
Brandon Boone
  • 16,281
  • 4
  • 73
  • 100
  • 1
    I had a look at the json2.js parse method earlier. It doesn't really do any kind of parsing. It just does a lot of replacing bad/dangerous/escaped characters/content/scripts so the JSON is clean. Then it just passes the clean string to `eval();`. I think your right on using the `Regex` alone thing. I'm going to try and use a combination of `JS` and `Regex`. I disagree on the converting the whole thing and traversing it, for my use case. It would be far too intensive on large `collections || documents`, not to mention searching and matching on multiple properties. – AshHeskes Jan 06 '12 at 01:14
  • Fair enough. Only other thing I could recommend (and I'm not an expert in this field) is to use a format that is relational data friendly. I'm assuming Ms-Sql, MySql, & Oracle have optimal ways of storing the data so reading, writing, comparing, & joining data is super fast (and I doubt it's stored as JSON). Just a thought. – Brandon Boone Jan 06 '12 at 01:27
  • 1
    You should follow the advice in this answer and avoid doing this via any method other than properly deserializing the JSON and searching through the resulting structure. – JAAulde Jan 06 '12 at 14:39
  • If you put a finite fixed limit on the nesting depth of your JSON, it becomes a regular language, however the regex would be very ugly unless your limit is only 1 or 2. – Paul Jul 05 '13 at 22:34
0
\{|\}|\[|\]|,|:|(\\-)?\\d+(\\.\\d+)?|".+?"

Regex

You can use the following regex and iterate with a match over all tokens of a json. You can tokenize the JSON, but the parsing part has to be implemented by you.

Since you're using JavaScript as I assume from the tags, your best way to encode the JSON stays JSON.parse().

Anatoly
  • 193
  • 8
0

I'm almost 10 years late to the party, but I came up with this. Not tested in crazier JSONs than this, but it solves my use cases.

const obj1 = {
  id: 1,
  'name.1': '123',
  address: {
    'address.1': 'Chicken Dinner Road, 69',
    'address.2': 'Psycho lane, 666',
  },
  'age.1': {
    'thisIsSomeCrazyJson.3': 10,
    age: 50,
  },
  types: [
    {
      id: 22,
      'name.name': '123',
      typeOption: {
        id: 1,
        'whoTFWroteThisJSON.2': '123',
      },
    },
    {
      id: 32,
      'name.1': '123',
    },
  ],
};

const obj2 = {
  Name: 'Humpty',
  Age: '18',
  Siblings: ['Dracula', 'Snow White', 'Merlin'],
  Posts: [
    {
      Title: 'How I fell',
      Comments: [
        {
          'User': 'Fairy God Mother',
          'Comment': "Ha, can't say I didn't see it coming",
        },
      ],
    },
  ],
};


function matchKeyDeep(input, pattern) {
  return Object.entries(input).reduce((nextInput, [key, value]) => {
    const isMatch = pattern.test(key);
    if (Array.isArray(value)) {
      const arrValue = value;
      let nextValue = arrValue.map((arrItem) => {
        if (typeof arrItem === 'object') {
          return matchKeyDeep(arrItem, pattern);
        }
        return arrItem;
      });
      if (!isMatch && Array.isArray(nextValue)) {
        nextValue = nextValue.filter((v) => (typeof v === 'object' && v !== null));
        if (nextValue.length === 0) return nextInput;
      }
      nextInput[key] = nextValue;
      return nextInput;
    }
    if (typeof value === 'object') {
      const recurse = matchKeyDeep(value, pattern);
      if (!isMatch && Object.keys(recurse).length === 0) {
        return nextInput;
      }
      nextInput[key] = recurse;
      return nextInput;
    }

    if (isMatch) {
      nextInput[key] = value;
    }

    return nextInput;
  }, {});
}

const res = matchKeyDeep(obj1, /\.\d/);
const res2 = matchKeyDeep(obj2, /Comment/);


console.log(res);
console.log(res2);
Omar Omeiri
  • 1,506
  • 1
  • 17
  • 33
-1

First, stringify the JSON object. Then, you need to store the starts and lengths of the matched substrings. For example:

"matched".search("ch") // yields 3

For a JSON string, this works exactly the same (unless you are searching explicitly for commas and curly brackets in which case I'd recommend some prior transform of your JSON object before performing regex (i.e. think :, {, }).

Next, you need to reconstruct the JSON object. The algorithm I authored does this by detecting JSON syntax by recursively going backwards from the match index. For instance, the pseudo code might look as follows:

find the next key preceding the match index, call this theKey
then find the number of all occurrences of this key preceding theKey, call this theNumber
using the number of occurrences of all keys with same name as theKey up to position of theKey, traverse the object until keys named theKey has been discovered theNumber times
return this object called parentChain

With this information, it is possible to use regex to filter a JSON object to return the key, the value, and the parent object chain.

You can see the library and code I authored at http://json.spiritway.co/

mikewhit
  • 121
  • 5