I'm trying to find a regular expression which matches repeated keys on different levels of a nested JSON string representation. All my "solutions" suffer from catastrophic backtracking so far.
An example of that JSON string looks like this:
d = {
    "a": {
        "b": {
            "c": {
                "d": "v1",
                "key": "v2"
            }
        },
        "c": {
            "g": "v3",
            "key": "v4"
        },
        "key": "v5"
    }
}
The value of key is the target. My application has all the object names leading to that key. With these names I can use a for loop to construct my final regex, so basically I need the parts to put in between.
Example:
If I get "a" and "key" I could construct the following: "a"[^}]*"key". This matches the first "key" in my string d, the one with value v2.
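To reproduce the problem in Python, here is a small sketch. The trailing capture group for a quoted string value is only added for illustration and is not part of my real pattern:

```python
import re

# The example document from above as a plain string.
d = '''{
    "a": {
        "b": {
            "c": {
                "d": "v1",
                "key": "v2"
            }
        },
        "c": {
            "g": "v3",
            "key": "v4"
        },
        "key": "v5"
    }
}'''

# Naive connector: anything except a closing brace between "a" and "key".
naive = re.compile(r'"a"[^}]*"key":\s*"([^"]*)"')
m = naive.search(d)

# The match stops before the first '}', so it finds the deepest "key"
# instead of the one that is a direct child of "a".
print(m.group(1))
```

This captures v2, although the path "a" + "key" should lead to v5.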
What should happen, though, is that "a" + "key" matches the key with value v5. The key with value v2 should be matched when the full path "a" + "b" + "c" + "key" comes in. The last case in this example would be matching the key with value v4 when "a" + "c" + "key" is given.
So a complete regex for the last one would look similar to this:
"a"MATCH_EVERYTHING_IN_BETWEEN_REGEX"c"MATCH_EVERYTHING_IN_BETWEEN_REGEX"key":\s*(\[[^}]*?\]|".*?"|\d+\.*\d*)
To be clear, I am looking for this MATCH_EVERYTHING_IN_BETWEEN_REGEX expression, which I can plug in as a connector between the names. This is to make sure the regex matches only the key I have received the path for. The JSON string could be nested to an arbitrary depth.
Here is an online regex tester with the example: https://regex101.com/r/yNZ3wo/2
Note:
I know this is not Python-specific, but I'm also grateful for Python hints in this context. I thought about building my own parser, using a stack and counting { and }, but first I would like to make sure there is no easy regex solution.
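For reference, this is a minimal sketch of the stack idea I have in mind, not a finished parser. The name find_value_offset is made up for illustration. It assumes well-formed JSON, paths that run only through objects (objects inside arrays never match), and keys without escape sequences:

```python
def find_value_offset(text, path):
    """Return the offset where the value for the key path starts, or None."""
    stack = []          # key under which each currently open object was found
    pending_key = None  # most recent key that is still waiting for its value
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            # Scan to the closing quote, stepping over backslash escapes.
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == '\\' else 1
            token = text[i + 1:j]
            k = j + 1
            while k < n and text[k].isspace():
                k += 1
            if k < n and text[k] == ':':
                # The string is a key: skip the colon and following whitespace.
                k += 1
                while k < n and text[k].isspace():
                    k += 1
                # stack[0] belongs to the root object, which has no key.
                if stack[1:] + [token] == list(path):
                    return k
                pending_key = token
                i = k
            else:
                # The string is a value, not a key.
                pending_key = None
                i = j + 1
            continue
        if ch == '{':
            stack.append(pending_key)
            pending_key = None
        elif ch == '}':
            if stack:
                stack.pop()
        elif ch == '[':
            # Objects inside arrays are pushed with no key, so they never match.
            pending_key = None
        i += 1
    return None


d = '''{
    "a": {
        "b": {"c": {"d": "v1", "key": "v2"}},
        "c": {"g": "v3", "key": "v4"},
        "key": "v5"
    }
}'''

offset = find_value_offset(d, ["a", "key"])
print(offset, d[offset:offset + 4])
```

Because braces and quotes inside string values are skipped as part of the string, they do not disturb the depth tracking, which is exactly what I cannot get a regex connector to do.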
EDIT: I know about the json library but this doesn't solve my case since I'm tracking the coordinates of my targets within the string representation inside an editor window. I'm not looking for the values themselves, I can access them from an associated dictionary.