I am trying to parse a query so that I can replace a specific property and its value with another property and different values. I am struggling to write a regex that matches the specific property and its value.
Here are some examples to illustrate my point. `test:property` is the property name that we need to match.
- Property with a single value:
test:property:schema:Person
- Property with multiple values (there is no limit on how many values there can be - this example uses 3):
test:property:(schema:Person OR schema:Organization OR schema:Place)
- Property with a single value in brackets:
test:property:(schema:Person)
- Property with another property in the query string (i.e. there are other parts of the string that I'm not interested in):
test:property:schema:Person test:otherProperty:anotherValue
Also note that other combinations are possible, such as other properties appearing before the property I need to capture, or my property having multiple values while another property is present in the query.
I want to match on the entire `test:property` section with each value captured within that match. Given the examples above these are the results I am looking for:
# | Match | Groups |
---|---|---|
1 | test:property:schema:Person | schema:Person |
2 | test:property:(schema:Person OR schema:Organization OR schema:Place) | schema:Person, schema:Organization, schema:Place |
3 | test:property:(schema:Person) | schema:Person |
4 | test:property:schema:Person | schema:Person |
Note: #1 and #4 produce the same output. I wanted to illustrate that the rest of the string should be ignored (I only need to change the `test:property` key and value).
The pattern of `schema:Person` is defined as `\w+\:\w+`, i.e. one or more word characters, followed by a colon, followed by one or more word characters.
If we define the known parts of the string with names I think I can express what I want to match.
- `schema:Person` - `<TypeName>` - note that the first part, `schema` in this case, is not fixed and can be different
- `test:property` - `<MatchProperty>`
<MatchProperty>: // property name (which is known and the same - in the examples this is `test:property`) followed by a colon
( // optional open bracket
<TypeName>
(OR <TypeName>)* // optional additional TypeNames separated by an OR
) // optional close bracket
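The structure described above can be sketched roughly as follows. This is only an illustration in Python's `re` module (the regex flavor is an assumption, as are the names `PROPERTY_RE`, `TYPE_NAME_RE` and `find_property`). It treats the bracketed and unbracketed forms as two alternatives, and it extracts the individual values in a second pass, because in Python's `re` a repeated capturing group only retains its last repetition.

```python
import re

MATCH_PROPERTY = "test:property"  # the known property name from the examples
TYPE_NAME = r"\w+:\w+"            # <TypeName>: word chars, a colon, word chars

# <MatchProperty>: followed by either a bracketed OR-list or one bare value.
PROPERTY_RE = re.compile(
    rf"{re.escape(MATCH_PROPERTY)}:"
    rf"(?:\(\s*({TYPE_NAME}(?:\s+OR\s+{TYPE_NAME})*)\s*\)|({TYPE_NAME}))"
)
TYPE_NAME_RE = re.compile(TYPE_NAME)


def find_property(query):
    """Return (full_match, [values]) for the first occurrence, or None."""
    m = PROPERTY_RE.search(query)
    if m is None:
        return None
    # Group 1 is the bracketed list body, group 2 the single bare value.
    body = m.group(1) or m.group(2)
    # Second pass: pull each <TypeName> out of the matched body.
    return m.group(0), TYPE_NAME_RE.findall(body)


if __name__ == "__main__":
    examples = [
        "test:property:schema:Person",
        "test:property:(schema:Person OR schema:Organization OR schema:Place)",
        "test:property:(schema:Person)",
        "test:property:schema:Person test:otherProperty:anotherValue",
    ]
    for query in examples:
        print(find_property(query))
```

With this shape, the span of the overall match (`m.span()` in Python) would mark the part of the query to swap out, leaving the other properties untouched.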
Every example I've found has had simple alphanumeric characters in the repeating section but my repeating pattern contains the colon which seems to be tripping me up. The closest I've got is this:
(test\:property:(?:\(([\w+\:\w+]+ [OR [\w+\:\w+]+)\))|[\w+\:\w+]+)
Which works okayish when there are no other properties (although the match for example #2 contains the entire property and value as the first group result, and a second group with the property value) but goes crazy when other properties are included.
Also, putting that regex through https://regex101.com/ I know it's not right as the backslash characters in the square brackets are being matched exactly. I started to have a go with capturing and non-capturing groups but got as far as this before giving up!
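A small check of what the square brackets are doing, again assuming Python's `re` (the variable names are illustrative only). Inside `[...]` the tokens form a set of single characters, so `[\w+\:\w+]+` means "one or more of: a word character, a plus sign, or a colon" rather than "word characters, then a colon, then word characters".

```python
import re

# Character class: any run of word characters, '+' or ':' in any order.
char_class = re.compile(r"[\w+\:\w+]+")
# Sequence: word characters, then one colon, then word characters.
sequence = re.compile(r"\w+:\w+")

text = "schema:Person OR schema:Place"

# The class happily matches "OR" as well, since it is a run of word characters.
print(char_class.findall(text))  # ['schema:Person', 'OR', 'schema:Place']
print(sequence.findall(text))    # ['schema:Person', 'schema:Place']

# A string that is clearly not <TypeName> still satisfies the class.
print(char_class.fullmatch("a+:+b") is not None)  # True
print(sequence.fullmatch("a+:+b") is not None)    # False
```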
(?:(\w+\:\w+))(?:(\sOR\s))*(?:(\w+\:\w+))*