I have the string below:

object1: {
   a: 'text a',
   b: 'text b',
},
object2: {
   a: 'text2 a',
   b: 'text2 b',
}

I have this regex:

r"(object1|object2):\s\{(?:.*?)(\w+):\s[\'\"]text2 b[\'\"]", used with the flag re.DOTALL

I expected ('object2', 'b'), but the actual result is ('object1', 'b').
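
For reference, the behaviour can be reproduced with a short script. This is only a minimal sketch: the variable names `text`, `pattern` and `match` are my own, and I am assuming the string is stored exactly as shown above.

```python
import re

# The input from the question, assumed to be held in a plain Python string.
text = """object1: {
   a: 'text a',
   b: 'text b',
},
object2: {
   a: 'text2 a',
   b: 'text2 b',
}"""

# The original pattern: the lazy `.*?` may still cross the closing brace of object1.
pattern = r"(object1|object2):\s\{(?:.*?)(\w+):\s[\'\"]text2 b[\'\"]"

match = re.search(pattern, text, re.DOTALL)
print(match.groups())  # ('object1', 'b')
```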

Peter
  • That's because non-greedy matching works forward, not backwards. See https://stackoverflow.com/questions/27385942/why-is-this-simple-non-greedy-regex-being-greedy – joanis Sep 15 '19 at 14:46
  • I would recommend parsing this with a JSON parser instead of a Regex, by the way. – joanis Sep 15 '19 at 14:48
  • @joanis This string is in a JavaScript file, so I can't use a JSON parser. – Peter Sep 15 '19 at 14:49
  • To solve your problem, you have to replace `.*?` by something that would not match `object?`, e.g., `...\s\{(?!.*object)(?:.*?)(\w+)...`, but this is just a hint, because it will fail on you if `object` occurs anywhere later in the string, so this won't fully solve your problem. – joanis Sep 15 '19 at 14:52
  • But your question is tagged python, so maybe https://www.google.com/search?q=json+parser+python – joanis Sep 15 '19 at 14:55
  • @joanis Thank you for your advice. I have fixed it with `(object1|object2):\s\{.(?:(?!}).)*?(\w+):\s[\'\"]text2 b[\'\"]` – Peter Sep 15 '19 at 14:56
  • If just blocking `}` is enough for you, I would go with @LouisCaron's solution, it will be more efficient, and it should match the same strings. – joanis Sep 15 '19 at 21:32

1 Answer

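As indicated in the comments, non-greedy matching works forward: the match starts at the first object1|object2 from which the whole pattern can succeed, and .*? then expands as far as it needs to, even past the closing brace of object1. One solution is to stop the skipping group from crossing that boundary, so that the attempt starting at object1 fails: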

r"(object1|object2):\s\{(?:[^}]*?)(\w+):\s[\'\"]text2 b[\'\"]"

In this possible solution, the character '}' is excluded from the text skipped before the key is matched, so a match cannot run past the end of the object it started in, which makes sense given the structure.
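
A quick check of this against the string from the question is sketched below. The names `text`, `fixed` and `tempered` are my own, and `tempered` is the variant Peter posted in the comments, included only for comparison. Note that `[^}]` already matches newlines, so the negated-class version does not need re.DOTALL.

```python
import re

# The input from the question, assumed to be held in a plain Python string.
text = """object1: {
   a: 'text a',
   b: 'text b',
},
object2: {
   a: 'text2 a',
   b: 'text2 b',
}"""

# Negated character class: the skipped text can never contain '}'.
fixed = r"(object1|object2):\s\{(?:[^}]*?)(\w+):\s[\'\"]text2 b[\'\"]"

# Variant from the comments: a dot guarded by a negative lookahead for '}'.
# The bare dots must match newlines here, so this one needs re.DOTALL.
tempered = r"(object1|object2):\s\{.(?:(?!}).)*?(\w+):\s[\'\"]text2 b[\'\"]"

fixed_groups = re.search(fixed, text).groups()
tempered_groups = re.search(tempered, text, re.DOTALL).groups()

print(fixed_groups)     # ('object2', 'b')
print(tempered_groups)  # ('object2', 'b')
```

Both patterns assume the objects contain no nested braces; a '}' inside a value or a nested object would end the skipped region early.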

Louis Caron