I have the following RegExp pattern:

A\.B\.(.+?)\.(\d{3}) (.*)

If I test it against the following string:

A.B.Prop.001 Blabla

then I get the expected values:

Match 1
Submatch 1 : Prop
Submatch 2 : 001
Submatch 3 : Blabla

Now I test it against the following string:

A.B.Prop.001 Blabla A.B.Desc.032 Blablabla

and I get:

Match 1
Submatch 1 : Prop
Submatch 2 : 001
Submatch 3 : Blabla A.B.Desc.032 Blablabla

I get only one match. I guess this is because the final (.*) in the pattern is greedy, so I replaced it with a reluctant (lazy) quantifier:

A\.B\.(.+?)\.(\d{3}) (.*?)

and tested the same two-part string again. This is what I get now:

Match 1
Submatch 1 : Prop
Submatch 2 : 001
Submatch 3 : 
Match 2
Submatch 1 : Desc
Submatch 2 : 032
Submatch 3 : 

Submatch 3 is empty for both matches. Can anyone explain why? Thank you.

Jean-Marie
  • The `.*` is greedy and will match until the end of the string. The `.*?` is non-greedy, and since nothing follows it in the pattern, it can settle for matching no characters. If you want the 3rd group in both matches, perhaps use `A\.B\.(.+?)\.(\d{3}) (\S+)` – The fourth bird Jan 15 '20 at 14:52
  • Hi 4th bird! What if I wanted to parse a string such as: – Jean-Marie Jan 15 '20 at 15:14
  • A.B.Prop.001 Bla bla A.B.Desc.032 Bla bla bla – Jean-Marie Jan 15 '20 at 15:14
  • An alternative suggestion would be to enclose the complete pattern in a repetition, anchoring it and keeping the matching behaviour lazy, yielding `^(A\.B\.(.+?)\.(\d{3}) (.*?))+$` – Vogel612 Jan 15 '20 at 15:15
  • @Jean-Marie Perhaps like this `A\.B\.(.+?)\.(\d{3}) (.*?)(?=A\.B\.|$)` https://regex101.com/r/nQwYJp/1 – The fourth bird Jan 15 '20 at 15:22
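
To make the comments above concrete, here is a minimal sketch that runs the greedy, lazy, and lookahead-bounded patterns against the multi-word string. It uses Python's `re` module purely for illustration; the question does not say which engine is in use, so that choice is an assumption, and `find_groups` is a hypothetical helper name. Engines without lookahead support would need a different approach.

```python
import re

text = "A.B.Prop.001 Bla bla A.B.Desc.032 Bla bla bla"

# The three variants discussed in the question and comments.
greedy = r"A\.B\.(.+?)\.(\d{3}) (.*)"
lazy = r"A\.B\.(.+?)\.(\d{3}) (.*?)"
bounded = r"A\.B\.(.+?)\.(\d{3}) (.*?)(?=A\.B\.|$)"


def find_groups(pattern, s):
    """Return the captured groups of every match (hypothetical helper)."""
    return [m.groups() for m in re.finditer(pattern, s)]


# Greedy: the final group swallows everything up to the end of the string.
print(find_groups(greedy, text))

# Lazy with nothing after it: the final group is satisfied with zero characters.
print(find_groups(lazy, text))

# Lazy plus lookahead: the final group must grow until the next "A.B." or the end.
# The space before the next "A.B." is captured too, so it is stripped here.
print([(a, b, c.strip()) for a, b, c in find_groups(bounded, text)])
```

The lookahead consumes no characters, so the next match can still start at the following `A.B.` prefix.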
