I have 4 matches with my pattern:

\d+\/?\d+\s[A-z]+.(?!\d)

Regex demo

I need to parse these 4 strings:

17 Howard Rd Howard. Stdnt 
11/169 Wall Road, Wontown, Wkr 
105 AGNEW, Marilyn Barbara 
106 AGNEW, Mavis Rosina

If I add * or + after the ., the match runs to the end of the string, so I lose the individual matches and the effect of the negative lookahead. How do I change this regex to extend the matches so that I get 4 complete strings?

The fourth bird
Dave
  • Use `+?` or `*?` and positive lookahead: [`\d+/?\d+\s[A-z]+.+?(?=\s\d|$)`](https://regex101.com/r/mPt19w/2) – InSync May 30 '23 at 23:59
  • ...or use `\D` (non-digit) if there is no digit in your road names: [`\d+/?\d+\s[A-z]+\D+`](https://regex101.com/r/mPt19w/3) – InSync May 31 '23 at 00:02
  • Thanks. I think my mistakes were (1) not realizing that the contents of a positive lookahead are NOT part of the match, and (2) not knowing that adding `?` to `.+` makes the 'unlimited times' quantifier lazy, so each match stops at the first position where the positive lookahead succeeds. – Dave May 31 '23 at 03:36
  • Depending on the data, you might also consider splitting: [`re.split(r' +(?=\d)', s)`](https://tio.run/##RY2xCsIwFEX3fMXd0mKpBrHiUERBdNGhDl1cIgkYrEl4CZZ@fYwguB3u4XD9FB/OLlMyL@8ogjRjAS24WOPkRkkKnfpRjWtUNkKIuWg26OUwoHNSVeidjW60GZ4EsVhhd7wc@gpnSWaYLPaS7pJkVs1fvU3IeTBWcsZIf19J18EPJhbEMSu27U2VvEIoGfNkbJ51KFP6AA) – bobble bubble May 31 '23 at 09:16
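
The three suggestions in the comments above can be compared side by side. This is only a sketch: it assumes the four addresses arrive as one space-separated string (the variable `text` is made up for illustration), and the `\D+` variant is stripped because it also swallows the space before the next number.

```python
import re

# Assumption: the four sample addresses are joined into one string.
text = ("17 Howard Rd Howard. Stdnt 11/169 Wall Road, Wontown, Wkr "
        "105 AGNEW, Marilyn Barbara 106 AGNEW, Mavis Rosina")

# Lazy quantifier plus positive lookahead (first comment).
lazy = re.findall(r"\d+/?\d+\s[A-z]+.+?(?=\s\d|$)", text)

# Greedy non-digit run (second comment); strip the trailing space it captures.
non_digit = [m.strip() for m in re.findall(r"\d+/?\d+\s[A-z]+\D+", text)]

# Split on spaces that are followed by a digit (last comment).
split_parts = re.split(r" +(?=\d)", text)

for name, result in [("lazy", lazy), ("non_digit", non_digit), ("split", split_parts)]:
    print(name, result)
```

On this sample all three give the same four strings, but they differ on other data: the `\D+` and split variants both break as soon as a street or suburb name contains a digit.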

1 Answer

Your pattern requires at least 2 digits, because \d+\/?\d+ has two mandatory \d+ parts. Also note that [A-z] matches more than [A-Za-z]: it also covers the characters [ \ ] ^ _ ` that sit between Z and a in ASCII.

The dot in .(?!\d) matches any character, including a space or a comma, which is why each of your matches ends with either a space or a comma.
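
A short Python check makes these three points concrete. It is a sketch under one assumption: the four addresses are joined into a single space-separated string, and the names `text` and `"7 Elm St"` are invented for illustration.

```python
import re

# The original pattern from the question.
original = re.compile(r"\d+\/?\d+\s[A-z]+.(?!\d)")

# Assumption: the four sample addresses are joined into one string.
text = ("17 Howard Rd Howard. Stdnt 11/169 Wall Road, Wontown, Wkr "
        "105 AGNEW, Marilyn Barbara 106 AGNEW, Mavis Rosina")

# Each match stops one character after the first word,
# so it ends with a space or a comma.
original_matches = original.findall(text)
print(original_matches)

# [A-z] also accepts the six punctuation characters between 'Z' and 'a'.
extra_chars = [c for c in "[\\]^_`" if re.fullmatch(r"[A-z]", c)]
print(extra_chars)

# \d+\/?\d+ needs two digits, so a single-digit house number is not found.
single_digit = re.search(r"\d+\/?\d+", "7 Elm St")
print(single_digit)
```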

You might use:

(?<!\S)\d+(?:/\d+)?\s[A-Za-z].*?(?=\s+\d+\b|$)
  • (?<!\S) Assert a whitespace boundary to the left
  • \d+(?:/\d+)? Match 1+ digits with an optional / and 1+ digits
  • \s[A-Za-z].*? Match a whitespace char followed by a single char A-Za-z and then as few as possible chars
  • (?= Positive lookahead
    • \s+\d+\b Match 1+ whitespace chars, 1+ digits and a word boundary
    • | Or
    • $ End of the string
  • ) Close the lookahead
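
As a quick way to try the pattern outside the regex demo, here is a minimal Python sketch. It assumes the four addresses are joined into one space-separated string with no trailing whitespace; `text` and `matches` are illustrative names.

```python
import re

# The pattern suggested above; it has no capture groups,
# so findall returns the full matches.
pattern = re.compile(r"(?<!\S)\d+(?:/\d+)?\s[A-Za-z].*?(?=\s+\d+\b|$)")

# Assumption: the four sample addresses are joined into one string.
text = ("17 Howard Rd Howard. Stdnt 11/169 Wall Road, Wontown, Wkr "
        "105 AGNEW, Marilyn Barbara 106 AGNEW, Mavis Rosina")

matches = pattern.findall(text)
for m in matches:
    print(repr(m))
```

If each address sits on its own line instead, compiling with `re.MULTILINE` lets `$` match at every line end; trailing spaces before the newline would then still be part of the match and need stripping.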

Regex demo

The fourth bird