
I am unsure how to tell a regular expression in Python to stop after finding the first match.

Apparently you can make a regex lazy (see "RegEx - stop after first match"). I tried placing (.*?) at the end of my expression, but that just broke it. I just want it to stop after finding the first complete address and return that.

Sample code with data: https://regexr.com/6okuv

In the sample data, all addresses are accepted by the expression except two: "Hindenburgdamm 27, Hygiene-Institut", where it should stop after "27" and return "Hindenburgdamm 27", and "Peschkestr. 5a/Holsteinische Str. 44", where it should stop after "5a" and return "Peschkestr. 5a".

Regular expression:
^([A-Za-zÄäÖöÜüß\s\d.-]+?)\s*([\d\s]+(?:\s?[-+/]\s?\d+)?\s*[A-Za-z]?-?[A-Za-z]?)?$

Sample data:
Berliner Str. 74
Hindenburgdamm 27, Hygiene-Institut
Peschkestr. 5a/Holsteinische Str. 44
Lankwitzer Str. 13-17a
Fidicinstr. 15A
Haudegen Weg 15/17
Johanna-Stegen-Strasse 14a-d
Friedrichshaller Str. 7
Südwestkorso 9
Wiktor Stribiżew
oddbook

2 Answers


You could make the pattern a bit more specific for the digits and the trailing characters, and match at least a single digit, using a case-insensitive match (re.IGNORECASE in Python):

^([A-ZÄäÖöÜüß.\s-]+?)\s*(\d+(?:[/-]\d+)?(?:[A-Z](?:-[A-Z])?)?)\b

Explanation

  • ^ Start of string
  • ([A-ZÄäÖöÜüß.\s-]+?) Capture group 1, lazily match 1+ of the listed characters
  • \s* Match optional whitespace chars
  • ( Capture group 2
    • \d+ Match 1+ digits
    • (?:[/-]\d+)? Optionally match / - and 1+ digits
    • (?:[A-Z](?:-[A-Z])?)? Optionally match A-Z followed by an optional - and A-Z
  • ) Close group 2
  • \b A word boundary


If you want a match only and don't need the capture groups you can omit them.
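
As a rough sketch of how the first pattern could be used from Python: the helper name first_address is made up for illustration, and re.IGNORECASE is assumed to stand in for the case-insensitive flag mentioned above. re.match only ever returns the first match at the start of the string, so nothing else is needed to "stop" after it.

```python
import re

# Pattern copied from the answer; re.IGNORECASE provides the
# case-insensitive matching that the [A-Z] ranges rely on.
ADDRESS = re.compile(
    r"^([A-ZÄäÖöÜüß.\s-]+?)\s*(\d+(?:[/-]\d+)?(?:[A-Z](?:-[A-Z])?)?)\b",
    re.IGNORECASE,
)


def first_address(text):
    """Return the leading 'street number' part of text, or None (hypothetical helper)."""
    m = ADDRESS.match(text)
    return m.group(0) if m else None


samples = [
    "Berliner Str. 74",
    "Hindenburgdamm 27, Hygiene-Institut",
    "Peschkestr. 5a/Holsteinische Str. 44",
    "Lankwitzer Str. 13-17a",
    "Johanna-Stegen-Strasse 14a-d",
]

for s in samples:
    # Prints the original string next to the extracted address.
    print(repr(s), "->", repr(first_address(s)))
```

Group 1 (street) and group 2 (house number) are also available separately via m.group(1) and m.group(2) if the two parts are needed on their own.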

Note that the leading character class contains characters such as ., - and \s. If the match should not start with any of these, begin with a character class that excludes them, followed by an optionally repeated character class that includes them, so that at least 1 character is still matched.

^[A-ZÄäÖöÜüß][A-ZÄäÖöÜüß.\s-]*?\s*\d+(?:[/-]\d+)?(?:[A-Z](?:-[A-Z])?)?\b
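
To illustrate the difference, here is a small sketch comparing the two variants in Python. The names loose and strict and the test strings with a leading "-" or "." are invented for the example; re.IGNORECASE is again assumed for the case-insensitive match.

```python
import re

# Tail shared by both variants: house number with optional range and letter(s).
NUMBER = r"\s*\d+(?:[/-]\d+)?(?:[A-Z](?:-[A-Z])?)?\b"

# Variant 1: the match may start with ".", "-" or whitespace.
loose = re.compile(r"^[A-ZÄäÖöÜüß.\s-]+?" + NUMBER, re.IGNORECASE)
# Variant 2: the match has to start with a letter.
strict = re.compile(r"^[A-ZÄäÖöÜüß][A-ZÄäÖöÜüß.\s-]*?" + NUMBER, re.IGNORECASE)


def matched(pattern, text):
    """Return the matched text, or None if the pattern does not match."""
    m = pattern.match(text)
    return m.group(0) if m else None


for text in ["Fidicinstr. 15A", "- Fidicinstr. 15A", ". 7"]:
    print(repr(text), "loose:", matched(loose, text), "strict:", matched(strict, text))
```

Only the first string is accepted by both variants; the other two are accepted by the loose variant but rejected by the strict one.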


The fourth bird
    Many thanks for your step-by-step explanation, I really appreciate it. It works with all my test cases in RegExr! I've been sitting at this since before lunch! This is great. Now I just have to apply it to my pandas dataframe. – oddbook Jun 28 '22 at 13:01

You can try this pattern

^([A-Za-zÄäÖöÜüß\s\d.-]+?\s[0-9a-zA-ZÄäÖöÜüß-]+?)[\s\/,]?

In any case, if you don't need to match the full line, don't end the pattern with $, which forces the match to reach the end of the line.
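
The effect of the trailing $ can be seen with a small sketch. The pattern here is a deliberately simplified, hypothetical one (not the pattern from this answer): lazily skip non-digits, then take the digits and an optional letter.

```python
import re

# Hypothetical simplified pattern, used only to show what "$" changes.
unanchored = re.compile(r"^\D+?\d+[a-z]?", re.IGNORECASE)
anchored = re.compile(r"^\D+?\d+[a-z]?$", re.IGNORECASE)


def matched(pattern, text):
    """Return the matched text, or None if the pattern does not match."""
    m = pattern.match(text)
    return m.group(0) if m else None


text = "Hindenburgdamm 27, Hygiene-Institut"
# Without "$" the match may stop right after the house number.
print(matched(unanchored, text))
# With "$" the whole line must fit the pattern, so the trailing
# ", Hygiene-Institut" makes the match fail.
print(matched(anchored, text))
```

For a line that contains nothing after the house number, such as "Berliner Str. 74", both versions return the full line.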

dan89