I want to write a regex that recognizes official Swiss postal addresses. They look like this:

Mr
Hans Schweizer
Gerechtigkeitsgasse 10
3011 Berne

Ms
Susi Frei
c/o Hans Schweizer
Gerechtigkeitsgasse 10
3011 Berne 

Mr
Erich Müller
Bahnhofstrasse 4/8
8001 Zurich

So, given text like this:

'You should send a letter to: 
    Mr
    Hans Schweizer
    Gerechtigkeitsgasse 10
    3011 Berne

and tell him all about your last summer...'

The regex should extract only the address.
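A minimal sketch of pulling just the address out of the surrounding sentence. It assumes each address line may be indented (as in the sample) and uses my own simplified version of the field rules listed further down, not a tested Swiss-address grammar. `re.search` scans the text instead of requiring the whole string to match, and `re.MULTILINE` lets `^` anchor at the start of every line.

```python
import re

# Simplified, hypothetical pattern: every line may start with spaces or tabs.
ADDRESS = re.compile(
    r"^[ \t]*(?:Herr|Frau|Mrs|Mr|Ms)[ \t]*\n"
    r"[ \t]*[A-ZÄÖÜ][a-zäöüéè]+(?: [A-ZÄÖÜ][a-zäöüéè]+){1,2}[ \t]*\n"
    r"[ \t]*[A-Za-zäöüéè]+(?:strasse|gasse|weg|platz|promenade) \d+(?:/\d+)?[ \t]*\n"
    r"[ \t]*\d{4} (?:Zurich|Zürich|Basel|Geneva|Lausanne|Berne|Bern|Winterthur|Lucerne|St\. ?Gallen)\b",
    re.MULTILINE,
)

text = """You should send a letter to:
    Mr
    Hans Schweizer
    Gerechtigkeitsgasse 10
    3011 Berne

and tell him all about your last summer..."""

match = ADDRESS.search(text)
if match:
    # Remove the indentation from each matched line before printing.
    address = "\n".join(line.strip() for line in match.group(0).splitlines())
    print(address)
```

Only the four address lines are printed; the sentence before and after is ignored.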

I looked at this post: Find a US street address in text (preferably using Python regex)

and tried to adapt it, but I could not get it to work.

An address should contain:

salutation: (Herr|Frau|Mrs|Mr|Ms)
name: two or three capitalized words
street: a name ending in (strasse|gasse|weg|platz|promenade), followed by a house number
postal code: an integer
city: (Zurich|Zürich|Basel|Geneva|Lausanne|Bern|Winterthur|Lucerne|St. Gallen|St.Gallen)

95% of streets in Switzerland have a suffix such as "strasse" or "gasse", and I'm only looking for some cities (though I will probably add more later).
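Since the city list is expected to grow, one option is to generate the alternation from a Python list instead of typing it by hand. In this sketch the `CITIES` list and the `city_alternation` helper are hypothetical names. Each city goes through `re.escape`, so the dot in "St. Gallen" is matched literally, and longer names are placed first so that "Berne" is tried before its prefix "Bern".

```python
import re

# Hypothetical list; append new cities here.
CITIES = ["Zurich", "Zürich", "Basel", "Geneva", "Lausanne", "Bern", "Berne",
          "Winterthur", "Lucerne", "St. Gallen", "St.Gallen"]


def city_alternation(cities):
    """Build a non-capturing alternation such as (?:Berne|Bern)."""
    # Longest names first, so an unanchored search prefers "Berne" over "Bern".
    ordered = sorted(cities, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(c) for c in ordered) + ")"


# Postal code (assumed to be 4 digits), a space, then one of the cities.
CITY_LINE = re.compile(r"\d{4} " + city_alternation(CITIES))

for line in ["3011 Berne", "9000 St. Gallen", "9000 StX Gallen", "8001 Zürich"]:
    print(line, "->", CITY_LINE.fullmatch(line) is not None)
```

Because the dot is escaped, "StX Gallen" is rejected, which the unescaped `St. Gallen` in a hand-written pattern would accept.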

My problem is that I do not know how to put all this into one regex.
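One way to put the pieces together is to write one sub-pattern per address line, join them with `\n`, and use named groups so each field can be read back afterwards. This is a sketch under my own assumptions: postal codes are four digits, a name is two or three capitalized words, the house number looks like `10` or `4/8`, and both "Bern" and "Berne" are accepted. The group names are hypothetical. `re.VERBOSE` ignores whitespace in the pattern, so literal spaces are written as `[ ]`.

```python
import re

ADDRESS = re.compile(
    r"""
    (?P<title>Herr|Frau|Mrs|Mr|Ms) \n
    (?P<name>[A-ZÄÖÜ][a-zäöüéè]+(?:[ ][A-ZÄÖÜ][a-zäöüéè]+){1,2}) \n
    (?P<street>[A-Za-zäöüéè]+(?:strasse|gasse|weg|platz|promenade))
    [ ](?P<number>\d+(?:/\d+)?) \n
    (?P<zip>\d{4})
    [ ](?P<city>Zurich|Zürich|Basel|Geneva|Lausanne|Berne|Bern|Winterthur|Lucerne|St\.[ ]?Gallen)
    """,
    re.VERBOSE,
)

m = ADDRESS.fullmatch("Mr\nErich Müller\nBahnhofstrasse 4/8\n8001 Zurich")
if m:
    # Each field is available by name, e.g. m.group("city").
    print(m.groupdict())
```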

Can you show me how to write a regex that recognizes Swiss addresses?

taga
  • Could you include the regex that you tried? – Juan Carlos Ramirez Oct 12 '20 at 13:31
  • Can you explain what the "Address should contain:" part means? This looks like you assume that there are ten cities in Switzerland and all the streets end with the five listed suffixes ... – Oli Oct 12 '20 at 13:33
  • @Oli 95% of streets in Switzerland have a suffix such as "strasse" or "gasse", and I'm only looking for some cities (though I will probably add more later). In my documents each address has four or five lines, laid out as above: Mr, first and last name, street, city – taga Oct 12 '20 at 13:38
  • Looks like you are looking to create a regex, but do not know where to get started. Please check [Reference - What does this regex mean](https://stackoverflow.com/questions/22937618) resource, it has plenty of hints. Also, refer to [Learning Regular Expressions](https://stackoverflow.com/questions/4736) post for some basic regex info. Once you get some expression ready and still have issues with the solution, please edit the question with the latest details and we'll be glad to help you fix the problem. – Wiktor Stribiżew Oct 12 '20 at 13:54

1 Answer

Something like: (Herr|Frau|Mrs|Mr|Ms)\n([a-zA-Zü]+ ){1,2}[a-zA-Zü]+\n[a-zA-Zü]+(strasse|gasse|weg|platz|promenade) ([0-9]{1,4}|[0-9]{1,4}/[0-9]{1,4})\n[0-9]{1,4} (Zurich|Zürich|Basel|Geneva|Lausanne|Bern|Winterthur|Lucerne|St\. Gallen|St\.Gallen)

If you want more information, don't hesitate to ask; there may still be mistakes. Don't forget to add the other accented characters (ä, ö, é, etc.) wherever I use [a-zA-Zü]+.

Here is some sample code to check it:
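On adding accented characters: instead of listing ü, ä, é and so on by hand, a Python 3 `str` pattern can use `[^\W\d_]`, which means "a word character that is not a digit and not an underscore", i.e. roughly any Unicode letter. A small sketch (the `LETTERS` and `NAME` names are hypothetical) that checks a two- or three-word name:

```python
import re

# [^\W\d_]: word characters minus digits and underscore, roughly "any letter".
LETTERS = r"[^\W\d_]+"
# Two or three words separated by single spaces.
NAME = re.compile(rf"{LETTERS}(?: {LETTERS}){{1,2}}")

for name in ["Erich Müller", "Zoë Brönnimann", "Jean-Luc Godard", "Hans 2"]:
    print(name, "->", NAME.fullmatch(name) is not None)
```

This still rejects hyphenated names such as "Jean-Luc"; adding `-` to the allowed characters would be needed for those.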

import re

# Raw string: the regex engine interprets \n itself, and \. matches a literal dot.
pattern = r"(Herr|Frau|Mrs|Mr|Ms)\n([a-zA-Zü]+ ){1,2}[a-zA-Zü]+\n[a-zA-Zü]+(strasse|gasse|weg|platz|promenade) ([0-9]{1,4}|[0-9]{1,4}/[0-9]{1,4})\n[0-9]{1,4} (Zurich|Zürich|Basel|Geneva|Lausanne|Bern|Winterthur|Lucerne|St\. Gallen|St\.Gallen)"

test1 = "Mr\nHans Schweizer\nGerechtigkeitsgasse 10/4657\n3011 Bern"
test2 = "Ms\nSusi Frei\nc/o Hans Schweizer\nGerechtigkeitsgasse 10\n3011 Berne"
test3 = "Mr\nErich Müller\nBahnhofstrasse 4/8\n8001 Zurich"
test4 = "this is bad :("
test5 = "Mr\nErich Müller\nBahnhofstrasse 444db\n8001 Zurich"
test6 = "Hans Schweizer\nGerechtigkeitsgasse 10\n3011 Berne"

tests = [test1, test2, test3, test4, test5, test6]
for i, text in enumerate(tests, start=1):
    # fullmatch: the whole string must match, not just a part of it.
    res = re.fullmatch(pattern, text)
    if i > 1:
        print()  # blank line between results
    print(f"test{i}:")
    print(res is not None)

with the following output (tests 4, 5 and 6 are deliberately invalid inputs):

test1:
True

test2:
False

test3:
True

test4:
False

test5:
False

test6:
False

Your second example does not match because it has one more line than your other examples (c/o Hans Schweizer), and you don't mention that line in your explanation, so feel free to state the rules for it. It also ends in "Berne", while the city list only contains "Bern".
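If the rule turns out to be simply "an optional line starting with `c/o` between the name and the street", one way to allow it is an optional non-capturing group. This sketch assumes that rule (a guess, not something stated in the question) and also accepts "Berne" via `Berne?`, since the question's examples spell it that way. Everything else mirrors the regex above.

```python
import re

PATTERN = re.compile(
    r"(Herr|Frau|Mrs|Mr|Ms)\n"
    r"([a-zA-Zü]+ ){1,2}[a-zA-Zü]+\n"
    r"(?:c/o [^\n]+\n)?"  # optional c/o line (assumed format)
    r"[a-zA-Zü]+(strasse|gasse|weg|platz|promenade) ([0-9]{1,4}|[0-9]{1,4}/[0-9]{1,4})\n"
    r"[0-9]{1,4} (Zurich|Zürich|Basel|Geneva|Lausanne|Berne?|Winterthur|Lucerne|St\. ?Gallen)"
)

test2 = "Ms\nSusi Frei\nc/o Hans Schweizer\nGerechtigkeitsgasse 10\n3011 Berne"
test3 = "Mr\nErich Müller\nBahnhofstrasse 4/8\n8001 Zurich"
print(PATTERN.fullmatch(test2) is not None)
print(PATTERN.fullmatch(test3) is not None)
```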

Amiral ASTERO
  • Thanks for your answer, but I ran your regex on all 3 examples above and it does not work. Also, how should I add accented characters? – taga Oct 12 '20 at 13:46
  • I'll try it in Python; I was testing on https://pythex.org/, so maybe that's the problem – Amiral ASTERO Oct 12 '20 at 13:48
  • Please check the question again; I updated it. Can you also include code for finding a specific pattern in text and extracting it? – taga Oct 12 '20 at 13:49
  • I updated my answer; feel free to read it and get back to me :) – Amiral ASTERO Oct 12 '20 at 14:32
  • How can I print the result I found? For example, for test3, I want to print the lines of text that match the regex – taga Oct 20 '20 at 09:03
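To print what was matched, keep the match object instead of only comparing it with `None`: `group(0)` returns the whole matched text and `splitlines()` splits it into lines. For addresses embedded in longer text, `re.finditer` yields one match per address. This sketch reuses the answer's regex; the leading `^` with `re.MULTILINE` is my own addition, so that an address must start at the beginning of a line.

```python
import re

PATTERN = re.compile(
    r"^(Herr|Frau|Mrs|Mr|Ms)\n"
    r"([a-zA-Zü]+ ){1,2}[a-zA-Zü]+\n"
    r"[a-zA-Zü]+(strasse|gasse|weg|platz|promenade) ([0-9]{1,4}|[0-9]{1,4}/[0-9]{1,4})\n"
    r"[0-9]{1,4} (Zurich|Zürich|Basel|Geneva|Lausanne|Bern|Winterthur|Lucerne|St\. Gallen|St\.Gallen)",
    re.MULTILINE,
)

test3 = "Mr\nErich Müller\nBahnhofstrasse 4/8\n8001 Zurich"
res = PATTERN.fullmatch(test3)
if res is not None:
    # Print each matched line separately.
    for line in res.group(0).splitlines():
        print(line)

# Searching inside longer text: one match object per address found.
letter = "Write to:\nMr\nErich Müller\nBahnhofstrasse 4/8\n8001 Zurich\nThanks!"
found = [m.group(0) for m in PATTERN.finditer(letter)]
print(found)
```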