1

I am trying to get the span of the city name from some addresses, but I am struggling with the required regex. Examples of the address format are below.

flat 1, tower block, 34 long road, Major city

flat 1, tower block, 34 long road, town and parking space

34 short road, village on the river and carpark (7X3 8RG)

The expected text to be captured in each case is "Major city", "town" and "village on the river". The issue is that sometimes "and parking space" or a variant is included in the address. Using a regex such as "(?<=,\s)\w+" would return "town and parking space" in the case of example 2.

The city is always after the last comma of the address.

I have tried to re-work this question but have not successfully managed to exclude the "and parking space" section.

I have already created a regex that excludes the postcodes; the postcode is only included in the example because an answer would ideally allow that part of the regex to be bolted on the end.

How would I create a regex that starts after the last comma and runs to the end of the address but stops at any "and parking" or postcodes?

Jonno Bourne
  • 1
    Try `,\s*((?:(?!\sand\s)[^,])*)(?=[^,]*$)`, see [the regex demo](https://regex101.com/r/NUevDS/1). Or, `.*,\s*((?:(?!\sand\s)[^,])*)`, see [this demo](https://regex101.com/r/NUevDS/2). – Wiktor Stribiżew Apr 14 '22 at 20:27
  • I hate how good you are at RE @WiktorStribiżew :p – OTheDev Apr 14 '22 at 20:29

2 Answers

3

You can capture these strings using

,\s*((?:(?!\sand\s)[^,])*)(?=[^,]*$)
,\s*([^,]*?)(?=(?:\sand\s[^,]*)?$)
.*,\s*((?:(?!\sand\s)[^,])*)
.*,\s*([^,]*?)(?=(?:\sand\s[^,]*)?$)

See this regex demo or this regex demo.

Details:

  • , - a comma
  • \s* - zero or more whitespaces
  • ((?:(?!\sand\s)[^,])*) - Group 1: any char other than a comma, zero or more occurrences, where no matched char starts a whitespace + and + whitespace char sequence
  • (?=[^,]*$) - a positive lookahead requiring that only zero or more chars other than a comma remain until the end of string, so the comma matched at the start is the last one.
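
As a quick sanity check, here is a minimal sketch that runs all four patterns above over the sample addresses from the question and prints what Group 1 captures for each. The names `PATTERNS`, `ADDRESSES` and `extract_all` are made up for this illustration and are not part of the answer itself.

```python
import re

# The four patterns listed above, copied unchanged.
PATTERNS = [
    r',\s*((?:(?!\sand\s)[^,])*)(?=[^,]*$)',
    r',\s*([^,]*?)(?=(?:\sand\s[^,]*)?$)',
    r'.*,\s*((?:(?!\sand\s)[^,])*)',
    r'.*,\s*([^,]*?)(?=(?:\sand\s[^,]*)?$)',
]

# Sample addresses taken from the question.
ADDRESSES = [
    'flat 1, tower block, 34 long road, Major city',
    'flat 1, tower block, 34 long road, town and parking space',
    '34 short road, village on the river and carpark (7X3 8RG)',
]

def extract_all(pattern):
    """Return Group 1 for each sample address (None when nothing matches)."""
    rx = re.compile(pattern)
    results = []
    for address in ADDRESSES:
        m = rx.search(address)
        results.append(m.group(1) if m else None)
    return results

# Each pattern should print the same list of three city names.
for pattern in PATTERNS:
    print(extract_all(pattern))
```

The first two patterns rely on the trailing lookahead to pin the match to the last comma, while the last two rely on the greedy `.*` prefix to do the same job.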

In Python, you would use

m = re.search(r'.*,\s*([^,]*?)(?=(?:\sand\s[^,]*)?$)', text)
if m:
    print(m.group(1))

See the demo:

import re
texts = ['flat 1, tower block, 34 long road, Major city',
'flat 1, tower block, 34 long road, town and parking space',
'34 short road, village on the river and carpark (7X3 8RG)']
rx = re.compile(r'.*,\s*([^,]*?)(?=(?:\sand\s[^,]*)?$)')
for text in texts:
    m = rx.search(text)
    if m:
        print(m.group(1))

Output:

Major city
town
village on the river
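
If the "and ..." part can be missing while a postcode is still present, one option is to add the postcode as a second alternative inside the lookahead. The sketch below is only an illustration: `POSTCODE` is a hypothetical, simplified pattern that happens to fit the sample "7X3 8RG", so swap in your own postcode regex, and `find_city` is a name invented for this example.

```python
import re

# ASSUMPTION: a simplified postcode shape that fits the sample "7X3 8RG",
# optionally wrapped in parentheses. Replace it with your real postcode regex.
POSTCODE = r'\(?\b[A-Z0-9]{2,4}\s\d[A-Z]{2}\b\)?'

# Same idea as the last pattern above, but the lookahead now accepts either
# an " and ..." tail or a trailing postcode before the end of the string.
CITY_RX = re.compile(
    r'.*,\s*([^,]*?)(?=(?:\sand\s[^,]*|\s*' + POSTCODE + r')?$)'
)

def find_city(address):
    """Return the text after the last comma, minus any 'and ...' tail or postcode."""
    m = CITY_RX.search(address)
    return m.group(1) if m else None

samples = [
    'flat 1, tower block, 34 long road, Major city',
    'flat 1, tower block, 34 long road, town and parking space',
    '34 short road, village on the river and carpark (7X3 8RG)',
    '34 short road, village on the river (7X3 8RG)',
]
for s in samples:
    print(find_city(s))
```

Keeping the postcode in its own variable means it can be bolted on or replaced without touching the rest of the expression.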
Wiktor Stribiżew
  • 1
    You’re either a `regex` god or you have the best `regex` cheat sheet in the observable universe. Which one is it? Please come clean. – Mihai Apr 14 '22 at 21:03
  • 1
    @Mihai I have been answering regex SO questions every day since spring 2015. [Visited 2628 days, 2619 consecutive](https://imgur.com/a/X5KU3mE). – Wiktor Stribiżew Apr 14 '22 at 21:07
  • I changed the part with the "and" to read "(and\s|\s?\(?\b[a-z]{1,2}\d[a-z0-9]?\s\d[a-z]{2}\b)" for the case where "and car park" is not present but a postcode is. Is using a capture group with an "or" alternation a sensible approach for this? – Jonno Bourne Apr 15 '22 at 09:11
  • 1
    @JonnoBourne If that works for you, why not. If you share a regex101 link with a demo, I could provide more insight - if you need any. – Wiktor Stribiżew Apr 15 '22 at 09:14
0

I would do:

import re 

exp = ['flat 1, tower block, 34 long road, Major city',
'flat 1, tower block, 34 long road, town and parking space',
'34 short road, village on the river and carpark (7X3 8RG)']

for e in (re.split(r',\s*', x)[-1] for x in exp):
    print(re.sub(r'(?:\s+and car.*)|(?:\s+and parking.*)','',e))

Prints:

Major city
town
village on the river

Works like this:

  1. Split the string on ,\s* and take the last portion;
  2. Remove anything from the end of that string that matches the specified pattern (?:\s+and car.*)|(?:\s+and parking.*)

You can easily add additional clauses to remove with this approach.
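
To illustrate adding clauses, here is a rough sketch that builds the removal pattern from a list. The `TRAILERS` list and the `city_from_address` name are made up for this example, and the third entry (a parenthesised chunk such as a postcode) is an assumed extra clause that is not in the code above.

```python
import re

# Hypothetical list of trailing phrases to strip; append more as needed.
TRAILERS = [
    r'and car.*',      # "and carpark ...", "and car park ..."
    r'and parking.*',  # "and parking space ..."
    r'\(.*\)',         # ASSUMPTION: anything in parentheses, e.g. a postcode
]
TRAILER_RX = re.compile(r'\s+(?:' + '|'.join(TRAILERS) + r')$')

def city_from_address(address):
    """Take the part after the last comma, then strip any known trailer."""
    last_part = re.split(r',\s*', address)[-1]
    return TRAILER_RX.sub('', last_part)

for a in ['flat 1, tower block, 34 long road, Major city',
          'flat 1, tower block, 34 long road, town and parking space',
          '34 short road, village on the river and carpark (7X3 8RG)',
          '34 short road, village on the river (7X3 8RG)']:
    print(city_from_address(a))
```

Each new case is then a one-line change to the list rather than an edit to the expression itself.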

dawg