regular expression to exclude 2 consecutive capital letters

Question

I'm having difficulty solving the following problem with regex.

For example, given the call below:
regex_exp(address, "OG 56432")

It should return

"OG 56432: Middle Street Pollocksville | 686"

address is an array of strings:

address = [
  "622 Gordon Lane St. Louisville OH 52071",
  "432 Main Long Road St. Louisville OH 43071",
  "686 Middle Street Pollocksville OG 56432"
]

My solution currently looks like this (Python):

import re
def regex_exp(address, zipcode):
    for i in address:
        if zipcode in i:
            postal_code = (re.search(r"[A-Z]{2}\s[0-9]{5}", i)).group(0)
            # returns "OG 56432"

            digits = (re.search(r"\d+", i)).group(0)
            # returns "686"

            address = (re.search(r"\D+", i)).group(0)
            # returns " Middle Street Pollocksville OG " (state code and surrounding spaces included)

            print(postal_code + ":" + address + "| " + digits)

regex_exp(address, "OG 56432")
# prints OG 56432: Middle Street Pollocksville OG | 686

As you can see from my second paragraph, this is not the correct answer - I need the returned value to be

"OG 56432: Middle Street Pollocksville | 686"

How do I change the regex search for my address variable so that it excludes the two consecutive capital letters? I've tried things like

address = (re.search("?!\D+", x)).group(0)

to remove the two consecutive capitals, based on the question "A regular expression to exclude a word/string", but I think this is a step in the wrong direction.

PS: I understand there are easier methods to solve this, but I want to use regex to improve my fundamentals

If they're consistently formatted - can't you use something like: `'{2}: {1} | {0}'.format(*re.match('(\d+) (.*?) ([A-Z]{2} \d{5})', "686 Middle Street Pollocksville OG 56432").groups())` ? — Jon Clements, Aug 18 '18 at 09:00
Hey Jon! I couldn't find anything regarding the difference between re.match and *re.match. I only found documentation on the use of * as a greedy quantifier in the regex itself, but nothing about it in front of re.match — sgeza, Aug 19 '18 at 06:21
https://docs.python.org/3/tutorial/controlflow.html#unpacking-argument-lists and https://stackoverflow.com/questions/36901/what-does-double-star-asterisk-and-star-asterisk-do-for-parameters etc... — Jon Clements, Aug 19 '18 at 06:23
thanks! so basically because we're using .groups() here we want it to return a tuple? — sgeza, Aug 19 '18 at 06:35
Not quite... `.groups()` returns a tuple... we then unpack that so that `.format(...)` can be used as needed... — Jon Clements, Aug 19 '18 at 06:36
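
To make the unpacking discussed in these comments concrete, here is a minimal sketch that splits the one-liner into separate steps. The variable names `pattern`, `m`, `parts` and `result` are only for illustration and are not part of the original comment.

```python
import re

line = "686 Middle Street Pollocksville OG 56432"

# Three capture groups: house number, street and town, state code plus ZIP.
pattern = r"(\d+) (.*?) ([A-Z]{2} \d{5})"
m = re.match(pattern, line)

# .groups() returns a tuple holding the three captured strings.
parts = m.groups()

# The leading * unpacks that tuple into separate positional arguments,
# so {0}, {1} and {2} refer to its first, second and third item.
result = "{2}: {1} | {0}".format(*parts)
print(result)
```

Without the `*`, `.format()` would receive the whole tuple as a single argument, and `{1}` and `{2}` would have nothing to refer to.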

score 0 · Answer 1 · answered Aug 18 '18 at 10:28

If you just want to remove the two consecutive capital letters that immediately precede the zip code (a 5-digit number), then use this:

import re
text = "432 Main Long PC Market Road St. Louisville OG 43071"
address = re.sub(r'([A-Z]{2}[\s]{1})(?=[\d]{5})','',text)
print(address) 
# Output: 432 Main Long PC Market Road St. Louisville 43071

For removing all occurrences of two consecutive capital letters (drop the lookahead so the match no longer depends on the zip code):

import re
text = "432 Main Long PC Market Road St. Louisville OG 43071"
address = re.sub(r'([A-Z]{2}[\s]{1})','',text)
print(address)
# Output: 432 Main Long Market Road St. Louisville 43071
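
The same lookahead idea could also be applied directly to the `re.search` call from the question, so that the street part is extracted without the state code instead of being cleaned up afterwards. This is only a sketch, assuming every line ends in a two-letter state code followed by a 5-digit zip code; the name `street` is illustrative.

```python
import re

line = "686 Middle Street Pollocksville OG 56432"

# \D+? matches non-digit characters lazily. The lookahead (?=...) requires
# that whitespace, two capitals, whitespace and five digits come next,
# but those characters are not included in the match itself.
match = re.search(r"\D+?(?=\s[A-Z]{2}\s\d{5})", line)

# The match starts with the space after the house number, so strip it.
street = match.group(0).strip()
print(street)
```

Because the lookahead only checks what follows, the state code stays in the original string and can still be picked up by the separate postal code search.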

score 0 · Answer 2 · answered Aug 18 '18 at 16:54

With re.sub() and group capturing you can use:

s="686 Middle Street Pollocksville OG 56432"
re.sub(r"(\d+)(.*)\s+([A-Z]+\s+\d+)",r"\3: \2 | \1",s)
Out: 'OG 56432:  Middle Street Pollocksville | 686'
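
As a rough sketch, the same three groups could be wrapped in a function shaped like the question's `regex_exp`. The function name `format_address` and the constant `PATTERN` are hypothetical. Compared with the substitution above, this variant consumes the whitespace after the house number, which avoids the double space after the colon, and it assumes a two-letter state code and a 5-digit zip code at the end of each line.

```python
import re

addresses = [
    "622 Gordon Lane St. Louisville OH 52071",
    "432 Main Long Road St. Louisville OH 43071",
    "686 Middle Street Pollocksville OG 56432",
]

# Group 1: house number, group 2: street and town,
# group 3: state code plus zip code at the end of the line.
PATTERN = re.compile(r"(\d+)\s+(.*)\s+([A-Z]{2}\s+\d{5})$")

def format_address(addresses, zipcode):
    """Return 'STATE ZIP: street | number' for the first line containing zipcode."""
    for line in addresses:
        if zipcode in line:
            m = PATTERN.match(line)
            if m:
                # expand() fills the template with the captured groups.
                return m.expand(r"\3: \2 | \1")
    return None

print(format_address(addresses, "OG 56432"))
```

Returning the string rather than printing it keeps the function easy to test and lets the caller decide what to do when no line matches.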


kantal
