I'm having difficulty using regex to solve this problem.

For example, given the call below:
regex_exp(address, "OG 56432") 


It should return

"OG 56432: Middle Street Pollocksville | 686"


address is an array of strings:

address = [
  "622 Gordon Lane St. Louisville OH 52071",
  "432 Main Long Road St. Louisville OH 43071",
  "686 Middle Street Pollocksville OG 56432"
]


My solution currently looks like this (Python):

import re
def regex_exp(address, zipcode):
    for i in address:
        if zipcode in i:
            postal_code = (re.search(r"[A-Z]{2}\s[0-9]{5}", i)).group(0)
            # returns "OG 56432"

            digits = (re.search(r"\d+", i)).group(0)
            # returns "686"

            address = (re.search(r"\D+", i)).group(0)
            # returns " Middle Street Pollocksville OG " (including the surrounding spaces)

            print(postal_code + ":" + address + "| " + digits)

regex_exp(address, "OG 56432")
# prints OG 56432: Middle Street Pollocksville OG | 686

As you can see from my second paragraph, this is not the correct answer - I need the returned value to be

"OG 56432: Middle Street Pollocksville | 686"

How do I change the regex search for my address variable so that it excludes the two consecutive capital letters? I've tried things like

address = (re.search("?!\D+", i)).group(0)

to remove the two consecutive capitals, based on "A regular expression to exclude a word/string", but I think this is a step in the wrong direction.

PS: I understand there are easier methods to solve this, but I want to use regex to improve my fundamentals

sgeza
  • If they're consistently formatted - can't you use something like: `'{2}: {1} | {0}'.format(*re.match(r'(\d+) (.*?) ([A-Z]{2} \d{5})', "686 Middle Street Pollocksville OG 56432").groups())` ? – Jon Clements Aug 18 '18 at 09:00
  • yeah i think i could! how does the (.*?) separator work? – sgeza Aug 18 '18 at 09:04
  • Takes anything until the next pattern matches... – Jon Clements Aug 18 '18 at 09:05
  • hey Jon! I couldn't find anything regarding the difference between re.match and *re.match. I only found documentation regarding the use of * as a greedy quantifier in the regex itself, but nothing about the * in front of re.match – sgeza Aug 19 '18 at 06:21
  • https://docs.python.org/3/tutorial/controlflow.html#unpacking-argument-lists and https://stackoverflow.com/questions/36901/what-does-double-star-asterisk-and-star-asterisk-do-for-parameters etc... – Jon Clements Aug 19 '18 at 06:23
  • thanks! so basically because we're using .groups() here we want it to return a tuple? – sgeza Aug 19 '18 at 06:35
  • Not quite... `.groups()` returns a tuple... we then unpack that so that `.format(...)` can be used as needed... – Jon Clements Aug 19 '18 at 06:36
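
The two points raised in the comments above (the lazy `(.*?)` group and the `*` that unpacks `.groups()`) can be seen step by step in a small sketch. The variable names `line`, `parts`, and `formatted` are illustrative and not part of the original comment:

```python
import re

line = "686 Middle Street Pollocksville OG 56432"

# The lazy (.*?) takes as little text as possible, so it stops as soon as
# the rest of the pattern (a space, two capitals, a space, five digits) can match.
match = re.match(r"(\d+) (.*?) ([A-Z]{2} \d{5})", line)
parts = match.groups()  # a tuple holding the three captured groups

# *parts unpacks the tuple into three positional arguments,
# which the {0}, {1} and {2} placeholders then pick out by index.
formatted = "{2}: {1} | {0}".format(*parts)

print(parts)      # ('686', 'Middle Street Pollocksville', 'OG 56432')
print(formatted)  # OG 56432: Middle Street Pollocksville | 686
```

Without the `*`, `.format()` would receive the whole tuple as a single argument, and `{1}` and `{2}` would raise an `IndexError`.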

2 Answers

If you just want to remove the two consecutive capital letters that immediately precede the zip code (a 5-digit number), use this:

import re
text = "432 Main Long PC Market Road St. Louisville OG 43071"
address = re.sub(r'([A-Z]{2}[\s]{1})(?=[\d]{5})','',text)
print(address) 
# Output: 432 Main Long PC Market Road St. Louisville 43071

For removing all occurrences of two consecutive capital letters followed by whitespace, drop the lookahead:

import re
text = "432 Main Long PC Market Road St. Louisville OG 43071"
address = re.sub(r'([A-Z]{2}[\s]{1})','',text)
print(address)
# Output: 432 Main Long Market Road St. Louisville 43071
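
The same lookahead idea can also be applied to the street-name search from the question, so that the match stops before the state code instead of removing it afterwards. This is only a sketch that assumes every line ends in two capitals, a space, and five digits; `line` and `street` are illustrative names:

```python
import re

line = "686 Middle Street Pollocksville OG 56432"

# \D+? lazily consumes non-digit characters; the lookahead (?=...) requires
# that whitespace, two capitals, whitespace and five digits come next,
# but does not include them in the match.
street = re.search(r"\D+?(?=\s[A-Z]{2}\s\d{5})", line).group(0).strip()

print(street)  # Middle Street Pollocksville
```

The `.strip()` removes the leading space that `\D+?` picks up after the house number.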
Akay Nirala

With re.sub() and group capturing you can use:

import re
s = "686 Middle Street Pollocksville OG 56432"
re.sub(r"(\d+)\s+(.*)\s+([A-Z]+\s+\d+)", r"\3: \2 | \1", s)
Out: 'OG 56432: Middle Street Pollocksville | 686'
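
Putting the capturing-group approach back into the shape of the function from the question might look like the sketch below. It assumes every entry follows the "number, street, state code, zip" layout; `ADDRESS_RE`, the group names, and the choice to return `None` when nothing matches are illustrative, not something the question specifies:

```python
import re

# Named groups: house number, street/city text, then state code plus zip.
ADDRESS_RE = re.compile(
    r"(?P<number>\d+)\s+(?P<street>.*?)\s+(?P<code>[A-Z]{2}\s\d{5})$"
)

def regex_exp(addresses, zipcode):
    """Return the reformatted entry whose state/zip code equals zipcode, else None."""
    for line in addresses:
        match = ADDRESS_RE.match(line)
        if match and match.group("code") == zipcode:
            # groupdict() maps each group name to its captured text.
            return "{code}: {street} | {number}".format(**match.groupdict())
    return None

addresses = [
    "622 Gordon Lane St. Louisville OH 52071",
    "432 Main Long Road St. Louisville OH 43071",
    "686 Middle Street Pollocksville OG 56432",
]

result = regex_exp(addresses, "OG 56432")
print(result)  # OG 56432: Middle Street Pollocksville | 686
```

Returning the string instead of printing it makes the function easier to test, and comparing the captured code avoids a false hit when the search text appears elsewhere in the line.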
kantal