3

How do I remove the +4 from zipcodes, in python?

I've got data like

85001
52804-3233
Winston-Salem

And I want that to become

85001
52804
Winston-Salem
ram1

6 Answers

3
>>> zip = '52804-3233'
>>> zip[:5]
'52804'

...and of course, when you parse the lines of your original data, you should add a rule that distinguishes the ZIP codes that need fixing from other strings. I don't know what your data looks like, so I can't help much there (maybe check whether they contain only digits and the '-' symbol?).

mac
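Following the suggestion above to check for digits and the '-' before slicing, here is a minimal sketch. The function name `strip_plus4` is hypothetical, and it assumes a ZIP+4 value is exactly five digits, a dash, and four digits; anything else is returned unchanged.

```python
def strip_plus4(value: str) -> str:
    """Return the 5-digit ZIP for a ZIP+4 string; leave anything else as-is.

    Hypothetical helper: assumes ZIP+4 means exactly 'DDDDD-DDDD'.
    """
    if (
        len(value) == 10
        and value[:5].isdigit()
        and value[5] == "-"
        and value[6:].isdigit()
    ):
        return value[:5]
    return value


data = ["85001", "52804-3233", "Winston-Salem"]
print([strip_plus4(v) for v in data])
# ['85001', '52804', 'Winston-Salem']
```

Checking the full shape first means strings like "Winston-Salem" never reach the slice.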
3
>>> import re
>>> s = "52804-3233"
>>> # regex: keep the 5 digits, drop the dash and the 4 digits after them
>>> re.sub(r'(\d{5})-\d{4}', r'\1', s)
'52804'

The \1 is a so-called back reference: it is replaced by the text matched by the first group, which here is the 5-digit ZIP code. The r prefix makes both arguments raw strings, so Python passes the backslashes to the regex engine untouched.

miku
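To make the back reference concrete: \1 in the replacement is the same text that `match.group(1)` returns for the first parenthesized group. A short sketch; the constant name `ZIP_PLUS4` is illustrative, and the named-group variant (`(?P<zip>...)` with `\g<zip>`) is standard `re` syntax shown only as an equivalent alternative.

```python
import re

ZIP_PLUS4 = re.compile(r"(\d{5})-\d{4}")

# The first parenthesized group captures the 5-digit ZIP...
m = ZIP_PLUS4.fullmatch("52804-3233")
print(m.group(1))  # 52804

# ...and r"\1" in the replacement refers to that same group.
print(ZIP_PLUS4.sub(r"\1", "52804-3233"))  # 52804

# A named group is an equivalent, more self-documenting spelling.
print(re.sub(r"(?P<zip>\d{5})-\d{4}", r"\g<zip>", "52804-3233"))  # 52804
```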
2

You could try something like this:

outputs = []
for input in inputs:
    if input[:5].isnumeric():
        # Takes the first 5 characters from the string
        input = input[:5]
    outputs.append(input)

This keeps just the first 5 characters of any string whose first 5 characters are all digits, and leaves every other string unchanged.

mac
Civilian
  • I think you mean `input[0:4]`? – ram1 Jun 27 '11 at 18:44
  • No, in Python slices, you indicate the first character that you want, followed by the first character you *don't* want. `[0:5]` is the first five elements of a list or string. – kindall Jun 27 '11 at 18:46
  • Also, crap! I realized that the function we wanted is isdigit() instead of isnumeric(). In Python 2, isnumeric() works only on unicode objects, not on str. http://www.tutorialspoint.com/python/string_isdigit.htm – Civilian Jul 14 '11 at 21:23
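The two points from this comment thread can be checked directly: a slice stops before its end index, and isnumeric() accepts more characters than isdigit(). The sketch also updates the list in place with enumerate(), since rebinding the loop variable on its own does not change the list. The sample values are the question's data; nothing else is assumed.

```python
inputs = ["85001", "52804-3233", "Winston-Salem"]

# Slices stop *before* the end index: [0:5] is 5 characters, [0:4] only 4.
print(inputs[1][0:5], inputs[1][0:4])  # 52804 5280

# isnumeric() also accepts characters such as '½', which isdigit() rejects.
print("½".isnumeric(), "½".isdigit())  # True False

# Update the list in place; rebinding the loop variable alone would not.
for i, value in enumerate(inputs):
    if value[:5].isdigit():
        inputs[i] = value[:5]

print(inputs)  # ['85001', '52804', 'Winston-Salem']
```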
2
re.sub(r'-\d{4}$', '', zipcode)
mhyfritz
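Applied to the question's sample data, the end-anchored pattern removes a trailing dash plus four digits and leaves other strings alone. A minimal sketch; compiling the pattern once (the name `TRAILING_PLUS4` is illustrative) is just a convenience when processing many values. Because only the end is anchored, any value ending in "-NNNN" gets trimmed, which the last line illustrates with a made-up string.

```python
import re

# Compile once if you are processing many values.
TRAILING_PLUS4 = re.compile(r"-\d{4}$")

data = ["85001", "52804-3233", "Winston-Salem"]
print([TRAILING_PLUS4.sub("", z) for z in data])
# ['85001', '52804', 'Winston-Salem']

# Caveat: only the end is anchored, so any value ending in "-NNNN" is trimmed.
print(TRAILING_PLUS4.sub("", "PO Box-1234"))  # PO Box
```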
1

This replaces every item of the form 00000-0000 that has a space or another word boundary before and after the number with its first five digits. The other regexes posted will also match some number formats that you might not want.

re.sub(r'\b(\d{5})-\d{4}\b', r'\1', zipcode)
Jacob Eggers
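A sketch comparing the word-boundary pattern with the unanchored one on a value that is not a valid ZIP+4 (six digits before the dash; the sample string is made up for illustration). It also shows why the pattern must be a raw string: in an ordinary string literal, "\b" is a backspace character rather than the regex word boundary.

```python
import re

value = "123456-7890"  # not a valid ZIP+4: six digits before the dash

# Unanchored pattern: matches "23456-7890" inside the longer number.
print(re.sub(r"(\d{5})-\d{4}", r"\1", value))      # 123456
# Word boundaries require the 5 digits to start at a boundary, so no match.
print(re.sub(r"\b(\d{5})-\d{4}\b", r"\1", value))  # 123456-7890

# Without the r prefix, "\b" is a backspace character, not a word boundary.
print("\b" == "\x08", r"\b" == "\\b")              # True True
```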
1

Or without regex:

output = [line[:5] if line[:5].isnumeric() and line[6:].isnumeric() else line for line in text if line]
Artsiom Rudzenka
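One thing to watch with the non-regex version: if `text` is an open file, each line keeps its trailing newline, so `line[6:]` is something like '3233\n' and isnumeric() returns False, leaving the ZIP untouched. A minimal sketch that strips each line first, with `io.StringIO` standing in for a real file (an assumption for the sake of a self-contained example; the content is the question's data).

```python
import io

# A trailing newline defeats the isnumeric() check on the last 4 digits.
print("52804-3233\n"[6:].isnumeric())  # False

# io.StringIO stands in for an open file here; real files behave the same way.
text = io.StringIO("85001\n52804-3233\nWinston-Salem\n")

output = []
for raw in text:
    line = raw.strip()  # drop the trailing "\n" (and surrounding spaces)
    if not line:
        continue  # skip blank lines, like the `if line` filter in the answer
    if line[:5].isnumeric() and line[6:].isnumeric():
        line = line[:5]
    output.append(line)

print(output)  # ['85001', '52804', 'Winston-Salem']
```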