python - get zipcode from full address

Question

I have a dataframe with full addresses in a column, and I need to create a separate column with just the zip code. Some of the addresses just have the five digit zip code whereas others have the additional four digits.

How do I split the column to just get the zip code?

Example Data

import pandas as pd

d = {'name':['bob','john'],'address':['123 6th Street,Sterling VA 20165-7513','567 7th Street, Wilmington NC 28411']}
df = pd.DataFrame(d)

I tried using rpartition but I get everything before the zip code:

df['test'] = df['address'].str.rpartition(" ")
print(df)
name    address                                test
bob     123 6th Street,Sterling VA 20165-7513  123 6th Street,Sterling VA
john    567 7th Street, Wilmington NC 28411    567 7th Street, Wilmington NC

This is what I'm trying to get:

name    address                                zipcode
bob     123 6th Street,Sterling VA 20165-7513  20165-7513
john    567 7th Street, Wilmington NC 28411    28411

Can it be safely assumed that the zipcode is at the end? – fizzybear Jul 05 '19 at 20:38
@fizzybear Yes, the zip code is always at the end. – Dread Jul 05 '19 at 20:44

score 7 · Accepted Answer · answered Jul 05 '19 at 20:50

Use a regex with str.extract():

df['zip'] = df['address'].str.extract(r'(\d{5}\-?\d{0,4})')

returns:

   name                                address         zip
0   bob  123 6th Street,Sterling VA 20165-7513  20165-7513
1  john    567 7th Street, Wilmington NC 28411       28411

See the pandas documentation for str.extract() and the Python documentation for the re module.

In particular, the {5} specifies that we must match 5 repetitions of \d (a numerical digit), while {0,4} indicates that we can match from 0 to 4 repetitions.
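One caveat: the pattern above returns the first run of five digits it finds, so a five-digit street number would be picked up instead of the zip code. Since the asker confirmed that the zip code is always at the end, a minimal sketch of an end-anchored variant follows. The third address and the variable names `unanchored` and `anchored` are made up for illustration and are not part of the original question.

```python
import pandas as pd

# The first two rows come from the question; the third row is a
# hypothetical address with a five-digit street number.
df = pd.DataFrame({
    "name": ["bob", "john", "ann"],
    "address": [
        "123 6th Street,Sterling VA 20165-7513",
        "567 7th Street, Wilmington NC 28411",
        "10432 Main Street, Dayton OH 45402",
    ],
})

# Pattern from the answer: takes the first five-digit run in the string.
unanchored = df["address"].str.extract(r"(\d{5}\-?\d{0,4})", expand=False)

# Anchored variant: five digits, an optional "-dddd" suffix, then optional
# trailing whitespace and the end of the string.
anchored = df["address"].str.extract(r"(\d{5}(?:-\d{4})?)\s*$", expand=False)

print(unanchored.tolist())  # ['20165-7513', '28411', '10432']
print(anchored.tolist())    # ['20165-7513', '28411', '45402']
```

Passing expand=False returns a Series rather than a one-column DataFrame, which makes the result easy to assign to a new column such as df['zip'].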

score 1 · Answer 2 · answered Jul 05 '19 at 20:51

You can try this:

df['zip']= [i[-1] for i in df.address.str.split(' ').values]

Fouad Selmane

score 0 · Answer 3 · answered Jul 05 '19 at 20:53

Split each address on whitespace and take the last item; that is the zip code.

Something like this:

zipcodes = list()

for item in d['address']:
    zipcode = item.split()[-1]
    zipcodes.append(zipcode)

d['zipcodes'] = zipcodes
df = pd.DataFrame(d)
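The same split-and-take-last idea can also be written without an explicit loop by using the pandas string accessor. This is only a sketch of an alternative, assuming (as the asker confirmed) that the zip code is the last whitespace-separated token; the column name `zipcode` is taken from the desired output in the question.

```python
import pandas as pd

d = {
    "name": ["bob", "john"],
    "address": [
        "123 6th Street,Sterling VA 20165-7513",
        "567 7th Street, Wilmington NC 28411",
    ],
}
df = pd.DataFrame(d)

# .str.split() splits each address on whitespace into a list,
# and .str[-1] picks the last element of each list.
df["zipcode"] = df["address"].str.split().str[-1]

print(df["zipcode"].tolist())  # ['20165-7513', '28411']
```

Unlike the regex approach, this does not check that the last token looks like a zip code, so malformed addresses would pass through unchanged.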

João Victor Monte