I know similar questions have been asked, but none of the solutions I've found have worked (I've listed some of them at the bottom).
I have a column of zip codes. Some of them include the +4 suffix, which I don't need. Others are either too short (typos) or foreign. Currently every column has dtype object.
Here's an example:
    member  state  country        zip         joined
    16081   NY     UNITED STATES  11215       9/4/09
    21186   NY     UNITED STATES  5325        8/9/11
    34999   NY     UNITED STATES  11218       11/4/16
    34999   NY     FOOBAR STATES  NaN         11/4/16
    5033    NY     UNITED STATES  11238-1630  11/7/16
    35079   NY     FOOBAR STATES  SW4 9JX     11/13/16
    35084   NY     UNITED STATES  11217-2181  11/14/16
and I'd like to end up with:
    member  state  country        zip         joined
    16081   NY     UNITED STATES  11215       9/4/09
    21186   NY     UNITED STATES  5325        8/9/11
    34999   NY     UNITED STATES  11218       11/4/16
    34999   NY     FOOBAR STATES  NA          11/4/16
    5033    NY     UNITED STATES  11238       11/7/16
    35079   NY     FOOBAR STATES  SW4 9JX     11/13/16
    35084   NY     UNITED STATES  11217       11/14/16
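For reference, here is a minimal sketch of the target transformation. It assumes pandas, a zip column that holds strings with NaN for the missing value, and the frame rebuilt from the example table above. The vectorized .str methods pass NaN through instead of calling len() on it. The missing entry stays NaN (pandas' missing-value marker) rather than becoming the literal string "NA", and short or foreign codes are left alone, as in the desired output.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; zip values are strings and the missing one is NaN.
df = pd.DataFrame({
    "member": [16081, 21186, 34999, 34999, 5033, 35079, 35084],
    "state": ["NY"] * 7,
    "country": ["UNITED STATES", "UNITED STATES", "UNITED STATES",
                "FOOBAR STATES", "UNITED STATES", "FOOBAR STATES", "UNITED STATES"],
    "zip": ["11215", "5325", "11218", np.nan, "11238-1630", "SW4 9JX", "11217-2181"],
    "joined": ["9/4/09", "8/9/11", "11/4/16", "11/4/16", "11/7/16", "11/13/16", "11/14/16"],
})

# Split each string on "-" and keep the part before it; NaN rows stay NaN.
df["zip"] = df["zip"].str.split("-").str[0]
print(df)
```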
Here are a few things I've tried:
    for x in df.zip:
        if len(x) > 5:
            print x.split("-")[0]
            x[:x.index("-")]

This raises TypeError: object of type 'float' has no len()
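The TypeError is consistent with the missing cell: in a pandas object column a missing value is float('nan'), and len() is not defined for floats. The loop above also only prints, so the column itself never changes. A minimal Python 3 sketch, using a plain list as a stand-in for the column values, that skips non-strings and collects the results so they can be assigned back:

```python
values = ["11215", float("nan"), "11238-1630", "SW4 9JX"]

# The missing entry is a float, which is why len(x) failed.
print(type(values[1]).__name__)

# Build a new list instead of printing; only strings containing "-" are trimmed.
cleaned = []
for x in values:
    if isinstance(x, str) and "-" in x:
        x = x.split("-")[0]
    cleaned.append(x)
print(cleaned)
```

With the real data, the list would be built from df['zip'] and written back with df['zip'] = cleaned.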
    df['zips'] = df['zip'].map(lambda x: x.rstrip('-'/n))

This raises NameError: global name 'n' is not defined
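Python parses '-'/n as the string '-' divided by a variable named n, which was never defined, hence the NameError. Even without that, rstrip('-') would not help: it trims only trailing characters from the given set, so '11238-1630' has nothing to trim. A small sketch with a hypothetical helper, before_dash, that uses str.partition to keep the text before the first "-":

```python
def before_dash(s):
    # partition returns (head, "-", tail); with no "-", head is the whole string.
    return s.partition("-")[0]

# rstrip removes only trailing "-" characters, so the +4 part survives.
print("11238-1630".rstrip("-"))
print(before_dash("11238-1630"))
print(before_dash("11215"))
```

The pandas counterpart is df['zip'].str.partition('-')[0], which works column-wide and skips NaN.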
    def zipclip(x):
        if x.isnumeric:
            if len(x) > 5:
                return z[:5]
            elif len(x) < 5:
                return "NA"

This raises AttributeError: 'str' object has no attribute 'isnumeric'
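In Python 2, isnumeric() exists only on unicode objects, not on plain str, which matches this AttributeError; isdigit() is available on both. Two other details in zipclip: x.isnumeric without parentheses is a bound method and always truthy, and z[:5] refers to a name that isn't defined. The function also returns None for any five-character code. A hedged Python 3 sketch of what zipclip appears to be aiming for, keeping short and foreign codes unchanged as in the desired table:

```python
def zipclip(x):
    # Missing cells are float NaN rather than str, so return them untouched.
    if not isinstance(x, str):
        return x
    head = x.split("-", 1)[0]
    # isdigit() must be called; checking the bare method would always be true.
    if head.isdigit() and len(head) == 5:
        return head
    # Short typos and foreign codes fall through unchanged.
    return x

print([zipclip(z) for z in ["11238-1630", "11215", "5325", "SW4 9JX"]])
```

Applied to the frame, this would be df['zip'] = df['zip'].map(zipclip).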
    df.zip = [line[:5] if line[:5].isnumeric() and line[6:].isnumeric() else
              line for line in zip if line]

This raises TypeError: 'builtin_function_or_method' object is not iterable
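Inside that comprehension, zip is Python's built-in zip function rather than the column, which is what the "not iterable" error refers to. The trailing "if line" filter could also drop entries and leave the list shorter than the column. A sketch of the same idea iterating over the values themselves (a plain list stands in for df['zip'] here), with isdigit() so it also works on Python 2 str:

```python
zips = ["11215", "5325", float("nan"), "11238-1630", "SW4 9JX", "11217-2181"]

# One output per input, so the result can be assigned back to the column.
# Non-strings (NaN) and anything not shaped like 12345-6789 pass through unchanged.
cleaned = [
    v[:5] if isinstance(v, str) and v[:5].isdigit() and v[6:].isdigit() else v
    for v in zips
]
print(cleaned)
```

With the DataFrame, this becomes df['zip'] = [... for v in df['zip']].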
Here are some of the places I've looked:
- Pandas DataFrame: remove unwanted parts from strings in a column
- Remove -#### in zipcodes
- Pandas delete parts of string after specified character inside a dataframe
(Sorry if I've gone overboard with documentation; I've been criticized in the past and wanted to make sure folks knew I've been working on it.)