I have some addresses that I would like to clean.
You can see that in column address1, we have some entries that are just numbers, where they should be numbers and street names like the first three rows.
import numpy as np
import pandas as pd

df = pd.DataFrame({'address1': ['15 Main Street', '10 High Street', '5 Other Street', np.nan, '15', '12'],
                   'address2': ['New York', 'LA', 'London', 'Tokyo', 'Grove Street', 'Garden Street']})
print(df)
         address1       address2
0  15 Main Street       New York
1  10 High Street             LA
2  5 Other Street         London
3             NaN          Tokyo
4              15   Grove Street
5              12  Garden Street
I'm trying to create a function that checks whether address1 is just a number and, if so, concatenates address1 with the street name from address2, then clears address2 (sets it to NaN).
My expected output is this. We can see that rows 4 and 5 now have complete address1 entries:
           address1  address2
0    15 Main Street  New York
1    10 High Street        LA
2    5 Other Street    London
3               NaN     Tokyo
4   15 Grove Street       NaN  <---
5  12 Garden Street       NaN  <---
What I have tried with the .apply() function:
def f(x):
    try:
        # if address1 is int
        if isinstance(int(x['address1']), int):
            # create new address using address1 + address2
            newaddress = str(x['address1']) +' '+ str(x['address2'])
            # delete address2
            x['address2'] = np.nan
            # return newaddress to address1 column
            return newadress
    except:
        pass
Applying the function:
df['address1'] = df.apply(f,axis=1)
However, the column address1 is now all None.
I've tried a few variations of this function but can't get it to work. Would appreciate advice.
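For reference, two things in the function above would produce None for every row: the return statement refers to newadress (misspelled), so a NameError is raised and silently swallowed by the bare except, and any row that is not a number falls through without returning anything. One possible alternative, as a minimal sketch rather than the definitive fix, is to skip .apply() and use a boolean mask. The name is_number is introduced here for illustration, and the sketch assumes that "is a number" means address1 consists only of digits.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'address1': ['15 Main Street', '10 High Street', '5 Other Street', np.nan, '15', '12'],
    'address2': ['New York', 'LA', 'London', 'Tokyo', 'Grove Street', 'Garden Street'],
})

# True where address1 is made up of digits only; NaN counts as False (na=False).
# Assumption: a "number" here means one or more digits and nothing else.
is_number = df['address1'].str.fullmatch(r'\d+', na=False)

# For those rows, append the street name from address2 to address1 ...
df.loc[is_number, 'address1'] = (
    df.loc[is_number, 'address1'] + ' ' + df.loc[is_number, 'address2']
)
# ... and then blank out address2.
df.loc[is_number, 'address2'] = np.nan

print(df)
```

Because the assignments go through df.loc on the original frame, both columns are updated in place, which the row-wise function could not do by assigning to x['address2'] inside .apply().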