I have an address column in a dataframe and need to extract the state from each address. Since the addresses do not follow a consistent format, I have created a list containing all states. If an address contains any string from the list, that string should be retrieved.

import pandas as pd

state_list = ['Punjab', 'Kerala', 'Orissa']

location_list = ['adr1, Orissa', 'adr2, Punjab', 'ad3, ppp', 'adr4: Kerala']
df = pd.DataFrame(location_list, columns=['location'])

Expected output:

location        state
adr1, Orissa    Orissa
adr2, Punjab    Punjab
ad3, ppp        NaN
adr4: Kerala    Kerala

Code tried:

any(t in x for x in location_list for t in state_list)  # returns a single True/False, not a per-row match
df[df['location'].str.contains('Punjab')]  # this works for a single state
Michael Delgado
Sangeetha R
  • Use this single line (it needs `import re`): `df["state"] = df["location"].str.extract(r"({})".format("|".join(state_list)), flags=re.IGNORECASE, expand=False)`, then run `print(df)` to check the result. – Faisal Nazik Oct 31 '22 at 09:29
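
The one-liner in the comment above can be expanded into a runnable sketch. It uses the question's own `state_list` and `location_list`. The `re.escape` call, the `\b` word boundaries, and the `canonical` lookup that restores the spelling from `state_list` are assumptions added here for illustration; they are not part of the original comment.

```python
import re

import pandas as pd

state_list = ['Punjab', 'Kerala', 'Orissa']
location_list = ['adr1, Orissa', 'adr2, Punjab', 'ad3, ppp', 'adr4: Kerala']
df = pd.DataFrame(location_list, columns=['location'])

# Build a single alternation pattern with one capture group.
# re.escape and the \b word boundaries are assumptions added for safety:
# they guard against regex metacharacters and partial-word matches.
pattern = r"\b(" + "|".join(re.escape(s) for s in state_list) + r")\b"

# With one capture group and expand=False, str.extract returns a Series;
# rows where nothing matches are filled with NaN.
df['state'] = df['location'].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Assumption: because matching is case-insensitive, map whatever casing was
# found back to the spelling used in state_list (hypothetical helper dict).
canonical = {s.lower(): s for s in state_list}
df['state'] = df['state'].str.lower().map(canonical)

print(df)
```

Rows whose address contains none of the listed states (such as `ad3, ppp`) end up with NaN in the `state` column. If an address could mention more than one state, `str.extract` keeps only the first match; `str.findall` would be the call to look at in that case.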

0 Answers