I posted a "part 1" question that got me to the function I needed here, but I thought this follow-up warranted its own question. If not, I will remove it.
I want to apply a function to a dataframe that replaces a full state name with its abbreviation (New York -> NY). However, I noticed in my dataset that if a state name is written in all caps (e.g. IDAHO), it does not match the dictionary. I tried to work around it, but can't seem to crack the code:
import pandas as pd
import numpy as np
dfp = pd.DataFrame({
    'A': [np.nan, np.nan, 3, 4, 5, 5, 3, 1, 5, np.nan],
    'B': [1, 0, 3, 5, 0, 0, np.nan, 9, 0, 0],
    'C': ['Pharmacy of IDAHO', 'NY Pharma', 'NJ Pharmacy', 'Idaho Rx', 'CA Herbals',
          'Florida Pharma', 'AK RX', 'Ohio Drugs', 'PA Rx', 'USA Pharma'],
    'D': [123456, 123456, 1234567, 12345678, 12345, 12345, 12345678, 123456789, 1234567, np.nan],
    'E': ['Assign', 'Unassign', 'Assign', 'Ugly', 'Appreciate', 'Undo', 'Assign',
          'Unicycle', 'Assign', 'Unicorn'],
})
import us
statez = us.states.mapping('abbr', 'name')
inv_map = {v: k for k, v in statez.items()}
def replace_states(company):
    # find all states that exist in the string
    state_found = filter(lambda state: state.lower() in company.lower(), statez.values())
    # replace each state with its abbreviation
    for state in state_found:
        print(state, inv_map[state])
        company = company.replace(state, inv_map[state])
        print("---", company)
    # return the modified string (or original if no states were found)
    return company
dfp['C'] = dfp['C'].map(replace_states)
Output (notice the lack of change in "Pharmacy of IDAHO"):
Idaho ID
--- Pharmacy of IDAHO
Idaho ID
--- ID Rx
Florida FL
--- FL Pharma
Ohio OH
--- OH Drugs
Is there a way to make this function case-insensitive?
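For reference, the direction I was trying to go in looks roughly like the sketch below: the state is found with a case-insensitive check, but `str.replace` is still case-sensitive, so one possibility is to do the replacement itself with `re.sub` and `re.IGNORECASE`. This is only a minimal sketch under some assumptions: `inv_map` here is a small hand-written stand-in for the inverted `us` mapping (not the real one), `abbr_by_lower` is a helper name introduced just for illustration, and the `\b` word boundaries are an added choice that makes it match whole words only, which differs from the substring check above.

```python
import re

# Hypothetical stand-in for the inverted `us` mapping (full name -> abbreviation).
inv_map = {"Idaho": "ID", "Florida": "FL", "Ohio": "OH", "New York": "NY"}

# Lowercased lookup so any casing of a matched name finds its abbreviation.
abbr_by_lower = {name.lower(): abbr for name, abbr in inv_map.items()}

# One alternation of all state names, longest first so that a longer name
# is tried before any shorter name it might contain.
names = sorted(inv_map, key=len, reverse=True)
pattern = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
    flags=re.IGNORECASE,
)

def replace_states(company):
    # Leave non-string values (e.g. NaN) untouched.
    if not isinstance(company, str):
        return company
    # Swap each matched state name, in any casing, for its abbreviation.
    return pattern.sub(lambda m: abbr_by_lower[m.group(0).lower()], company)
```

It would be applied the same way as before, with `dfp['C'] = dfp['C'].map(replace_states)`.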