I currently have a CSV file. The data was originally extracted from a PDF, and I am doing further analysis on it. There is a [States] column in which some of the values are misspelled, i.e. they contain wrong letters or digits.
Instead of the digits or misspellings in those rows, I need the correct names. The idea is to take the first 3 or last 3 characters of each value and match them against a dictionary of correct names: first try the first 3 characters, and if that does not match, try the last 3 characters. The matched correct value should then be saved back into the column.
For example, given this input:
state_name
Assan
Andhra Prade5h
M1zoram
Uttar Pr8desh
Expected Output:
state_name
Assam
Andhra Pradesh
Mizoram
Uttar Pradesh
What I have tried so far:
dict = {'Assam','Andhra Pradesh', 'Mizoram', 'Uttar Pradesh'}
## Didn't work
df['state_name'] = df['state_name'].map(dict).fillna(df['state_name'])
## Then tried extracting the pieces below, but could not work out the next step
state1 = df['state_name'].str[-3:]  # last 3 characters (str[3:] would drop the first 3 instead)
state2 = df['state_name'].str[:3]   # first 3 characters
So I have no clue how to handle this problem.
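A minimal sketch of the rule described above (match on the first 3 characters, then fall back to the last 3), in case it helps frame an answer. The names `VALID_STATES`, `by_prefix`, `by_suffix` and `fix_state` are hypothetical, the reference list contains only the four names from the example, and the comparison is assumed to be case-insensitive:

```python
import pandas as pd

# Reference list of correctly spelled names (only the four from the example).
VALID_STATES = ['Assam', 'Andhra Pradesh', 'Mizoram', 'Uttar Pradesh']

# Lookup tables keyed by the lowercase first / last three characters.
# If two names share a key, the later name in the list wins.
by_prefix = {name[:3].lower(): name for name in VALID_STATES}
by_suffix = {name[-3:].lower(): name for name in VALID_STATES}

def fix_state(value):
    """Match on the first 3 characters, then on the last 3, else keep the value."""
    text = str(value).strip().lower()
    return by_prefix.get(text[:3]) or by_suffix.get(text[-3:]) or value

df = pd.DataFrame(
    {'state_name': ['Assan', 'Andhra Prade5h', 'M1zoram', 'Uttar Pr8desh']}
)
df['state_name'] = df['state_name'].map(fix_state)
print(df['state_name'].tolist())
# ['Assam', 'Andhra Pradesh', 'Mizoram', 'Uttar Pradesh']
```

One caveat with this rule: three-character keys can collide. Here both "Andhra Pradesh" and "Uttar Pradesh" end in "esh", so the suffix lookup alone cannot tell them apart, and a value whose first 3 characters are also corrupted could be mapped to the wrong name. With the full list of states, fuzzy matching on the whole string (for instance the standard library's `difflib.get_close_matches`) may be more robust.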