I need to create a function that takes a DataFrame called new_data (a single column containing text) and compares its words against my reference. My reference, ref_data, has two columns: the first holds the misspelt words (in the same form as they appear in new_data) and the second holds their corrected versions.
Put simply, I need to compare each word of new_data with the first column of ref_data; if it matches, the function should return the word from the second column in the same row.
For example, if a word in new_data matches the word in the 3rd row of ref_data, then the word in column 2 of the 3rd row replaces it. I can provide more clarification if needed. Here is what I have tried:
x = [line for line in ref_data['word']]   # x is a list of all incorrect words
y = [line for line in ref_data['final']]  # y is a list of all correct words

def replace_words(x):  # function
    for line in x:  # iterate over lines in the list
        for word in line.split():  # iterate over words in the line
            if word == x:  # I don't know the syntax for this comparison; the problem is here
                return (word = y)  # I need to return the element of y at the same index
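
One possible way to do the lookup described above is to build a dictionary from the two reference columns, so each incorrect word maps directly to the correction in the same row. The following is only a sketch under assumptions: the sample values and the 'text' column name are made up for illustration, and plain dicts of lists stand in for the DataFrames so the snippet runs without pandas. The column names 'word' and 'final' come from the attempt above.

```python
# Sample data standing in for the DataFrame columns.
# The values and the 'text' column name are hypothetical.
ref_data = {'word': ['teh', 'recieve', 'adress'],
            'final': ['the', 'receive', 'address']}
new_data = {'text': ['teh cat', 'please recieve my adress', 'all good']}

# Map each incorrect word to the correction found in the same row.
corrections = dict(zip(ref_data['word'], ref_data['final']))

def replace_words(line, corrections):
    """Replace each whitespace-separated word that has a known correction."""
    # dict.get returns the word itself when there is no correction for it.
    return ' '.join(corrections.get(word, word) for word in line.split())

fixed = [replace_words(line, corrections) for line in new_data['text']]
print(fixed)  # ['the cat', 'please receive my address', 'all good']
```

With real pandas objects, the same dictionary expression should work on the columns as written, and the function could be applied with something like new_data['text'].apply(lambda line: replace_words(line, corrections)). Note that splitting and re-joining collapses runs of whitespace into single spaces, and matching is exact and case-sensitive.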