I have written code that compares two strings to find matching words. Now I'd like to also find words that are close but not identical. For example, "book" and "brook" are similar, whereas "book" and "luck" are not. How should I go about this?
I was thinking of splitting each word into characters and then counting the frequency of each character. Right now a matched word gets the value 0 and anything else gets 2, but I'd like to expand that part to do what I described above.
for i in range(out.shape[0]):  # iterate over rows (out.shape[0] = rows, out.shape[1] = columns)
    for word in refArray:  # for each word in the reference array
        # out.ix[i, str(word)] = out.index[i].count(str(word))
        if out.index[i].count(str(word)) == 1:
            out.ix[i, str(word)] = 0  # word matched
        else:
            out.ix[i, str(word)] = 2  # no match
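A minimal sketch of the character-frequency idea described above, using only the standard library. The function name `char_similarity` and the scoring formula (twice the number of shared characters divided by the combined length) are assumptions chosen for illustration; they are not part of the code above, and other formulas would work too.

```python
from collections import Counter


def char_similarity(a: str, b: str) -> float:
    """Hypothetical helper: score two words from 0.0 to 1.0 by shared character counts."""
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    count_a, count_b = Counter(a.lower()), Counter(b.lower())
    # Counter & Counter keeps, per character, the smaller of the two counts
    shared = sum((count_a & count_b).values())
    return 2 * shared / (len(a) + len(b))


print(char_similarity("book", "brook"))  # about 0.89
print(char_similarity("book", "luck"))   # 0.25
```

A score like this could replace the fixed 0/2 values, for instance by storing `1 - char_similarity(...)` so that 0 still means an exact match. One caveat: character counts ignore order, so anagrams such as "listen" and "silent" score 1.0. If order matters, `difflib.SequenceMatcher(None, a, b).ratio()` from the standard library may be a better fit.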