Let's say I have a Pandas dataframe in Python that looks something like this:
import pandas as pd
df_test = pd.DataFrame(data=None, columns=['file', 'number'])
df_test.file = ['washington_142', 'washington_287', 'chicago_453', 'chicago_221', 'chicago_345', 'seattle_976', 'seattle_977', 'boston_367', 'boston 098']
df_test.number = [20, 21, 33, 34, 33, 45, 45, 52, 52]
I want to find the strings in 'file' that start with the same letters (say, at least the first 50% of the string) but do not all have the same corresponding value in the 'number' column. In this example, that means creating a new dataframe that contains:
'washington_142', 'washington_287', 'chicago_453', 'chicago_221', 'chicago_345'
But none of the others, since those have the same 'number' whenever the names start with the same string. I know there is a function, 'difflib.get_close_matches', but I am not sure how to use it while also checking the other column in the dataframe. Any advice or help is really appreciated!
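To make the goal concrete, here is a rough sketch of the logic I have in mind. It assumes the shared prefix is simply the run of letters at the start of each name, which happens to hold for this sample but is my own simplification and not a real fuzzy match like the 50% rule above. The names prefix, n_distinct and df_mismatch are just placeholders.

```python
import pandas as pd

df_test = pd.DataFrame({
    'file': ['washington_142', 'washington_287', 'chicago_453', 'chicago_221',
             'chicago_345', 'seattle_976', 'seattle_977', 'boston_367', 'boston 098'],
    'number': [20, 21, 33, 34, 33, 45, 45, 52, 52],
})

# Assumption: the common prefix is the leading run of letters in each name.
prefix = df_test['file'].str.extract(r'^([A-Za-z]+)', expand=False)

# For every row, count the distinct 'number' values within its prefix group.
n_distinct = df_test.groupby(prefix)['number'].transform('nunique')

# Keep only rows whose prefix group contains more than one distinct number.
df_mismatch = df_test[n_distinct > 1]
print(df_mismatch)
```

This keeps the washington and chicago rows for the sample data, but it would break as soon as the names do not share a clean letters-only prefix, which is why I was looking at difflib in the first place.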