I have a similar problem to the links provided in the following references with minor differences but want the same results:
- Apply fuzzy matching across a dataframe column and save results in a new column
- Fuzzy match strings in one column and create new dataframe using fuzzywuzzy
I have on dataframe and want to get the partial ratio and token between 2 columns within the dataframe. Column 1 is just one word per row, but column 2 is a list of words with each row varying in size(I changed it to a tuple to make the functions in the references work).
The main issue I get is that in the compare it goes through column 1 and compares each element to every element in column 2 thus creating a massive dataframe when I just want it 1 to 1. How can I fix this?
df = pd.DataFrame(
{
"id": [1, 2, 3, 4, 5, 6],
"fruits": ["apple", "apples", "orange", "apple tree", "oranges", "mango"],
"choices": [
("app", "apull", "apple"),
("app", "apull", "apple", "appple"),
("orange", "org"),
("apple"),
("oranges", "orang"),
("mango"),
],
}
)
id fruits choices
0 1 apple ('app', 'apull', 'apple')
1 2 apples ('app', 'apull', 'apple', 'appple')
2 3 orange ('orange', 'org')
3 4 apple tree ('apple')
4 5 oranges ('oranges', 'orang')
5 6 mango ('mango')
What compare gives me in the variable explorer:
compare = pd.MultiIndex.from_product([df['fruits'], df['choices']]).to_series()
fruits choices
0 apple ('app', 'apull', 'apple')
1 apple ('app', 'apull', 'apple', 'appple')
2 apple ('orange', 'org')
3 apple ('apple')
4 apple ('oranges', 'orang')
5 apple ('mango')
6 apples ('app', 'apull', 'apple')
7 apples ('app', 'apull', 'apple', 'appple')
8 apples ('orange', 'org')
...
Is it possible to get the desired output like the first output in reference 1 but the multi-indexed elements as the choices?
Expected output like in reference #1, but I want the choices multi-indexed: