I know this is an old question; in fact, I have seen many links related to it:
Using fuzzywuzzy to create a column of matched results in the data frame
How to compare a value in one dataframe to a column in another using fuzzywuzzy ratio
But I didn't get a proper solution from any of them.
Below is my code:
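import pandas as pd
from fuzzywuzzy import fuzz

g = [{'column1': 'ryzen 5 5600'}, {'column1': 'ram 8 gb ddr4 3.2ghz'},
     {'column2': 'SSD 220gb'}, {'column3': 'windows 10 prof'},
     {'column2': 'ryzen 5 3600'}, {'column1': 'ram 16 gb ddr4'}]

df1 = pd.read_excel('product1.xlsx', header=None, index_col=False)

# Flatten every cell of product1 into one comma-separated string
s = []
for row in df1.values:
    s.append(', '.join(row))
s = ', '.join(s)

MIN_MATCH_SCORE = 30
# Keep each dict whose value scores at least MIN_MATCH_SCORE against the combined string
guessed_word = [d for d in g if fuzz.token_set_ratio(s, list(d.values())[0]) >= MIN_MATCH_SCORE]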
product1 contains:
0 GB ddr4
1 HDD 256GB
2 SSD
3 ryzen 5
4 Win 10 Pro
guessed_word contains:
#gives good output
[{'column1': 'ryzen 5 5600'},
{'column1': 'ram 8 gb ddr4 3.2ghz'},
{'column2': 'SSD 220gb'},
{'column3': 'windows 10 prof'},
{'column2': 'ryzen 5 3600'},
{'column1': 'ram 16 gb ddr4'}]
After loading it into a DataFrame:
df3 = pd.DataFrame(guessed_word)
df3 contains:
column1               column2       column3
ryzen 5 5600          SSD 220gb     windows 10 prof
ram 8 gb ddr4 3.2ghz  ryzen 5 3600
ram 16 gb ddr4
But I want the following output:
   product1    column1                               column2       column3
0  GB ddr4     ram 8 gb ddr4 3.2ghz, ram 16 gb ddr4  NaN           NaN
1  HDD 256GB   NaN                                   NaN           NaN
2  SSD         NaN                                   SSD 220gb     NaN
3  ryzen 5     ryzen 5 5600                          ryzen 5 3600  NaN
4  Win 10 Pro  NaN                                   NaN           windows 10 prof
Is it possible to get this with df.sort_values or anything else? I tried, and none of it worked.
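To make the target shape concrete, here is a rough sketch of the per-row grouping I have in mind: each product1 value is compared against every candidate separately (instead of against one joined string), and matches are collected per column. It uses only the standard library so it runs on its own. The names `token_overlap_score` and `match_rows` are made up for illustration, and the scorer is only a stand-in that counts shared tokens; it is not fuzzywuzzy's algorithm. `fuzz.token_set_ratio` could be passed as `scorer` instead, but the threshold would probably need tuning.

```python
from collections import defaultdict

g = [{'column1': 'ryzen 5 5600'}, {'column1': 'ram 8 gb ddr4 3.2ghz'},
     {'column2': 'SSD 220gb'}, {'column3': 'windows 10 prof'},
     {'column2': 'ryzen 5 3600'}, {'column1': 'ram 16 gb ddr4'}]

# Values of product1, typed in by hand here instead of read from the Excel file
products = ['GB ddr4', 'HDD 256GB', 'SSD', 'ryzen 5', 'Win 10 Pro']

MIN_MATCH_SCORE = 30


def token_overlap_score(query, candidate):
    """Hypothetical stand-in scorer: percent of query tokens found in candidate."""
    q = set(query.lower().split())
    c = set(candidate.lower().split())
    return 100 * len(q & c) / len(q) if q else 0


def match_rows(products, candidates, scorer=token_overlap_score,
               min_score=MIN_MATCH_SCORE):
    """Build one dict per product, grouping matching candidates by column name."""
    rows = []
    for product in products:
        matches = defaultdict(list)
        for d in candidates:
            for column, value in d.items():
                # Score against this single product, not the whole sheet
                if scorer(product, value) >= min_score:
                    matches[column].append(value)
        row = {'product1': product}
        # Several matches in the same column are joined with ", "
        row.update({col: ', '.join(vals) for col, vals in matches.items()})
        rows.append(row)
    return rows


rows = match_rows(products, g)
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame` should then give one row per product, with NaN wherever a column key is missing from a row.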