I'm working on Twitter hashtags and I've already counted the number of times each one appears in my CSV file. My CSV file looks like this:
GilletsJaunes, 100
Macron, 50
gilletsjaune, 20
tax, 10
Now I would like to merge terms that are close to each other, such as "GilletsJaunes" and "gilletsjaune", using the fuzzywuzzy library. If the similarity score between two terms is greater than 80, their counts should be added together under one of the two terms and the other term should be removed. This would give:
GilletsJaunes, 120
Macron, 50
tax, 10
This is how I use fuzzywuzzy so far:
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
fuzz.ratio("GiletsJaunes", "giletsjaune")
82 #output
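One way the merge described above could look is sketched below. It is only a rough illustration under several assumptions: the counts are already loaded as (tag, count) pairs, the most frequent spelling is the one that is kept, and tags are lowercased before comparison. The names `similarity` and `merge_close_tags` are hypothetical helpers. The default scorer uses `difflib.SequenceMatcher` from the standard library as a stand-in on a 0-100 scale; `fuzz.ratio` can be passed as `scorer` instead.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in scorer on a 0-100 scale (hypothetical helper, not fuzzywuzzy)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def merge_close_tags(counts, threshold=80, scorer=similarity):
    """Fold each tag into the first kept tag that scores above the threshold."""
    merged = {}  # kept tag -> accumulated count
    # Visit the most frequent tags first so they become the kept spelling.
    for tag, count in sorted(counts, key=lambda pair: -pair[1]):
        for kept in merged:
            # Lowercasing is an assumption: the comparison itself is case sensitive.
            if scorer(tag.lower(), kept.lower()) > threshold:
                merged[kept] += count
                break
        else:
            # No kept tag was close enough, so this tag starts its own group.
            merged[tag] = count
    return merged


rows = [("GilletsJaunes", 100), ("Macron", 50), ("gilletsjaune", 20), ("tax", 10)]
result = merge_close_tags(rows)
print(result)  # {'GilletsJaunes': 120, 'Macron': 50, 'tax': 10}
```

To try it with fuzzywuzzy, call `merge_close_tags(rows, scorer=fuzz.ratio)`. Each tag is compared against every kept tag, so the cost grows roughly quadratically with the number of distinct hashtags.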