Questions tagged [fuzzywuzzy]

FuzzyWuzzy is a Python package to perform fuzzy string matching.

522 questions
81
votes
2 answers

When to use which fuzz function to compare 2 strings

I am learning fuzzywuzzy in Python. I understand the concept of fuzz.ratio, fuzz.partial_ratio, fuzz.token_sort_ratio and fuzz.token_set_ratio. My question is when to use which function? Do I check the 2 strings' length first, say if not similar,…
Pot
  • 823
  • 1
  • 8
  • 8
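
As a rough illustration of how two of the functions named in the question above differ, the sketch below approximates `fuzz.ratio` and `fuzz.token_sort_ratio` with the standard library's `difflib.SequenceMatcher`. The helper names `simple_ratio` and `token_sort_ratio_sketch` are hypothetical, and the real library applies its own preprocessing, so scores may not match FuzzyWuzzy's exactly.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Whole-string similarity on a 0-100 scale (hypothetical helper)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_sort_ratio_sketch(a: str, b: str) -> int:
    """Sort whitespace-separated tokens before comparing, so word order is ignored."""
    def normalize(s: str) -> str:
        return " ".join(sorted(s.lower().split()))

    return simple_ratio(normalize(a), normalize(b))


if __name__ == "__main__":
    # Same words in a different order: the plain ratio is penalized,
    # while the token-sorted variant treats the strings as identical.
    print(simple_ratio("new york mets", "mets new york"))
    print(token_sort_ratio_sketch("new york mets", "mets new york"))
```

Under these assumptions, a plain ratio suits strings expected to match character for character, while a token-sorted comparison suits strings whose words may be reordered.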
54
votes
1 answer

What does "the following packages will be superseded by a higher priority channel" mean?

I am trying to install fuzzywuzzy onto my Anaconda distribution on 64-bit Linux. When I do this, it tries to switch my conda and conda-env packages to the conda-forge channel, as follows. I search Anaconda for fuzzywuzzy by writing: anaconda search -t…
Chuck
  • 3,664
  • 7
  • 42
  • 76
29
votes
3 answers

how to parallelize many (fuzzy) string comparisons using apply in Pandas?

I have the following problem. I have a dataframe master that contains sentences, such as: master Out[8]: original 0 this is a nice sentence 1 this is another one 2 stackoverflow is nice For every row in master, I look up…
ℕʘʘḆḽḘ
  • 18,566
  • 34
  • 128
  • 235
18
votes
4 answers

Getting error while using fuzzywuzzy: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning

I am getting the error below. Is there any way to fix it without installing python-Levenshtein, and if not, how do I install python-Levenshtein on Linux? UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this…
LOrD_ARaGOrN
  • 3,884
  • 3
  • 27
  • 49
18
votes
3 answers

Fuzzy string matching in Python

I have 2 lists of over a million names with slightly different naming conventions. The goal here is to match those records that are similar, with the logic of 95% confidence. I am aware there are libraries that I can leverage, such as the…
BernardL
  • 5,162
  • 7
  • 28
  • 47
17
votes
4 answers

Vectorizing or Speeding up Fuzzywuzzy String Matching on PANDAS Column

I am trying to look for potential matches in a PANDAS column full of organization names. I am currently using iterrows() but it is extremely slow on a dataframe with ~70,000 rows. After having looked through StackOverflow I have tried implementing a…
Gregory Saxton
  • 1,241
  • 4
  • 13
  • 29
14
votes
1 answer

fuzzy matching in R

I am trying to detect matches between an open text field (read: messy!) with a vector of names. I created a silly fruit example that highlights my main challenges. df1 <- data.frame(id = c(1, 2, 3, 4, 5, 6), entry = c("Apple", …
Eric Green
  • 7,385
  • 11
  • 56
  • 102
14
votes
3 answers

python fuzzywuzzy's process.extract(): how does it work?

I am trying to understand how the Python module fuzzywuzzy's function process.extract() works. I mainly read about the fuzzywuzzy package here: http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/, which is a great post…
alwaysaskingquestions
  • 1,595
  • 5
  • 22
  • 49
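
A minimal sketch of the general idea behind an extract-style lookup, relating to the question above: score every candidate against the query, sort by score, and keep the top few. The function `extract_sketch` is hypothetical and uses `difflib.SequenceMatcher` as a stand-in scorer; FuzzyWuzzy's own `process.extract()` uses its own default scorer and preprocessing, so its scores will differ.

```python
from difflib import SequenceMatcher


def extract_sketch(query, choices, limit=5):
    """Return up to `limit` (choice, score) pairs, best score first (hypothetical helper)."""
    scored = []
    for choice in choices:
        # Compare case-insensitively; the score is on a 0-100 scale.
        similarity = SequenceMatcher(None, query.lower(), choice.lower()).ratio()
        scored.append((choice, int(round(100 * similarity))))
    # Python's sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    teams = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    for name, score in extract_sketch("new york jets", teams, limit=2):
        print(name, score)
```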
12
votes
1 answer

Apply fuzzy matching across a dataframe column and save results in a new column

I have two data frames, each with a different number of rows. Below are a couple of rows from each data set: df1 = Company City State ZIP FREDDIE LEES AMERICAN GOURMET SAUCE St. Louis MO…
Jstuff
  • 1,266
  • 2
  • 16
  • 27
11
votes
3 answers

Python Fuzzy Matching (FuzzyWuzzy) - Keep only Best Match

I'm trying to fuzzy match two csv files, each containing one column of names, that are similar but not the same. My code so far is as follows: import pandas as pd from pandas import DataFrame from fuzzywuzzy import process import csv save_file =…
Kvothe
  • 1,341
  • 7
  • 20
  • 33
10
votes
3 answers

FuzzyWuzzy error: WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '/']

I am trying to write code that will compare multiple files and return the highest fuzz ratio among multiple options. The problem is I'm getting an error message: WARNING:root:Applied processor reduces input query to empty string, all comparisons will have…
Hofbr
  • 868
  • 9
  • 31
9
votes
1 answer

Compare each row with all rows in data frame and save results in list for each row

I am trying to compare each row with all rows in a pandas dataframe using fuzzywuzzy.fuzz.partial_ratio() >= 85 and write the results to a list for each row. Example: df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6], 'name': ['dog', 'cat', 'mad cat', 'good…
pirr
  • 445
  • 1
  • 8
  • 15
8
votes
0 answers

How to speed up Fuzzy Matching using Fuzzywuzzy in Python

I am using Fuzzywuzzy in Python to match people's names in 2 lists. However, the runtime is too long, as one list contains 25000 names and the other contains 39000 names. It has been running for 20 hrs now. Previously, I used the same code to match 2…
MMAASS
  • 433
  • 4
  • 18
8
votes
2 answers

Better Approach than FuzzyWuzzy?

I'm getting a result in fuzzywuzzy that isn't working as well as hoped. If there is an extra word in the middle, the score is lower because of the Levenshtein distance. Example: from fuzzywuzzy import fuzz score = fuzz.ratio('DANIEL CARTWRIGHT',…
Caitlin G
  • 105
  • 1
  • 6
8
votes
5 answers

no module named fuzzywuzzy

I installed fuzzywuzzy with pip for python3. When I do pip list I see fuzzywuzzy (0.8.1). However, when I try to import it I get an error. Python 3.4.0 (default, Jun 19 2015, 14:20:21) [GCC 4.8.2] on linux Type "help", "copyright", "credits" or…
user3605780
  • 6,542
  • 13
  • 42
  • 67