Highest Voted 'rapidfuzz' Questions

2 votes, 1 answer

How to do effective matrix computation and not get memory overload for similarity scoring?

I have the following code for similarity scoring: from rapidfuzz import process, fuzz import pandas as pd d_test = { 'name' : ['South Beach', 'Dog', 'Bird', 'Ant', 'Big Dog', 'Beach', 'Dear', 'Cat'], 'cluster_number' : [1, 2, 3, 3, 2, 1, 4,…

asked Dec 13 '22 at 05:49 by illuminato


2 votes, 1 answer

How to set a column value by fuzzy string matching with another dataframe?

I have referred to this post but cannot get it to run for my particular case. I have two dataframes: import pandas as pd df1 = pd.DataFrame( { "ein": {0: 1001, 1: 1500, 2: 3000}, "ein_name": {0: "H for Humanity", 1: "Labor…

python pandas fuzzywuzzy rapidfuzz

asked Dec 24 '21 at 10:52 by Umar Boodoo


2 votes, 1 answer

Rapidfuzz match merge

Very new to this, would appreciate any advice on the following: I have a dataset 'Projects' showing list of institutions with project IDs: project_id institution_name 0 somali national university 1 aarhus university 2 …

python pandas rapidfuzz

asked Oct 14 '20 at 20:15 by StrangeBadger


1 vote, 1 answer

Apply Levenshtein distance from rapidfuzz.distance to dataframe with two columns

I have a csv file that looks as follows:
ID; name1; name2
1; John Doe; John Does
2; Mike Johnson; Mike Jonson
3; Leon Mill; Leon Miller
4; Jack Jo; Jack Joe
Now I want to calculate the Levenshtein distance for each pair of names. So compare "John…

python pandas levenshtein-distance rapidfuzz

asked Jul 11 '22 at 08:08 by PSt


1 vote, 1 answer

optimizing RapidFuzz for a list with large number of elements (e.g. 200,000)

I would like to run this piece of rapidfuzz code mentioned in this post on a list with 200,000 elements. I am wondering what's the best way to optimize this for a faster run on GPU? Find fuzzy match string in a list with matching string value and…

python python-3.x fuzzywuzzy rapidfuzz

asked Jun 25 '22 at 12:41 by nerd


1 vote, 1 answer

Fuzzy Matching with different fuzz ratios

I have two large datasets. df1 is about 1m lines, and df2 is about 10m lines. I need to find matches for lines in df1 from df2. I have posted an original version of this question separately. See here. Well answered by @laurent but I have some…

python pandas fuzzywuzzy rapidfuzz

asked Mar 03 '22 at 09:52 by Umar Boodoo


1 vote, 1 answer

Pandas fast fuzzy match

I have two data frames with the following format: d = {'id2': ['1', '2'], 'name': ['paris city', 'london town']} df1 = pd.DataFrame(data=d) print(df1) id2 name 0 1 paris city 1 1 london town d = {'id2':…

python pandas merge fuzzy-search rapidfuzz

asked Sep 13 '21 at 23:57 by Mustard Tiger


1 vote, 2 answers

Is there a way to modify this code to reduce run time?

So I am looking to modify this code to reduce the runtime of the fuzzywuzzy library. At present, it takes about an hour for a dataset with 800 rows, and when I used it on a dataset with 4.5K rows, it kept running for almost 6 hours with still no result. I…

python data-cleaning fuzzywuzzy drop-duplicates rapidfuzz

asked Jul 22 '21 at 10:56 by Shrumo


0 votes, 0 answers

optimizing RapidFuzz for a large number of elements and obtaining match score

Following this answer I am also trying to obtain the string match score between two lists. What would be the best way of doing that? elements = pd.DataFrame({'name':['vikash', 'vikas', 'Vinod', 'Vikky', 'Akash', 'Vinodh', 'Sachin', 'Salman', 'Ajay',…

python python-3.x numpy fuzzywuzzy rapidfuzz

asked Mar 08 '23 at 16:39 by RoyalPotatoe


0 votes, 0 answers

How to do fuzzymatching on nested subsets of a dataframe?

I have a dataframe with columns state, county, and agency_name, and I want to fuzzy match the agency name against another dataframe that has more variables about agency names. But I want to fuzzy match names only within the same state and…

python dataframe fuzzy-search fuzzywuzzy rapidfuzz

asked Feb 09 '23 at 02:54 by dave


0 votes, 2 answers

How to make fuzzy search between lists showing matches and not found elements?

I'm trying to make a fuzzy match for the values in list to_search. Search each value in to_search within choices list and show the corresponding item from result list. Like a MS Excel VLookUp, but with fuzzy search. This is my current code that…

python fuzzy-search rapidfuzz

asked Oct 17 '22 at 07:18 by Rasec Malkic


0 votes, 1 answer

Is there a way to speed up matching addresses and level of confidence per match between two data frames for large datasets?

I have a script below that checks the accuracy of a column of addresses in my dataframe against a column of addresses in another dataframe, to see whether they match and how well they match. I am using rapidfuzz because I heard it is faster than fuzzywuzzy.…

python pandas rapidfuzz

asked Oct 06 '22 at 10:41 by Kelly Tang


0 votes, 1 answer

Using rapidfuzz on a dataframe

I have 4 columns, BusinessID, Name, BusinessID_y and Name_y, and I want to match Name with Name_y at a similarity score of at least 90% and drop the rows that score lower. Sample input df BusinessID NAME BusinessID_y NAME_y 1013120869 …

python pandas dataframe string-matching rapidfuzz

asked Dec 10 '21 at 09:20 by Sarthak Gupta


0 votes, 1 answer

Why is the token set ratio so low using fuzzywuzzy?

I am using fuzzywuzzy and rapidfuzz to find names mentioned in comments. I read through the documentation of the "token_set_ratio" function but I still don't understand the following: # I preprocessed the comments to remove stop words and commonly…

python token fuzzywuzzy rapidfuzz

asked Oct 08 '20 at 09:50 by Michael Altorfer


0 votes, 0 answers

Python: TypeError: can't pickle module objects multiprocessing on Jupyter Notebook

I am sorry that my code might look confusing, but what it does is read in 300,000 items and try to cross-reference them against another file (it tries to find the best match for each item description in the other file). I know that the library…

python python-3.x pandas jupyter-notebook rapidfuzz

asked Jun 01 '20 at 15:16 by Student04

