Questions tagged [sequencematcher]

For questions pertaining to SequenceMatcher from the Python difflib module. This is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. difflib is part of the Python standard library.
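As a minimal sketch of what the description means by comparing sequences of hashable elements, the snippet below runs SequenceMatcher on a pair of strings and on a pair of lists. The sample inputs are made up for illustration only.

```python
from difflib import SequenceMatcher

# Strings are sequences of characters, and characters are hashable.
string_ratio = SequenceMatcher(None, "abcd", "bcde").ratio()
print(string_ratio)  # 0.75

# Lists work the same way, as long as every element is hashable.
words_a = ["the", "quick", "brown", "fox"]
words_b = ["the", "slow", "brown", "fox"]
matcher = SequenceMatcher(None, words_a, words_b)
list_ratio = matcher.ratio()
opcodes = matcher.get_opcodes()
print(list_ratio)
for tag, i1, i2, j1, j2 in opcodes:
    # Each opcode describes how words_a[i1:i2] maps onto words_b[j1:j2].
    print(tag, words_a[i1:i2], words_b[j1:j2])
```

The first argument is the optional isjunk callable; passing None means no element is treated as junk.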


72 questions
27 votes, 3 answers

How does Python's SequenceMatcher work?

I am a little puzzled by two different answers returned by SequenceMatcher depending on the order of the arguments. Why is it so? Example SequenceMatcher is not commutative: >>> from difflib import SequenceMatcher >>> SequenceMatcher(None, "Ebojfm…
user2399453
18 votes, 4 answers

Getting error while using fuzzywuzzy: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning

I am getting the error below. Is there any way to fix it without installing python-Levenshtein, and if not, how do I install python-Levenshtein on Linux? UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this…
LOrD_ARaGOrN
7 votes, 2 answers

Making difflib's SequenceMatcher ignore "junk" characters

I have a lot of strings that I want to match for similarity (each string is 30 characters on average). I found difflib's SequenceMatcher great for this task, as it was simple and the results were good. But if I compare hellboy and hell-boy like…
lovesh
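One possible way to approach the situation in this excerpt, sketched under the assumption that the unwanted characters can simply be stripped before comparing. The helper name `similarity` and its default junk set are hypothetical and are not taken from the question or its answers.

```python
from difflib import SequenceMatcher

def similarity(a, b, junk="-"):
    """Return the ratio of a and b after dropping every character in junk.

    Hypothetical helper: removing characters up front is an assumption
    about what "ignoring" them should mean.
    """
    cleaned_a = "".join(ch for ch in a if ch not in junk)
    cleaned_b = "".join(ch for ch in b if ch not in junk)
    return SequenceMatcher(None, cleaned_a, cleaned_b).ratio()

# Comparing the raw strings: the hyphen lowers the score.
raw = SequenceMatcher(None, "hellboy", "hell-boy").ratio()

# Comparing after stripping the hyphen: the strings become identical.
cleaned = similarity("hellboy", "hell-boy")

print(raw, cleaned)
```

Preprocessing like this changes the inputs rather than the matcher, so the reported score describes the cleaned strings and not the originals.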
7 votes, 2 answers

difflib.SequenceMatcher isjunk argument not considered?

In the Python difflib library, is the SequenceMatcher class behaving unexpectedly, or am I misreading what the intended behavior is? Why does the isjunk argument seem to not make any difference in this case? difflib.SequenceMatcher(None, "AA", "A…
bluelogic
7 votes, 1 answer

Difflib's SequenceMatcher - Customized equality

I've been trying to create a nested or recursive effect with SequenceMatcher. The final goal is comparing two sequences, both may contain instances of different types. For example, the sequences could be: l1 = [1, "Foo", "Bar", 3] l2 = [1, "Fo",…
YaronK
6 votes, 2 answers

SequenceMatcher - finding the two most similar elements of two or more lists of data

I was trying to compare a set of strings to an already defined set of strings. For example, you want to find the addressee of a letter whose text is digitized via OCR. There is an array of addresses, which has dictionaries as elements. Each…
valerius21
4 votes, 1 answer

How does Python 3.6 SequenceMatcher().get_matching_blocks() work?

I am trying to use SequenceMatcher.ratio() to get the similarity of two strings: "86418648" and "86488648": >>> SequenceMatcher(None,"86418648","86488648").ratio() 0.5 The ratio returned is 0.5, which is much lower than I expected because there is…
Jessie
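To see where the ratio in this excerpt comes from, a short sketch using the two strings quoted in the question can print the blocks that get_matching_blocks() reports. The variable names are illustrative; the comments rely on the documented formula ratio = 2.0 * M / T, where M is the number of matched elements and T is the combined length of both sequences.

```python
from difflib import SequenceMatcher

a, b = "86418648", "86488648"
matcher = SequenceMatcher(None, a, b)

# Each block is a named tuple (a, b, size): a[a:a+size] == b[b:b+size].
# The last block is always a zero-length sentinel.
blocks = matcher.get_matching_blocks()
for block in blocks:
    print(block, repr(a[block.a:block.a + block.size]))

matched = sum(block.size for block in blocks)  # M
total = len(a) + len(b)                        # T
ratio = matcher.ratio()
print(matched, total, ratio)  # 4 16 0.5
```

The single non-empty block aligns a[4:8] with b[0:4] ("8648"). Nothing remains to the left of it in b or to the right of it in a, so no further blocks are found and M stays at 4.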
4 votes, 5 answers

Comparing two columns of a CSV and outputting string similarity ratio in another CSV

I am very new to Python programming. I am trying to take a CSV file that has two columns of string values and want to compare the similarity ratio of the string between both columns. Then I want to take the values and output the ratio in another…
Jimmy
4 votes, 1 answer

SequenceMatcher for multiple inputs, not just two?

I'm wondering about the best way to approach this particular problem and whether any libraries exist for it (Python preferably, but I can be flexible if need be). I have a file with a string on each line. I would like to find the longest common patterns and their…
Peck
4 votes, 3 answers

Determine where documents differ with Python

I have been using the Python difflib library to find where 2 documents differ. The Differ().compare() method does this, but it is very slow: at least 100x slower than the diff command for large HTML documents. How can I efficiently determine…
hoju
3 votes, 1 answer

Python: Passing SequenceMatcher in difflib an "autojunk=False" flag yields error

I am trying to use the SequenceMatcher class in Python's difflib module to identify string similarity. I have experienced strange behavior with it, though, and I believe my problem may be related to the module's "junk" filter, a problem…
duhaime
2 votes, 3 answers

How to find the longest common substring in a list of strings (>2 strings)? Trying FuzzyWuzzy and SequenceMatcher

So I am trying to find a common identifier for journals using DOIs. For example, I have a list of DOIs for a…
msci
2 votes, 1 answer

Replacing similar strings in the column by using the same for both

I'm encountering the following issue in a small project of mine. I have a large dataset where some string values were accidentally written incorrectly. My goal is to write a function that ensures that all names that look fairly similar (.75)…
DataDude
2 votes, 4 answers

By what percentage do the two strings match?

I have 2 columns of disease names, and I have to try to match the best options. I tried using the SequenceMatcher class and the fuzzywuzzy module in Python, and the results were surprising. I have pasted the results and my doubts below: Consider there is a…
2 votes, 1 answer

How to compare each array in a set of binary arrays to an array that is outside the set

I have a set of arrays. I also have a separate array (T) to compare each array in the set to. I've tried to use SequenceMatcher to do this but can't figure out how to loop it so that each array from the set gets compared to T. This is for a fitness…
badam