String search by coincidence?

Question

I just wanted to know if there's a simple way to search a string by coincidence with another one in Python. Or if anyone knows how it could be done.

To make myself clear, here is an example.

text_sample = "baguette is a french word"
words_to_match = ("baguete","wrd")

letters_to_match = ('b','a','g','u','t','e','w','r','d')   #   With just one 'e'
coincidences = sum(text_sample.count(x) for x in letters_to_match)

#    coincidences = 14 Current output
#    coincidences = 10 Expected output

My current method breaks words_to_match into single characters, as in letters_to_match, but then every occurrence of those characters in text_sample is counted, including the standalone "a" and the 'r' and 'e' in "french" (coincidences = 14).

What I want is coincidences = 10, where only the letters of "baguette" and "word" that line up with "baguete" and "wrd" are counted, by checking the similarity between words_to_match and the words in text_sample.

How do I get my expected output?

So you only want the count to include the first occurrence of each character? But in your output "e" is the only character that's counted twice. I don't get the logic here — Shubham Periwal, Jun 20 '21 at 10:19
No, if text_sample was "a baguette is a french word" that first 'a' would be matched as the first occurrence and that's not what I want. I want it done by checking the similarity between words_to_match and the words in the text_sample. — Pomodor0, Jun 20 '21 at 10:27
That sounds very vague to me as well. Is it something along the lines of [edit distance](https://en.wikipedia.org/wiki/Edit_distance) that you are after? — Dr. V, Jun 20 '21 at 10:41
Exactly like edit distance. Is there a way to do it in Python? — Pomodor0, Jun 20 '21 at 10:53
I'm sure you can find a Python implementation of a function that calculates the Levenshtein distance or one of the other measurement techniques somewhere (or implement one of them yourself). — martineau, Jun 20 '21 at 11:28
@Pomodor0 You might also want to take a look at [difflib](https://docs.python.org/3/library/difflib.html) — MegaIng, Jun 20 '21 at 11:56
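
As a rough illustration of the difflib suggestion, here is a minimal sketch that pairs each entry of words_to_match with its closest word in the text and sums the matched characters. The function name count_word_coincidences and the 0.6 cutoff are assumptions made for this example, and SequenceMatcher's matching blocks are not guaranteed to equal a true longest common subsequence in every case.

```python
import difflib

def count_word_coincidences(text, patterns, cutoff=0.6):
    """Sum matched characters between each pattern and its closest word in text."""
    words = text.split()
    total = 0
    for pattern in patterns:
        # Pick the single most similar word, if any clears the cutoff.
        closest = difflib.get_close_matches(pattern, words, n=1, cutoff=cutoff)
        if not closest:
            continue
        matcher = difflib.SequenceMatcher(None, pattern, closest[0])
        # Each matching block is a run of identical characters.
        total += sum(block.size for block in matcher.get_matching_blocks())
    return total

text_sample = "baguette is a french word"
words_to_match = ("baguete", "wrd")
print(count_word_coincidences(text_sample, words_to_match))  # 10
```

Because the comparison is done word by word, the standalone "a" and the letters in "french" are never counted.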

score 1 · Answer 1 · answered Jun 20 '21 at 18:07

First, flatten words_to_match into a tuple of single letters:

    # Concatenate the words, then split the result into characters.
    letters = tuple(''.join(words_to_match))

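Then walk through the text and count the letters that appear in that order:

    coincidence = 0
    x = 0  # index of the next letter to look for
    for i in text_sample:
        # The bounds check avoids an IndexError once every letter is matched.
        if x < len(letters) and letters[x] == i:
            x += 1
            coincidence += 1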

If the letters don't have to appear in sequence, just do:

    coincidence = 0
    for i in text_sample:
        if i in letters: coincidence += 1

(You can also put the body of the if on its own line, which is the more conventional style.)
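
For reference, the in-sequence idea above can be wrapped in a small function. This is only a sketch, and the name count_in_order is made up for the example.

```python
def count_in_order(text, words):
    """Count letters of the joined words found in text, scanning left to right."""
    letters = "".join(words)
    matched = 0
    for ch in text:
        # Only advance when the next expected letter shows up.
        if matched < len(letters) and ch == letters[matched]:
            matched += 1
    return matched

print(count_in_order("baguette is a french word", ("baguete", "wrd")))  # 10
```

One caveat of this greedy scan: if an expected letter never appears in the text, matching stalls at that point, so count_in_order("word", ("xw",)) gives 0 even though 'w' is present.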

pts · Answer 2 · 2021-06-20T12:13:56.900

It looks like you need the length of the longest common subsequence (LCS). See the algorithm in the Wikipedia article for computing it. You may also be able to find a C extension which computes it quickly. For example, this search has many results, including pylcs. After installation (pip install pylcs):

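import pylcs
text_sample = "baguette is a french word"
words_to_match = ("baguete","wrd")
# Assumption: lcs is the subsequence variant and lcs2 the substring variant; check your pylcs version.
# Joining with ' ' lets the space match too, so the subsequence length is 11; ''.join gives 10.
print(pylcs.lcs(text_sample, ' '.join(words_to_match)))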
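
If you would rather not install anything, the LCS length can also be computed in pure Python. The following is a minimal sketch of the standard dynamic-programming approach, not the code pylcs uses; lcs_length is a name chosen for this example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (classic DP)."""
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

text_sample = "baguette is a french word"
words_to_match = ("baguete", "wrd")
print(lcs_length(text_sample, "".join(words_to_match)))   # 10
print(lcs_length(text_sample, " ".join(words_to_match)))  # 11, the joining space also matches
```

It runs in O(len(a) * len(b)) time, which is fine for short strings like these; a C extension mainly pays off on long inputs.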
