
Question:

I am trying to use semantic matching in Python on a group of words.

Sample Input:

['error 1', '14_7error', 'err_87P', 'configuration 49-ñ', 'confi:p2g%']

Sample Output:

['error 1,14_7error,err_87P', 'configuration 49-ñ','confi:p2g%']

What I have tried:

I have tried using sklearn, but can't get it to work. Code:

from sklearn.feature_extraction.text import TfidfVectorizer

documents = ['error 1', '14_7error', 'err_87P', 'configuration 49-ñ', 'confi:p2g%']

tfidf = TfidfVectorizer().fit_transform(documents)
pairwise_similarity = (tfidf * tfidf.T).toarray()

I have also looked at other resources, but none of them has helped much.

Wiktor Stribiżew
abhigyanj

1 Answer


Looking at your input data, it seems that your goal is not semantic matching, but string matching. You can use fuzzywuzzy to do that:

from fuzzywuzzy import process 
documents = ['error 1', '14_7error', 'err_87P', 'configuration 49-ñ', 'confi:p2g%']
results = [[i, process.extractOne(i, [x for x in documents if x != i])] for i in documents]

Results for best matches with matching score: [['error 1', ('14_7error', 71)], ['14_7error', ('error 1', 71)], ['err_87P', ('error 1', 43)], ['configuration 49-ñ', ('confi:p2g%', 60)], ['confi:p2g%', ('configuration 49-ñ', 60)]]
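The best-match pairs above still need to be merged to get the grouped output shown in the question. Below is a minimal sketch of one way to do that. It swaps in the standard library's difflib.SequenceMatcher for fuzzywuzzy so it runs without extra installs, which means the scores differ from the ones listed above. The helper names similarity and group_similar and the 0.4 threshold are assumptions made for this illustration; the threshold was picked to suit this sample and would need tuning on real data.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Case-insensitive similarity ratio between 0.0 and 1.0.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_similar(items, threshold=0.4):
    # Greedy grouping: put each item into the first existing group that
    # contains at least one sufficiently similar member, otherwise start
    # a new group. The threshold is an assumption tuned for the sample.
    groups = []
    for item in items:
        for group in groups:
            if any(similarity(item, member) >= threshold for member in group):
                group.append(item)
                break
        else:
            groups.append([item])
    return groups


documents = ['error 1', '14_7error', 'err_87P', 'configuration 49-ñ', 'confi:p2g%']
groups = group_similar(documents)

# Join each group into one comma-separated string, as in the sample output.
print([','.join(group) for group in groups])
```

With this threshold the sample input ends up in two groups, one for the error-like strings and one for the configuration-like strings. The grouping is greedy and order-dependent, so a different input order or threshold can give different groups.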

RJ Adriaansen