Matching 2 short descriptions and returning a confidence level

Question

I have some data that I get from banks using Yodlee, and the corresponding transaction messages from the mobile phone. Both contain short descriptions.

For example -

string1 = "tatasky_TPSL MUMBA IND"
string2 = "tatasky_TPSL"

They can be matched if one is completely contained in the other. However, some strings like

string1 = "T.G.I Friday's"
string2 = "TGI Friday's MUMBA MAH"

still need to be matched. Is there any algorithm that gives a confidence level when matching two descriptions?

https://docs.python.org/2/library/difflib.html#difflib.get_close_matches — Konstantin, May 15 '15 at 04:45
@Ajay - not necessarily.. There might be some fuzzy logic solutions... — Ninjinx, May 15 '15 at 04:49
@Ajay - they might not be in order - but not completely jumbled — Ninjinx, May 15 '15 at 05:00
See https://stackoverflow.com/questions/6690739/fuzzy-string-comparison-in-python-confused-with-which-library-to-use — rth, May 15 '15 at 08:31

score 1 · Answer 1 · answered May 16 '15 at 08:20

You might want to use normalized edit distance, also called Levenshtein distance (see the Wikipedia article on Levenshtein distance). After computing the Levenshtein distance between the two strings, you can normalize it by dividing by the length of the longer string (or by the average length of the two strings). The normalized score is a dissimilarity, so a lower value means a closer match. When you divide by the longer length the score stays between 0 and 1, and one minus the score can act as the confidence. There are some 4-5 Python packages for calculating Levenshtein distance, and you can also try it with an online edit distance calculator.
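As a rough illustration of the normalization described above, here is a minimal pure-Python sketch. The function names `levenshtein` and `edit_confidence` are made up for this example and do not come from any particular package. It assumes normalization by the longer string and treats two empty strings as a perfect match.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_confidence(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 when nothing lines up (hypothetical helper)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as a match
    return 1.0 - levenshtein(a, b) / longest


print(edit_confidence("tatasky_TPSL MUMBA IND", "tatasky_TPSL"))
print(edit_confidence("T.G.I Friday's", "TGI Friday's MUMBA MAH"))
```

Lowercasing the strings and stripping punctuation before comparing would probably raise the score for pairs like the TGI Friday's example, but that preprocessing is a design choice left out of this sketch.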

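Alternatively, one simple solution is the longest common subsequence (LCS) algorithm, which can also be used here: the longer the common subsequence relative to the lengths of the strings, the better the match.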
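A minimal sketch of the longest common subsequence idea, again with made-up names (`lcs_length`, `lcs_confidence`). Dividing by the length of the shorter string is an assumption of this example, chosen so that a description fully contained in the other scores 1.0. Dividing by the longer length would penalize the extra trailing text instead.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def lcs_confidence(a: str, b: str) -> float:
    """Share of the shorter string covered by the LCS (hypothetical helper)."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        # assumption: empty vs. empty matches, empty vs. non-empty does not
        return 1.0 if len(a) == len(b) else 0.0
    return lcs_length(a, b) / shortest


print(lcs_confidence("tatasky_TPSL MUMBA IND", "tatasky_TPSL"))
print(lcs_confidence("T.G.I Friday's", "TGI Friday's MUMBA MAH"))
```

Because a subsequence may skip characters, the dots in "T.G.I" only cost the two skipped characters, which is why this measure tolerates the second example from the question fairly well.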
