
I have some data that I get from banks using Yodlee, and the corresponding transaction messages on the mobile. Both contain short descriptions.

For example -

string1 = "tatasky_TPSL MUMBA IND"
string2 = "tatasky_TPSL"

They can be matched if one is completely contained in the other. However, some strings like

string1 = "T.G.I Friday's"
string2 = "TGI Friday's MUMBA MAH"

still need to be matched. Is there any algorithm that gives a confidence level for matching two descriptions?

nhahtdh
Ninjinx

1 Answer


You might want to use normalized edit distance, also called Levenshtein distance (see the Wikipedia article on Levenshtein distance). After computing the Levenshtein distance between the two strings, you can normalize it by dividing by the length of the longer string (or by the average length of the two strings). The normalized value is a dissimilarity, so a low value means a close match; with the longer-string normalization, one minus that value can act as a confidence between 0 and 1. There are four or five Python packages for calculating Levenshtein distance, and you can also try it online with an edit distance calculator.
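To make the normalization concrete, here is a minimal pure-Python sketch. The function names `levenshtein` and `similarity` are my own, not from any particular package, and dividing by the longer string's length is just one of the two normalizations mentioned above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed confidence: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


print(similarity("T.G.I Friday's", "TGI Friday's MUMBA MAH"))
print(similarity("tatasky_TPSL MUMBA IND", "tatasky_TPSL"))
```

Note that extra trailing text such as a city code counts as edits, so the score drops even when one description is fully contained in the other. You may want to keep the plain containment check as a first step and use the score only for the remaining cases.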

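Alternatively, one simple solution is the longest common subsequence algorithm, which can be used here: the longer the subsequence shared by the two descriptions, the more likely they refer to the same thing.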
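A rough sketch of the longest common subsequence idea in plain Python follows. The names `lcs_length` and `lcs_score` are hypothetical, and dividing by the shorter string's length is my own assumption, chosen so that a description fully contained in the other scores 1.0.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    previous = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                current.append(previous[j - 1] + 1)  # extend a common subsequence
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def lcs_score(a: str, b: str) -> float:
    """Assumed confidence: LCS length divided by the shorter string's length."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0  # nothing to compare against
    return lcs_length(a, b) / shortest


print(lcs_score("tatasky_TPSL MUMBA IND", "tatasky_TPSL"))
print(lcs_score("T.G.I Friday's", "TGI Friday's MUMBA MAH"))
```

Because a subsequence does not need to be contiguous, very short descriptions can score high by accident, so a threshold on the score (and possibly a minimum length) is worth tuning on your own data.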

Alok Nayak