I have some ugly strings similar to these:
string1 = "Fantini, Rauch, C.Straus, Priuli, Bertali: 'Festival Mass at the Imperial Court of Vienna, 1648' (Yorkshire Bach Choir & Baroque Soloists + Baroque Brass of London/Seymour)"
string2 = 'Vinci, Leonardo {c.1690-1730}: Arias from Semiramide Riconosciuta, Didone Abbandonata, La Caduta dei Decemviri, Lo Cecato Fauzo, La Festa de Bacco, Catone in Utica. (Maria Angeles Peters sop. w.M.Carraro conducting)'
I would like a library or algorithm that gives me a percentage of how many words the two strings have in common, while ignoring special characters such as the comma (,), colon (:), apostrophe (') and curly brace ({), among others.
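A minimal sketch of the word-overlap idea, using only the standard library. It assumes that a "word" is any run of letters or digits (the regex `\w+`), so everything else, including , : ' { } ( ) and &, acts as a separator. It also assumes case should be ignored. The percentage here is the Jaccard similarity (shared words divided by all distinct words), which is one possible choice, not the only one. The function names are illustrative.

```python
import re


def words(text):
    """Return the set of lower-cased words in text; punctuation is discarded."""
    # \w+ matches runs of letters, digits and underscores; any other character
    # (comma, colon, apostrophe, brace, ...) only separates words.
    return set(re.findall(r"\w+", text.lower()))


def common_word_percentage(a, b):
    """Jaccard similarity of the two word sets, as a percentage (0 to 100)."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        # Two strings with no words at all are treated as identical.
        return 100.0
    return 100.0 * len(wa & wb) / len(wa | wb)
```

Usage: `common_word_percentage(string1, string2)`. If you would rather measure "how much of the shorter string appears in the longer one", divide by `min(len(wa), len(wb))` instead of the size of the union.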
I know of the Levenshtein algorithm. However, it counts the single-CHARACTER edits needed to turn one string into the other, whereas I want to compare how many WORDS the strings have in common.
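The same Levenshtein dynamic program can be run over lists of words instead of characters, so that each insertion, deletion or substitution costs one whole word. This is a hand-written sketch that does not rely on any Levenshtein library. It assumes the same tokenisation as a plain `\w+` regex on lower-cased text, and it normalises by the longer word list to get a percentage. Unlike a set-based comparison, this version is sensitive to word order.

```python
import re


def tokens(text):
    """Lower-cased list of words, in order; punctuation is discarded."""
    return re.findall(r"\w+", text.lower())


def word_levenshtein(x, y):
    """Edit distance between two word lists (unit cost per word)."""
    prev = list(range(len(y) + 1))  # distance from empty prefix of x
    for i, wx in enumerate(x, start=1):
        cur = [i]
        for j, wy in enumerate(y, start=1):
            cost = 0 if wx == wy else 1
            cur.append(min(prev[j] + 1,          # delete wx
                           cur[j - 1] + 1,       # insert wy
                           prev[j - 1] + cost))  # substitute or keep
        prev = cur
    return prev[-1]


def word_similarity_percentage(a, b):
    """100 minus the word edit distance as a share of the longer word list."""
    x, y = tokens(a), tokens(b)
    longest = max(len(x), len(y))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - word_levenshtein(x, y) / longest)
```

For example, "the cat sat" and "the dog sat" differ by one substituted word out of three.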