You need to determine the matching rules for your strings. What makes two strings 'similar'? Possible criteria include:
- number of matching characters
- number of non-matching characters
- similar length
- typos or phonetic errors
- business specific abbreviations
- must start with the same substring
- must end with the same substring
I've done quite a lot of work with string matching algorithms, and have yet to find an existing library or piece of code that meets my specific requirements. Review them, borrow ideas from them, but you will invariably have to customize and write your own code.
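To give an idea of what "write your own code" can look like, here is a minimal Python sketch that layers a few of the rules above (abbreviations, a required common prefix, similar length) on top of the standard library's difflib.SequenceMatcher. The ABBREVIATIONS table, the function names and the parameters required_prefix_len and max_length_ratio are all made up for illustration; your own domain will dictate different rules and thresholds.

```python
from difflib import SequenceMatcher

# Hypothetical business abbreviations; replace with your own domain's list.
ABBREVIATIONS = {"ltd": "limited", "corp": "corporation", "intl": "international"}


def normalize(text):
    """Lowercase, turn punctuation into spaces, expand known abbreviations."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in text.lower())
    return " ".join(ABBREVIATIONS.get(word, word) for word in cleaned.split())


def similarity(a, b, required_prefix_len=0, max_length_ratio=None):
    """Return a score in [0.0, 1.0]; 0.0 means a hard rule rejected the pair."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0
    # Rule: must start with the same substring.
    if required_prefix_len and a[:required_prefix_len] != b[:required_prefix_len]:
        return 0.0
    # Rule: similar length (longer string at most N times the shorter one).
    if max_length_ratio is not None:
        if max(len(a), len(b)) / min(len(a), len(b)) > max_length_ratio:
            return 0.0
    # Base score from matching characters, via the standard library.
    return SequenceMatcher(None, a, b).ratio()


print(similarity("Acme Ltd.", "ACME Limited"))
print(similarity("Acme Ltd", "Bcme Ltd", required_prefix_len=1))
```

The hard rules run first and short-circuit to 0.0, so an unexpected high score can only come from the base metric, which makes it easier to see which rule needs tightening.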
The Levenshtein algorithm is good but a bit slow. I've had some success with both the Smith-Waterman and Jaro-Winkler algorithms, but the best I found for my purpose was Monge-Elkan (from memory). However, it pays to read the original research to understand why the authors designed each algorithm the way they did and what dataset they were targeting.
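For reference, a plain Python sketch of the classic Levenshtein edit distance is below. It is the textbook dynamic-programming version, not a tuned implementation, and the function name is just a label. The nested loop does work proportional to len(a) * len(b) for every pair compared, which is one likely reason it feels slow on large datasets.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes and substitutions
    needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Dividing the distance by the length of the longer string and subtracting from 1 gives a rough 0-to-1 similarity, but whether that normalization suits your data is exactly the kind of domain decision discussed here.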
If you don't properly define what you want to match and measure then you'll find high scores on unexpected matches and low scores on expected matches. String matching is very domain specific. If you don't properly define your domain then you are like a fisherman without a clue, throwing hooks around and hoping for the best.