70

There are similar questions, but none about C# libraries I can use in my source code.

Thank you all for your help.

I've already seen Lucene, but I need something simpler for finding similar strings, without the overhead of the indexing part.

The answer I accepted has two very simple algorithms, and one of them uses LINQ too, so it's perfect.

Dan
Luca Molteni
  • 7
    Why this is off-topic escapes me. The OP is asking if there is a function in a library that SO supports in-depth. – 101010 Sep 12 '14 at 13:17

8 Answers

35

Levenshtein distance implementation:

I have a .NET 1.1 project in which I use the latter. It's simplistic, but works perfectly for what I need. From what I remember it needed a bit of tweaking, but nothing that wasn't obvious.
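For reference, a minimal sketch of the classic dynamic-programming Levenshtein distance in C#. This is not the code from either linked implementation; the class name `FuzzySketch` and the method name `Levenshtein` are made up for illustration, and the comparison is case-sensitive by assumption.

```csharp
using System;

public static class FuzzySketch
{
    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions and substitutions needed to turn s into t.
    public static int Levenshtein(string s, string t)
    {
        if (s == null) throw new ArgumentNullException("s");
        if (t == null) throw new ArgumentNullException("t");

        // Only two rows of the DP table are needed at any time.
        int[] previous = new int[t.Length + 1];
        int[] current = new int[t.Length + 1];

        // Turning an empty prefix of s into the first j chars of t costs j insertions.
        for (int j = 0; j <= t.Length; j++) previous[j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            // Turning the first i chars of s into an empty string costs i deletions.
            current[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1,   // insertion
                             previous[j] + 1),     // deletion
                    previous[j - 1] + cost);       // substitution or match
            }

            // The row just computed becomes the "previous" row for the next i.
            int[] tmp = previous;
            previous = current;
            current = tmp;
        }

        return previous[t.Length];
    }
}
```

For example, `FuzzySketch.Levenshtein("kitten", "sitting")` returns 3. A smaller distance means more similar strings; dividing by the length of the longer string is one common way to turn it into a 0..1 score.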

brianpeiris
  • 10,735
  • 1
  • 31
  • 44
George Mauer
  • 117,483
  • 131
  • 382
  • 612
32

You can also look at the very impressive library titled Sam's String Metrics (https://github.com/StefH/SimMetrics.Net). It includes a host of algorithms:

  • Hamming distance
  • Levenshtein distance
  • Needleman-Wunsch distance or Sellers Algorithm
  • Smith-Waterman distance
  • Gotoh Distance or Smith-Waterman-Gotoh distance
  • Block distance or L1 distance or City block distance
  • Monge Elkan distance
  • Jaro distance metric
  • Jaro Winkler
  • SoundEx distance metric
  • Matching Coefficient
  • Dice’s Coefficient
  • Jaccard Similarity or Jaccard Coefficient or Tanimoto coefficient
  • Overlap Coefficient
  • Euclidean distance or L2 distance
  • Cosine similarity
  • Variational distance
  • Hellinger distance or Bhattacharyya distance
  • Information Radius (Jensen-Shannon divergence)
  • Harmonic Mean
  • Skew divergence
  • Confusion Probability
  • Tau
  • Fellegi and Sunter (SFS) metric
  • TFIDF or TF/IDF
  • FastA
  • BlastP
  • Maximal matches
  • q-gram
  • Ukkonen Algorithms
Tangurena
Zaffiro
  • 14
    The link in this answer is giving me a 403 error. You can use the [Wayback Machine](http://web.archive.org/web/http://staffwww.dcs.shef.ac.uk/people/sam.chapman@k-now.co.uk/stringmetrics.html) instead. – Paul Ruane Aug 04 '11 at 15:41
  • I believe the .NET version of the library mentioned above is [here](http://sourceforge.net/projects/simmetrics/files/). After I converted it to Visual Studio 2010, and updated NUnit references, it builds. It also passes 87 tests. – dalenewman Feb 08 '12 at 17:14
  • 1
    I found a .net library version of this library on [SimMetrics.Net on GitHub](https://github.com/StefH/SimMetrics.Net). The same as the suggestion from @dalenewman, just on github perhaps? – Spiralis Oct 10 '17 at 11:21
14

They are not my own invention, but they are my favorites. I've just published my own tweaked versions of Dice Coefficient, Levenshtein Distance, Longest Common Subsequence and Double Metaphone in a blog post called "Four Functions for Finding Fuzzy String Matches in C# Extensions".
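To give an idea of what the Dice coefficient looks like with LINQ, here is a rough sketch. It is not the tweaked version from the blog post: the names `DiceSketch` and `DiceCoefficient` are hypothetical, and it assumes character bigrams, case-sensitive comparison, and multiset counting of repeated bigrams.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DiceSketch
{
    // Splits a string into overlapping two-character substrings (bigrams).
    // Strings shorter than two characters yield an empty list.
    private static List<string> Bigrams(string s)
    {
        return Enumerable.Range(0, Math.Max(0, s.Length - 1))
                         .Select(i => s.Substring(i, 2))
                         .ToList();
    }

    // Dice coefficient: 2 * |shared bigrams| / (|bigrams of a| + |bigrams of b|).
    // Returns a value between 0.0 (nothing shared) and 1.0 (same bigrams).
    public static double DiceCoefficient(string a, string b)
    {
        if (a == null) throw new ArgumentNullException("a");
        if (b == null) throw new ArgumentNullException("b");

        List<string> x = Bigrams(a);
        List<string> y = Bigrams(b);

        // Neither string has a bigram: fall back to plain equality (an assumption).
        if (x.Count + y.Count == 0) return a == b ? 1.0 : 0.0;

        // Count how often each bigram occurs in b.
        Dictionary<string, int> countsInB =
            y.GroupBy(g => g).ToDictionary(g => g.Key, g => g.Count());

        // A bigram is shared as many times as it occurs in both strings.
        int shared = x.GroupBy(g => g).Sum(g =>
        {
            int n;
            return countsInB.TryGetValue(g.Key, out n) ? Math.Min(g.Count(), n) : 0;
        });

        return 2.0 * shared / (x.Count + y.Count);
    }
}
```

For example, `DiceSketch.DiceCoefficient("night", "nacht")` gives 0.25, because only the bigram "ht" is shared out of eight bigrams in total.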

trailmax
Tyler Jensen
2

Have you taken a look at Lucene.net? It is a port of the Java Lucene search engine API to the .NET platform, and it offers a lot of search functionality. I played around with it a year or so ago, so don't take my suggestion as based on tons of experience. I saw it in the book Windows Developer Power Tools and took it for a test drive. You might look through the API documentation to see if it offers something like the fuzzy search you are looking for.

Jason Jackson
1

This CodeProject article has a string similarity function using the Levenshtein distance.

Ed Schwehm
1

The following Levenshtein distance algorithm assigns a value to the similarity of two strings (well, the difference, actually) and could be used as something to build upon: http://www.merriampark.com/ldcsharp.htm

josliber
benefactual
0

The Beagle project for Linux is written in C# (Mono) and is a Google Desktop-like search tool. It may have some code in there for this kind of string matching.

If I recall correctly, it uses the Lucene library for searching and retrieving data. Maybe that can be useful for your project too.

Isak Savo
0

I have used "Ternary Search Tree Dictionary in C#" (http://www.codeproject.com/KB/recipes/tst.aspx) to search for similar strings.

Regards, Patricio