I've implemented a basic search for a research project. I'm trying to make the search more efficient by building a suffix tree. I'm interested in a C# implementation of the Ukkonen algorith. I don't want to waste time rolling my own if such implementation exists.
-
Can you elaborate on your question at all? – Matt Hanson Oct 11 '08 at 00:09
-
I am trying to implement a search within a research project. I've implemented the reverse index and incremental population of the index. Next I was looking to make the search even more efficient but did not want to roll my own ST implementation if one exists. – Goran Oct 11 '08 at 07:09
3 Answers
Hard question. Here's the closest to match I could find: http://www.codeproject.com/KB/recipes/ahocorasick.aspx, which is an implementation of the Aho-Corasick string matching algorithm. Now, the algorithm uses a suffix-tree-like structure per: http://en.wikipedia.org/wiki/Aho-Corasick_algorithm
Now, if you want a prefix tree, this article claims to have an implementation for you: http://www.codeproject.com/KB/recipes/prefixtree.aspx
<HUMOR> Now that I did your homework, how about you mow my lawn. (Reference: http://flyingmoose.org/tolksarc/homework.htm) </HUMOR>
Edit: I found a C# suffix tree implementation that was a port of a C++ one posted on a blog: http://code.google.com/p/csharsuffixtree/source/browse/#svn/trunk/suffixtree
Edit: There is a new project at Codeplex that is focused on suffix trees: http://suffixtree.codeplex.com/

- 13,085
- 9
- 62
- 89
Hei, just finished implementing .NET (c#) library containing different trie implementations. Among them:
- Classical trie
- Patricia trie
- Suffix trie
- A trie using Ukkonen's algorithm
I tried to make source code easy readable. Usage is also very straight forward:
using Gma.DataStructures.StringSearch;
...
var trie = new UkkonenTrie<int>(3);
//var trie = new SuffixTrie<int>(3);
trie.Add("hello", 1);
trie.Add("world", 2);
trie.Add("hell", 3);
var result = trie.Retrieve("hel");
The library is well tested and also published as TrieNet NuGet package.

- 7,593
- 2
- 36
- 52
-
Thank you for the Ukkonen's algorithm implementation. Great work. – Tomer Pintel Dec 14 '20 at 08:53
Here is an implementation of a suffix tree that is reasonably efficient. I haven't studied Ukkonen's implementation, but the running time of this algorithm I believe is quite reasonable, approximately O(N Log N)
. Note the number of internal nodes in the tree created is equal to the number of letters in the parent string.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using NUnit.Framework;
namespace FunStuff
{
public class SuffixTree
{
public class Node
{
public int Index = -1;
public Dictionary<char, Node> Children = new Dictionary<char, Node>();
}
public Node Root = new Node();
public String Text;
public void InsertSuffix(string s, int from)
{
var cur = Root;
for (int i = from; i < s.Length; ++i)
{
var c = s[i];
if (!cur.Children.ContainsKey(c))
{
var n = new Node() {Index = from};
cur.Children.Add(c, n);
// Very slow assertion.
Debug.Assert(Find(s.Substring(from)).Any());
return;
}
cur = cur.Children[c];
}
Debug.Assert(false, "It should never be possible to arrive at this case");
throw new Exception("Suffix tree corruption");
}
private static IEnumerable<Node> VisitTree(Node n)
{
foreach (var n1 in n.Children.Values)
foreach (var n2 in VisitTree(n1))
yield return n2;
yield return n;
}
public IEnumerable<int> Find(string s)
{
var n = FindNode(s);
if (n == null) yield break;
foreach (var n2 in VisitTree(n))
yield return n2.Index;
}
private Node FindNode(string s)
{
var cur = Root;
for (int i = 0; i < s.Length; ++i)
{
var c = s[i];
if (!cur.Children.ContainsKey(c))
{
// We are at a leaf-node.
// What we do here is check to see if the rest of the string is at this location.
for (var j=i; j < s.Length; ++j)
if (cur.Index + j >= Text.Length || Text[cur.Index + j] != s[j])
return null;
return cur;
}
cur = cur.Children[c];
}
return cur;
}
public SuffixTree(string s)
{
Text = s;
for (var i = s.Length - 1; i >= 0; --i)
InsertSuffix(s, i);
Debug.Assert(VisitTree(Root).Count() - 1 == s.Length);
}
}
[TestFixture]
public class TestSuffixTree
{
[Test]
public void TestBasics()
{
var s = "banana";
var t = new SuffixTree(s);
var results = t.Find("an").ToArray();
Assert.AreEqual(2, results.Length);
Assert.AreEqual(1, results[0]);
Assert.AreEqual(3, results[1]);
}
}
}

- 17,602
- 7
- 105
- 102
-
-@cdiggins, sorry for my ignorance. It is my first time to see a class within another class. In your code,`public class Node` is within `public class SuffixTree`, what is the skill herein? – Jul 01 '14 at 15:56
-
1I know this post is old but there seems to be an issue with this code under some circuimstances that I cant seem to figure. Set S = "TAGGAAATTC" and try t.Find("CTATT") and the line f (Text[cur.Index + j] != s[j]) will result in IndexOutsideBoundsOfArray exception. Your implementation seems to be very elegant and efficient enough (typing out Ukkoken is a bit.. difficult). I can't seem to find a way around this bug. – Ilhan Aug 15 '18 at 21:03
-
1
-
Search doesn't work properly in this implementation for cases, when there are 2 matches. – Anton Jun 19 '20 at 11:22