1

Upon working with long strings now, I came across a rather big problem in creating suffix trees in Haskell.

Some constructing algorithms (as this version of Ukkonen's algorithm) require establishing links between nodes. These links "point" on a node in the tree. In imperative languages, such as Java, C#, etc. this is no problem because of reference types.

Are there ways of emulating this behaviour in Haskell? Or is there a completely different alternative?

M4N
  • 94,805
  • 45
  • 217
  • 260
ThreeFx
  • 7,250
  • 1
  • 27
  • 51
  • When all else fails, you can write your stateful algorithm with `ST` and `STRef`s. You can then `runST` to get your pure suffix tree at the end. – chi Dec 27 '14 at 17:25
  • Why Ukkonen's algorithm is so popular recently in Haskell tag, There are another question, without a proper answer though - only hints in comments: http://stackoverflow.com/questions/19370296/haskell-data-type-with-references – phadej Dec 27 '14 at 17:49
  • Do you need just directed acyclic graphs? You can use graphs from [this](https://hackage.haskell.org/package/fgl) package. – effectfully Dec 27 '14 at 17:51
  • One standard trick to construct self-referential structures is called "tying the knot." You may find the answers to this question helpful: [How do you represent a graph in Haskell?](http://stackoverflow.com/questions/9732084/how-do-you-represent-a-graph-in-haskell). – Christian Conkle Dec 27 '14 at 17:52
  • Oh, and there's an [unanswered question on a similar topic](http://stackoverflow.com/questions/19370296/haskell-data-type-with-references) that has a few comments you may find helpful. – Christian Conkle Dec 27 '14 at 17:55

1 Answers1

2

You can use a value that isn't determined until the result of a computation in the construction of data in the computation by tying a recursive knot.

The following computation builds a list of values that each hold the total number of items in the list even though the total is computed by the same function that's building the list. The let binding in zipCount passes one of the results of zipWithAndCount as the first argument to zipWithAndCount.

zipCount :: [a] -> [(a, Int)]
zipCount xs = 
    let (count, zipped) = zipWithAndCount count xs
    in zipped

zipWithAndCount :: Num n => b -> [a] -> (n, [(a, b)])
zipWithAndCount y [] = (0, [])
zipWithAndCount y (x:xs) =
    let (count', zipped') = zipWithAndCount y xs
    in (count' + 1, (x, y):zipped')

Running this example makes a list where each item holds the count of the total items in the list

> zipCount ['a'..'e']
[('a',5),('b',5),('c',5),('d',5),('e',5)]

This idea can be applied to Ukkonen's algorithm by passing in the #s that aren't known until the entire result is known.

The general idea of recursively passing a result into a function is called a least fixed point, and is implemented in Data.Function by

fix :: (a -> a) -> a
fix f = let x = f x in x

We can write zipCount in points-free style in terms of zipWithAndCount and fix.

import Data.Function

zipCount :: [a] -> [(a, Int)]
zipCount = snd . fix . (. fst) . flip zipWithAndCount
Community
  • 1
  • 1
Cirdec
  • 24,019
  • 2
  • 50
  • 100