128

A Levenshtein implementation in C# and F#. The C# version is 10 times faster for two strings of about 1500 chars. C#: 69 ms, F# 867 ms. Why? As far as I can tell, they do the exact same thing? Doesn't matter if it is a Release or a Debug build.

EDIT: If anyone comes here looking specifically for the Edit Distance implementation, it is broken. Working code is here.

C#:

private static int min3(int a, int b, int c)
{
   return Math.Min(Math.Min(a, b), c);
}

public static int EditDistance(string m, string n)
{
   var d1 = new int[n.Length];
   for (int x = 0; x < d1.Length; x++) d1[x] = x;
   var d0 = new int[n.Length];
   for(int i = 1; i < m.Length; i++)
   {
      d0[0] = i;
      var ui = m[i];
      for (int j = 1; j < n.Length; j++ )
      {
         d0[j] = 1 + min3(d1[j], d0[j - 1], d1[j - 1] + (ui == n[j] ? -1 : 0));
      }
      Array.Copy(d0, d1, d1.Length);
   }
   return d0[n.Length - 1];
}

F#:

let min3(a, b, c) = min a (min b c)

let levenshtein (m:string) (n:string) =
   let d1 = Array.init n.Length id
   let d0 = Array.create n.Length 0
   for i=1 to m.Length-1 do
      d0.[0] <- i
      let ui = m.[i]
      for j=1 to n.Length-1 do
         d0.[j] <- 1 + min3(d1.[j], d0.[j-1], d1.[j-1] + if ui = n.[j] then -1 else 0)
      Array.blit d0 0 d1 0 n.Length
   d0.[n.Length-1]
Guy Coder
  • 24,501
  • 8
  • 71
  • 136
Robert Jeppesen
  • 7,837
  • 3
  • 35
  • 50

1 Answers1

205

The problem is that the min3 function is compiled as a generic function that uses generic comparison (I thought this uses just IComparable, but it is actually more complicated - it would use structural comparison for F# types and it's fairly complex logic).

> let min3(a, b, c) = min a (min b c);;
val min3 : 'a * 'a * 'a -> 'a when 'a : comparison

In the C# version, the function is not generic (it just takes int). You can improve the F# version by adding type annotations (to get the same thing as in C#):

let min3(a:int, b, c) = min a (min b c)

...or by making min3 as inline (in which case, it will be specialized to int when used):

let inline min3(a, b, c) = min a (min b c);;

For a random string str of length 300, I get the following numbers:

> levenshtein str ("foo" + str);;
Real: 00:00:03.938, CPU: 00:00:03.900, GC gen0: 275, gen1: 1, gen2: 0
val it : int = 3

> levenshtein_inlined str ("foo" + str);;
Real: 00:00:00.068, CPU: 00:00:00.078, GC gen0: 0, gen1: 0, gen2: 0
val it : int = 3
ildjarn
  • 62,044
  • 9
  • 127
  • 211
Tomas Petricek
  • 240,744
  • 19
  • 378
  • 553
  • 1
    Why doesn't F# compile min3 as a function that takes int? It already knows enough type information at compile time to do this. This is how it would work if min3 was a C++ template function, so I'm a bit puzzled as to why F# doesn't do this. – sashang May 24 '11 at 00:00
  • 42
    F# infers it to be as generic as possible, e.g. "for all types X that support comparison". `inline` works like a C++ template, which would specialize to `int` based on the call site. – Brian May 24 '11 at 00:08
  • 13
    C++ templates behave essentially as F#'s `inline`. The reason why the default behavior is different is because it builds on .Net generics that are handled by the runtime (and, arguably, aren't so great for writing generic numeric code). Using the C++ behavior in F# would, however, lead to code bloat, because F# uses generics a lot more. – Tomas Petricek May 24 '11 at 00:12
  • 4
    C++ template semantics can lead to code bloat even in C++, and the lack of a convenient way to switch to using a run-time mechanism to avoid that is a hassle at times. However, the fear of code bloat is normally irrational - generally, C++ templates work well. –  Apr 13 '12 at 01:40
  • @Steve314 : It's also generally easy to avoid by refactoring out all code that doesn't use a dependent type, so that code isn't duplicated for different instantiations. – ildjarn Apr 13 '12 at 17:39
  • @ildjarn - That's not so "generally" as you claim in my experience, though of course that's almost entirely pre-C++11 and the new style makes a difference. I've written plenty of template containers where the guts were type-unsafe code using placement new, explicit destructors, offsetof and pointer arithmetic. There are workarounds that could avoid this, but they would defeat the whole purpose of those containers. That said, I'm not so sure that those containers were ever really that useful now - architecture astronaut tendencies, should really use vectors more and sort on need etc etc. –  Apr 15 '12 at 03:20
  • On JVM, I'd expect `min3` to get inlined by the JIT, assuming `levenstein` gets called many times. I wonder if .NET would do the same here. It seems unhelpful to benchmark one function invocation in a JIT'ed language. – MWB Oct 15 '20 at 21:55