8

I am trying to create or find a CoffeeScript implementation of the Levenshtein Distance formula, aka Edit Distance. Here is what I have so far, any help at all would be much appreciated.

levenshtein = (s1,s2) ->
    n = s1.length
    m = s2.length
    if n < m
        return levenshtein(s2, s1) 
    if not s1 
        return s2.length
    previous_row = [s2.length + 1]
    for c1, i in s1
        current_row = [i + 1]
        for c2, j in s2
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] # is this unnescessary?-> (c1 != c2)
            current_row.push(Math.min(insertions,deletions,substitutions))
        previous_row = current_row
    return previous_row[previous_row.length-1]
#End Levenshetein Function

Btw: I know this code is wrong on many levels, I am happy to receive any and all constructive criticism. Just looking to improve, and figure out this formula!

CodeEdit1: Patched up the errors Trevor pointed out, current code above includes those changes

Update: The question I am asking is - how do we do Levenshtein in CoffeeScript?

Here is the 'steps' for the Levenshtein Distance Algorithm to help you see what I am trying to accomplish.

Steps 1
Set n to be the length of s. Set m to be the length of t. If n = 0, return m and exit. If m = 0, return n and exit. Construct a matrix containing 0..m rows and 0..n columns.

2
Initialize the first row to 0..n. Initialize the first column to 0..m.

3 Examine each character of s (i from 1 to n).

4 Examine each character of t (j from 1 to m).

5 If s[i] equals t[j], the cost is 0. If s[i] doesn't equal t[j], the cost is 1.

6 Set cell d[i,j] of the matrix equal to the minimum of: a. The cell immediately above plus 1: d[i-1,j] + 1. b. The cell immediately to the left plus 1: d[i,j-1] + 1. c. The cell diagonally above and to the left plus the cost: d[i-1,j-1] + cost.

7 After the iteration steps (3, 4, 5, 6) are complete, the distance is found in cell d[n,m].

source:http://www.merriampark.com/ld.htm

joe norton
  • 127
  • 7
  • How do we do Levenshtein Distance in CoffeeScript? – joe norton Jul 10 '11 at 00:02
  • Three big problems with your current code: 1) `previouis_row` is a typo, and 2) `previous_row[-1]` won't work because CoffeeScript doesn't use negative indices (use `previous_row[previous_row.length-1]`), and 3) `+ (c1 != c2)` adds either `true` or `false`. I'll post a fuller answer, but you should correct those three things right away. – Trevor Burnham Jul 10 '11 at 00:24
  • Also, CoffeeScript's `for...in` syntax goes `value, index`, not the other way around. So your loops should be `for c1, i in s1` and `for c2, j in s2`. – Trevor Burnham Jul 10 '11 at 00:27
  • Thanks so much Trevor. I'll have you know I look forward to getting a hard copy of your upcoming book (no rush!). – joe norton Jul 10 '11 at 00:28

1 Answers1

5

This page (linked to from the resource you mentioned) offers a JavaScript implementation of the Levenshtein distance algorithm. Based on both that and the code you posted, here's my CoffeeScript version:

LD = (s, t) ->
  n = s.length
  m = t.length
  return m if n is 0
  return n if m is 0

  d       = []
  d[i]    = [] for i in [0..n]
  d[i][0] = i  for i in [0..n]
  d[0][j] = j  for j in [0..m]

  for c1, i in s
    for c2, j in t
      cost = if c1 is c2 then 0 else 1
      d[i+1][j+1] = Math.min d[i][j+1]+1, d[i+1][j]+1, d[i][j] + cost

  d[n][m]

It seems to hold up to light testing, but let me know if there are any problems.

Trevor Burnham
  • 76,828
  • 33
  • 160
  • 196
  • Trevor this solution works great. I tested it against JavaScript implementations and it gets same results. I have one question: Why do we use the variables c1 and c2? It looks to me like it should use variables n and m but it doesn't work this way. Why is this? – joe norton Jul 11 '11 at 04:05
  • @joe `c1` and `c2` are actual characters from the string. If they differ, then `cost` is `1`; otherwise, `cost` is `0`. – Trevor Burnham Jul 11 '11 at 13:21
  • Not CS but callable from it: http://stackoverflow.com/questions/11919065/sort-an-array-by-the-levenshtein-distance-with-best-performance-in-javascript/11958496#11958496 – James Westgate Nov 30 '16 at 11:03