3

I need some help to understand the arguments of these functions. I took the example from the help.

## To see the transformation counts for the Levenshtein distance:
drop(attr(adist("kitten", "sitting", counts = TRUE), "counts"))
# ins del sub 
#   1   0   2 

ins, stands for insertions; del for deletions; and sub for substitutions.

## To see the transformation sequences:
attr(adist(c("kitten", "sitting"), counts = TRUE), "trafos")
#      [,1]      [,2]     
# [1,] "MMMMMM"  "SMMMSMI"
# [2,] "SMMMSMD" "MMMMMMM"

From this is easy to see that while comparing the string one visa vie the string two, it finds SMMMSMI; 2 substitutions and 1 insertion, in total the distance should be three.

adist("kitten", "sitting", costs = list(ins=1, del=0, sub=1), partial = F)
#      [,1]
# [1,]    3

This is what I don't get, why when I set the cost of insertions equal to zero, the outcome is zero in the total distance. I would expect to be 2, because of the number of substitutions.

adist("kitten", "sitting", costs = list(ins=0, del=0, sub=1), partial = F)
#      [,1]
# [1,]    0

Thanks a lot.

Psidom
  • 209,562
  • 33
  • 339
  • 356
Mario GS
  • 859
  • 8
  • 22

1 Answers1

4

This is easier to explain if you take a look at the actual counts while specifying the costs for each operation as you did:

drop(attr(adist("kitten", "sitting", costs = list(ins=0, del=0, sub=1), 
                 partial = F, counts = T), "counts"))

# ins del sub 
#   6   5   0 

So you see that instead of:

# ins del sub 
#   1   0   2 

The number of operations for insertions, deletions and substitutions has changed when you specify a set of different cost parameters from the default. This makes sense since there are more than one way to transform one string to another and the distance from adist is, according to ?adist:

a generalized Levenshtein (edit) distance, giving the minimal possibly weighted number of insertions, deletions and substitutions needed to transform one string into another.

According to this statement, there should be an optimization under hood to minimize the number of operations weighted by the cost parameters, so if we specify insertions and deletions cost parameters to be zero, then it will not use substitution operation any more since the previous two operations are obviously less costly and indeed it uses 6 insertions and 5 deletions to complete the transformation which ends up with a zero distance since the cost for these two operations are zero.

Psidom
  • 209,562
  • 33
  • 339
  • 356