
I am interested in applying the levenshteinSim function from the RecordLinkage package to vectors of strings (there's a good discussion of the function here).

Imagine that I have a vector called codes, c("A", "B", "C", "D", ...), and a vector called tests, c("A", "B", "C", "D", ...).

Using sapply to test a particular value in tests against the vector of codes:

sapply(codes, levenshteinSim, str2 = tests[1])
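For reference, sapply() calls its function once per element of codes, passing str2 through unchanged, so the result should be one score per code. The sketch below illustrates this without RecordLinkage by swapping in a hypothetical pair_sim() built on base R's adist(); it assumes the 1 - distance / max(length) normalisation that RecordLinkage describes for levenshteinSim().

```r
# Hypothetical stand-in for levenshteinSim() on one pair of strings,
# built on base R's adist(): 1 - distance / length of the longer string.
pair_sim <- function(str1, str2) {
  1 - drop(adist(str1, str2)) / max(nchar(str1), nchar(str2))
}

codes <- c("A", "B", "C", "D")
tests <- c("A", "B", "C", "D")

# sapply() calls pair_sim(codes[i], str2 = tests[1]) for each i, so the
# result has one score per code, named by the code itself.
scores <- sapply(codes, pair_sim, str2 = tests[1])
scores  # named vector: A = 1, B = 0, C = 0, D = 0
```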

I would expect to get a vector with one score per element of codes (my apologies if I make terminological mistakes): [score1] [score2] [score3] ...

Unfortunately, the output is a single value: a test of the value in tests[1] against c("A", "B", "C", "D", ...) as a whole.

Ultimately, I want to apply the two vectors against one another to produce a len1 x len2 matrix (one score for each of the len1*len2 pairs), but I don't want to move forward until I understand what I'm doing wrong.
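A minimal sketch of the len1 x len2 goal using only base R, so it runs without RecordLinkage installed. lev_sim_matrix() is a hypothetical helper, the example strings are made up, and the 1 - distance / max(nchar) normalisation is assumed to match what RecordLinkage documents for levenshteinSim().

```r
# Hypothetical helper: all-pairs Levenshtein similarity with base R only.
# adist() returns every pairwise edit distance in one call; dividing by the
# longer length of each pair and subtracting from 1 gives a 0-to-1 score.
lev_sim_matrix <- function(codes, tests) {
  d <- adist(codes, tests)                            # length(codes) x length(tests)
  longest <- outer(nchar(codes), nchar(tests), pmax)  # same shape as d
  sim <- 1 - d / longest
  dimnames(sim) <- list(codes, tests)                 # rows = codes, cols = tests
  sim
}

codes <- c("cat", "dog", "bird")
tests <- c("cat", "cot")
m <- lev_sim_matrix(codes, tests)
round(m, 3)
```

Computing the whole distance matrix in one adist() call avoids an explicit double loop; each row is one code and each column one test.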

Can anyone provide guidance?

dubhousing

1 Answer


I'm not sure where the problem lies:

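 library(RecordLinkage)
 codes <- c("A", "B", "C", "D")   # example data from the question
 tests <- c("A", "B", "C", "D")
 sapply(codes, levenshteinSim, str2 = tests)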
     A B C D
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 0 0 1 0
[4,] 0 0 0 1

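When str2 is just one item (such as tests[1]), you get a length-4 vector with one score per element of codes. With the whole vector as str2, as above, each column holds one code scored against every element of tests.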
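Note the orientation: sapply() gives length(tests) rows by length(codes) columns. If codes should be the rows, t() the result, or (assuming levenshteinSim() also compares equal-length vectors element-wise, as its recycling above suggests) pass it to outer(). The sketch below checks that both routes agree; to run without RecordLinkage it swaps in a hypothetical vectorised lev_sim() built on base R's adist(), and the example strings are made up.

```r
# Hypothetical vectorised stand-in for levenshteinSim(): recycles both inputs
# to a common length and compares them element-wise via base R's adist().
lev_sim <- function(str1, str2) {
  n <- max(length(str1), length(str2))
  str1 <- rep_len(str1, n)
  str2 <- rep_len(str2, n)
  d <- mapply(function(a, b) drop(adist(a, b)), str1, str2, USE.NAMES = FALSE)
  1 - d / pmax(nchar(str1), nchar(str2))
}

codes <- c("cat", "dog", "bird")
tests <- c("cat", "cot")

# One column per code, one row per test: length(tests) x length(codes).
by_sapply <- sapply(codes, lev_sim, str2 = tests)

# One row per code, one column per test: length(codes) x length(tests).
# outer() requires FUN to be vectorised, which lev_sim() is.
by_outer <- outer(codes, tests, lev_sim)

dim(by_sapply)  # 2 3
dim(by_outer)   # 3 2
all.equal(unname(t(by_sapply)), by_outer)  # TRUE
```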

IRTFM
Thanks for your response @DWin. I will have to grab a sample of my data to test things out. One note: I am applying tolower() to each element (because levenshteinSim is case-sensitive). I'm not sure how that affects things. – dubhousing Nov 06 '13 at 00:52
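On the tolower() question: lowercasing only changes which characters are compared, not the shape of the result, so the output is still length(codes) x length(tests). A minimal sketch with base R's adist() (used in place of levenshteinSim() so it runs without RecordLinkage; the strings are made up) shows that lowercasing both vectors first gives the same distances as adist()'s own ignore.case = TRUE.

```r
codes <- c("Apple", "BANANA")
tests <- c("apple", "banana", "Cherry")

# Case-sensitive: "Apple" vs "apple" costs one substitution.
d_case <- adist(codes, tests)

# Lowercasing both vectors up front...
d_lower <- adist(tolower(codes), tolower(tests))

# ...should match asking adist() to ignore case itself.
d_ignore <- adist(codes, tests, ignore.case = TRUE)

d_case[1, 1]              # 1
d_lower[1, 1]             # 0
all(d_lower == d_ignore)
dim(d_lower)              # 2 3
```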