I have a data frame like this: [image: pta corpus]

Each row of pta_content contains the text of one preferential trade agreement (PTA). I'm trying to calculate the similarity between every pair of rows and obtain a similarity matrix labelled with the PTA names.

I have tried stringdist, but it seems to be meant for comparing two data frames. How can I calculate the pairwise similarities between the rows within a single data frame?

1 Answer

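library(RecordLinkage)

# Two character vectors of equal length
a <- c("abcdefg", "hijklmnop", "qrstuvwxyz")
b <- c("abXdeXg", "hiXklXnoX", "Xrstuvwxyz")

# Element-wise Levenshtein similarity: a[i] is compared with b[i]
levenshteinSim(a, b)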

Result

[1] 0.7142857 0.6666667 0.9000000

Since the question doesn't include the actual data, I can't tailor this any further.

This is taken from the question "Similarity scores based on string comparison in R (edit distance)".
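
The snippet above only compares a[i] with b[i]. For every pair of rows within one column, a rough sketch using base R's adist() could look like the following. The data frame pta and its columns pta_name and pta_content are made-up stand-ins, since the real data isn't shown, and the normalisation (one minus the distance divided by the longer string's length) is an assumption chosen to match what levenshteinSim reports above.

```r
# Hypothetical stand-in for the asker's data; the column names are assumptions.
pta <- data.frame(
  pta_name    = c("PTA_A", "PTA_B", "PTA_C"),
  pta_content = c("abcdefg", "abXdeXg", "hijklmnop"),
  stringsAsFactors = FALSE
)

# adist() (package utils, attached by default) returns the full n x n
# matrix of Levenshtein distances when given a single character vector.
d <- adist(pta$pta_content)

# Convert distances to similarities in [0, 1] by dividing each distance
# by the length of the longer string in that pair.
len <- nchar(pta$pta_content)
sim <- 1 - d / outer(len, len, pmax)

# Label rows and columns with the agreement names.
dimnames(sim) <- list(pta$pta_name, pta$pta_name)
sim
```

The result is a symmetric matrix with 1 on the diagonal. Empty strings would cause a division by zero here, so they'd need to be filtered out first. Edit distance also gets slow on long texts, so for full agreement texts it may be worth timing this on a few rows before running it on the whole column.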

Rana Usman