I'm wondering how to compute text similarity in R for words stored in a data frame. The code below works perfectly when I pass the texts to compare directly, but I'm struggling to get it to work with the words contained in my data frame. I want to compare all pairs of words in the Word column. Any ideas? Thanks in advance.
library(textcat)
?textcat_xdist
round(
  textcat_xdist(
    list(
      text1 = "hello there",
      text2 = "why hello there",
      text3 = "totally different"
    ),
    method = "cosine"
  ),
  3
)
Data <- data.frame(
  X = sample(1:4),
  Word = sample(c("hello", "hellow", "hellloooo", "different"), 4, replace = TRUE),
  stringsAsFactors = FALSE  # keep Word as character (the default before R 4.0 was factor)
)
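
One possible approach, as a rough sketch rather than a definitive answer: turn the Word column into a named list shaped like the list(text1 = ..., text2 = ...) call above and pass that to textcat_xdist, which compares the texts with each other when no second argument is given. The set.seed call, the "X_Word" labels, and the pairs data frame are illustrative additions that are not part of the original code, and the pairing step assumes the rows of the returned matrix follow the row order of Data.

```r
library(textcat)

# Same toy data as in the question; set.seed() is added only for reproducibility.
set.seed(1)
Data <- data.frame(
  X = sample(1:4),
  Word = sample(c("hello", "hellow", "hellloooo", "different"), 4, replace = TRUE),
  stringsAsFactors = FALSE
)

# Build a named list, mirroring the list(text1 = ..., text2 = ...) form.
# The "X_Word" labels are a hypothetical naming choice, used only to tell
# repeated words apart in the printed matrix.
texts <- setNames(
  as.list(as.character(Data$Word)),
  paste(Data$X, Data$Word, sep = "_")
)

# With no second argument, the texts are compared with each other.
d <- round(textcat_xdist(texts, method = "cosine"), 3)
print(d)

# Optional: one row per unordered pair, taken from the upper triangle.
# Assumes the rows and columns of d are in the same order as the rows of Data.
idx <- which(upper.tri(d), arr.ind = TRUE)
pairs <- data.frame(
  word1 = Data$Word[idx[, "row"]],
  word2 = Data$Word[idx[, "col"]],
  distance = d[idx]
)
print(pairs)
```

Because the words are sampled with replacement, the same word can show up more than once; deduplicating with unique(Data$Word) before building the list may be preferable if only distinct words matter.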