My question is something of a prequel to the question "Visualise distances between texts".

I have a table with two sentences to compare for each observation.

compare <- read.table(header = TRUE, sep = "|", text =
"person | text1 | text2
person1 | the quick brown fox jumps over the lazy dog | the quick cat jumps on the fast fog
person2 | I dont want to work today | I feel like working today
"
)

I want a column whose values represent the difference between the two sentences in each observation. Basically, I am looking for a function similar to agrep, but for comparing whole sentences or paragraphs.

Rfan

2 Answers

You can compute the edit distance (generalized Levenshtein distance) between strings with the adist function. mapply lets you apply it row by row:

mapply(adist, compare$text1, compare$text2)
# [1] 17 15
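
One thing to watch: when the table is read with sep = "|", the text columns keep the spaces around each separator, and those spaces count toward the edit distance. The following is a minimal sketch, assuming the surrounding whitespace is not meaningful. It trims the whitespace with strip.white = TRUE and also divides by the length of the longer string to get a length-normalised score. The column names lev and lev_norm are made up for this illustration.

```r
# Re-read the example data, trimming whitespace around each field
compare <- read.table(
  header = TRUE, sep = "|", strip.white = TRUE, stringsAsFactors = FALSE,
  text = "person | text1 | text2
person1 | the quick brown fox jumps over the lazy dog | the quick cat jumps on the fast fog
person2 | I dont want to work today | I feel like working today
"
)

# Edit distance per row (USE.NAMES = FALSE keeps the result unnamed)
compare$lev <- mapply(adist, compare$text1, compare$text2, USE.NAMES = FALSE)

# Hypothetical normalisation: 0 means identical, 1 means nothing in common
longest <- pmax(nchar(compare$text1), nchar(compare$text2))
compare$lev_norm <- compare$lev / longest

print(compare[, c("person", "lev", "lev_norm")])
```

On the trimmed strings the raw distances should come out one lower than on the untrimmed ones, since a single trailing space in text1 no longer counts as an edit.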
Sven Hohenstein

I had to learn a bit of text mining. Using the tm package, I wrote a function that compares two sentences or paragraphs and returns a numeric value:

library(tm)

dis <- function(text1, text2) {
  # build a two-document corpus (as.character guards against factor columns)
  text_c <- c(as.character(text1), as.character(text2))
  myCorpus <- Corpus(VectorSource(text_c))
  # build a term-document matrix, dropping punctuation and stopwords
  tdmc <- TermDocumentMatrix(myCorpus, control = list(removePunctuation = TRUE, stopwords = TRUE))
  # cosine dissimilarity between the two documents
  return(dissimilarity(tdmc, method = "cosine"))
}

compare$dis <- mapply(dis, compare$text1, compare$text2)


person                                          text1                                 text2 dis
person1   the quick brown fox jumps over the lazy dog   the quick cat jumps on the fast fog 0.63
person2                     I dont want to work today             I feel like working today 0.75
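
In case dissimilarity() is not available in the installed tm version (it may be missing from recent releases), the same idea can be sketched in base R: count the words in each text, then take one minus the cosine of the angle between the two count vectors. This is only an illustration, not the tm implementation. The helper names tokenize and cosine_dissimilarity are made up, and the stopword list is a hand-picked placeholder rather than tm's English list.

```r
# Split a text into lowercase words, dropping punctuation
tokenize <- function(text) {
  cleaned <- tolower(gsub("[[:punct:]]", "", as.character(text)))
  words <- strsplit(cleaned, "[[:space:]]+")[[1]]
  words[nzchar(words)]
}

# 1 - cosine similarity of the word-count vectors of two texts
cosine_dissimilarity <- function(text1, text2, stopwords = character()) {
  w1 <- tokenize(text1)
  w2 <- tokenize(text2)
  w1 <- w1[!w1 %in% stopwords]
  w2 <- w2[!w2 %in% stopwords]
  vocab <- union(w1, w2)
  # count each vocabulary word in both texts (zero if absent)
  v1 <- as.numeric(table(factor(w1, levels = vocab)))
  v2 <- as.numeric(table(factor(w2, levels = vocab)))
  1 - sum(v1 * v2) / sqrt(sum(v1^2) * sum(v2^2))
}

text1 <- c("the quick brown fox jumps over the lazy dog",
           "I dont want to work today")
text2 <- c("the quick cat jumps on the fast fog",
           "I feel like working today")

# Placeholder stopword list (an assumption, not tm's list)
my_stopwords <- c("the", "over", "on", "i", "to")

dis_values <- mapply(cosine_dissimilarity, text1, text2,
                     MoreArgs = list(stopwords = my_stopwords),
                     USE.NAMES = FALSE)
print(round(dis_values, 3))  # about 0.635 and 0.75 with this stopword list
```

With this stopword list the first pair shares two of its remaining words (quick, jumps) and the second pair shares one (today), which is where the two values come from. If both texts are empty after stopword removal, the function returns NaN.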
Rfan