I have two data frames. The first one is saved in an object named b:
structure(list(CONTENT = c("@myntra beautiful teamä»ç where is the winners list?",
"The best ever Puma wishlist for Workout freaks, Head over to @myntra https://t.co/V58Gk3EblW #MyPUMACollection Hit Like if you Find it good",
"I finalised on buy a top from Myntra, and then I found the same top at 20% off in jabong. I feel like I've achieved so much in life!",
"Check out #myPUMAcollection on @Myntra. Its perfect for a day at gym. https://t.co/VeRy4G3c7X https://t.co/fOpBRWCdSh",
"Check out #myPUMAcollection on @Myntra. Its perfect for a day at gym. https://t.co/VeRy4G3c7X.....",
"@DrDrupad @myntra #myPUMAcollection superb :)", "Super exclusive collection @myntra #myPUMAcollection https://t.co/Qm9dZzJdms",
"@myntra gave my best Love playing wid u Hope to win #myPUMAcollection",
"Check out PUMA Unisex Black Running Performance Gloves on Myntra! https://t.co/YD6IcvuG98 @myntra #myPUMAcollection",
"@myntra i have been mailing my issue daily since past week.All i get in reply is an auto generated assurance mail. 1st time pissed wd myntra"
), score = c(7.129, 7.08, 6.676, 5.572, 5.572, 5.535, 5.424,
5.205, 4.464, 4.245)), .Names = c("CONTENT", "score"), row.names = c(25L,
103L, 95L, 66L, 90L, 75L, 107L, 32L, 184L, 2L), class = "data.frame")
The second data frame is saved in an object named c:
structure(list(CONTENT = c("The best ever for workout over to myntra like if you find it good",
"i finalised buy a top myntra and found the at in feel like i so in life"
)), .Names = "CONTENT", row.names = c(103L, 95L), class = "data.frame")
For each statement in the second data frame (c), I want to find the closest match in the first data frame (b) and return the corresponding score from b.
For example, the statement "The best ever for workout over to myntra like if you find it good" matches the second statement in b most closely, so it should return the score 7.080.
I tried adapting some code from Stack Overflow:
library(stringr)     # provides str_split()
library(data.table)
cp <- str_split(c$CONTENT, " ")   # split each statement in c into words
nn <- lengths(cp) ## Or, for < R-3.2.0, `nn <- sapply(cp, length)`
dt <- data.table(grp=rep(seq_along(nn), times=nn), X = unlist(cp), key="grp")
dt[,Score:=b$score[pmatch(X,b$CONTENT)]]
dt[!is.na(Score), list(avgScore=sum(Score)), by="grp"]
This returns a value for only one of the statements in c. How can I get the score of the closest match for every statement in c?
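One possible direction, offered as a minimal sketch rather than a definitive answer: pmatch() only compares each word against the beginning of the full strings in b, so most words never match. Since the statements in c look like lower-cased versions of the statements in b with some words dropped, a simple word-overlap measure may be enough. The sketch below assumes that "closest" can be defined as the fraction of a statement's words that also appear in a candidate from b; this measure and the helper names tokenize and overlap are illustrative choices, not part of the original code. It uses only base R and a three-row subset of b to stay short.

```r
# Subset of the data from the question: three rows of b, both rows of c.
b <- data.frame(
  CONTENT = c(
    "The best ever Puma wishlist for Workout freaks, Head over to @myntra https://t.co/V58Gk3EblW #MyPUMACollection Hit Like if you Find it good",
    "I finalised on buy a top from Myntra, and then I found the same top at 20% off in jabong. I feel like I've achieved so much in life!",
    "Check out #myPUMAcollection on @Myntra. Its perfect for a day at gym. https://t.co/VeRy4G3c7X https://t.co/fOpBRWCdSh"
  ),
  score = c(7.08, 6.676, 5.572),
  stringsAsFactors = FALSE
)
c <- data.frame(
  CONTENT = c(
    "The best ever for workout over to myntra like if you find it good",
    "i finalised buy a top myntra and found the at in feel like i so in life"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-case a string and return its unique
# alphanumeric tokens (punctuation and symbols act as separators).
tokenize <- function(x) {
  tok <- strsplit(tolower(x), "[^a-z0-9]+")[[1]]
  unique(tok[nzchar(tok)])
}

# Tokenize every candidate in b once.
b_tokens <- lapply(b$CONTENT, tokenize)

# Hypothetical helper: for one query, the fraction of its tokens that
# also occur in each candidate from b (one value per row of b).
overlap <- function(query) {
  q <- tokenize(query)
  vapply(b_tokens, function(cand) mean(q %in% cand), numeric(1))
}

# Similarity matrix: one row per statement in c, one column per row of b.
sim <- t(vapply(c$CONTENT, overlap, numeric(nrow(b)), USE.NAMES = FALSE))

# Index of the best-matching row of b for each statement in c.
best <- max.col(sim, ties.method = "first")

result <- data.frame(
  CONTENT = c$CONTENT,
  matched = b$CONTENT[best],
  score   = b$score[best],
  stringsAsFactors = FALSE
)
print(result[, c("CONTENT", "score")])
# On this subset, result$score is 7.080 and 6.676.
```

If word overlap turns out to be too crude for the full data, an edit-distance measure such as base R's adist() could be swapped in for overlap, though plain edit distance tends to favour candidates of similar length, which may not suit statements with many dropped words.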