Match words between two data frames in R

Question

I'm working in R and I have two data frames containing Arabic text. Here is a sample of the data. Dataset 1: vocab

    term
1:   شكرا
2:    رقي
3: تضيعون
4:   ابكي

Dataset 2: posneg

    words     score
1      ابكي      0
2      تضيعون     0
3      خسرت     0
4      ظلمونا     0
5      لا     0
6      مستهتر     0
7      وبلا     0
8      احباط     0
9      تفشلتوا     0
10      خسرتم     0
11      عقدتك     0
12      للاسف     0
13      مشكله     0
29      اضاع     0
30      حاقده     0
31      خطا     0
32      غير     0
33      ما     0 
116     ابدعوا     1
117     اهنيكم     1
118     حبا     1
119     شكرا     1
120     فرحه     1
121     ممتاز     1
122     وزعيما     1
123     اجتهد     1
124     باهر     1
125     حبك     1
126     صحيح     1
127     فزت     1

I need to compare the term column in dataset 1 with the words column in dataset 2. If a word in the term column matches a word in the words column, it should get that word's score; if the word does not match, I want to write "new" instead. Here is the result that I expect:

     term   score
1:   شكرا     1
2:    رقي     1
3: تضيعون     0
4:   ابكي     0

Here is the code that I wrote, but I get an error.

 n<-length(vocab$term)
  n2<-length(posneg$words)


      for (i in 1:n) {
        if (vocab$term[i] == for (o in 1:n2) { posneg$words[o]}) 
          {
        vocab <- cbind(vocab, "score" = posneg$score[o] )} #add new column)
        else{
          vocab <- cbind(vocab, "score" = "no") #add new column
            }
        }

I hope this is clear. Thank you!

Can you make your example reproducible, please? You can use e.g. `read.table(text =...)` or even better, use `dput`. — Roman Luštrik, Dec 10 '18 at 09:48

score 2 · Answer 1 · answered Dec 10 '18 at 09:54

Salam,

Not entirely sure if this is what you want. Nevertheless, I've used the tidyverse and an if_else statement to check whether each word in one data frame appears in the other. If the word appears in both, a 1 is written to the new column; if it doesn't, a 0 is written.

For example,

library(tidyverse)

data1 <- data.frame(Term = c("A","B","Z","D"))
data2 <- data.frame(words = c("A","B","C","D","E","F"), score = c(1,4,5,2,4,5))

data3 <- data1 %>%
  mutate(score = if_else(Term %in% data2$words, 1, 0))

> str(data3)
'data.frame':   4 obs. of  2 variables:
 $ Term : chr  "A" "B" "Z" "D"
 $ score: num  1 1 0 1

Does this answer your question?

answered by Pryore

It's almost the same, but I don't need it to print 1/0; I need it to print the score. For example: $ Term : chr "A" "B" "Z" "D" $ score: num 1 4 new 2, where "new" means that it is a new word that does not match any word. – Fatima Dec 10 '18 at 10:04
I think OP's request is to get the matching score, not boolean value. – Darren Tsai Dec 10 '18 at 10:07
Because I don't need it to tell me whether there is a match or not. For me, 0 and 1 tell me whether the word is negative or positive, so a score based only on matching does not help me: I still don't know whether the word is positive or negative. – Fatima Dec 10 '18 at 10:13
The answer above by @snoram is probably the best approach in this case – Pryore Dec 10 '18 at 10:15
exactly like what Darren Tsai says – Fatima Dec 10 '18 at 10:15
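
Following up on the comments above, here is a minimal sketch (not the original answerer's code) of one way to return the matching score instead of a 0/1 flag while staying in the tidyverse. It assumes the same toy data as in the answer and that score may be stored as character, since a numeric column cannot hold the label "new".

```r
library(dplyr)

# Same toy data as in the answer above
data1 <- data.frame(Term = c("A", "B", "Z", "D"), stringsAsFactors = FALSE)
data2 <- data.frame(words = c("A", "B", "C", "D", "E", "F"),
                    score = c(1, 4, 5, 2, 4, 5),
                    stringsAsFactors = FALSE)

# A left join keeps every row of data1; terms without a match get NA as score
data3 <- data1 %>%
  left_join(data2, by = c("Term" = "words")) %>%
  # Assumption: score is converted to character so it can carry "new"
  mutate(score = if_else(is.na(score), "new", as.character(score)))
```

With the asker's data, the same pattern would presumably be `left_join(vocab, posneg, by = c("term" = "words"))`.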

score 2 · Accepted Answer · answered Dec 10 '18 at 10:10

Using Pryore's data and data.table:

library(data.table)
setDT(data1)
setDT(data2)
data2[data1, on = .(words = Term)]
   words score
1:     A     1
2:     B     4
3:     Z    NA
4:     D     2
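
The join above leaves NA for terms with no match, whereas the question asks for the label "new". A possible follow-up step, sketched under the assumption that it is acceptable to turn score into a character column (the name `res` is only illustrative):

```r
library(data.table)

# Same toy data as above, created directly as data.tables
data1 <- data.table(Term = c("A", "B", "Z", "D"))
data2 <- data.table(words = c("A", "B", "C", "D", "E", "F"),
                    score = c(1, 4, 5, 2, 4, 5))

# Join as in the answer: one row per Term, NA where no word matches
res <- data2[data1, on = .(words = Term)]

# Convert score to character so it can hold "new" for unmatched terms
res[, score := as.character(score)]
res[is.na(score), score := "new"]
```

If the numeric scores are needed later for arithmetic, it may be simpler to keep the NA values and treat NA as "new" only when displaying the result.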

answered by s_baldur
