-1

I have two data frames: DF1 is a word dictionary and DF2 contains sentences. I want to match text so that if any word in DF1 matches any word in a DF2 sentence, an output column shows "yes" for that sentence, and "no" otherwise. The data are as follows:

(DF1) word dictionary:

DF1 <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")

(DF2) sentences:

DF2 <- c("Customer satisfaction index improvement", "reduction in retail cycle", "Improve market share", "% recovery from vendor")

and output should be:

Customer satisfaction index improvement (yes)

reduction in retail cycle (no)

Improve market share (yes)

% recovery from vendor (no)

Note: the yes/no value should be a separate column showing the result of the text matching. Can anyone help? Thanks in advance.

User2321
roshan
  • Please reshape your question so that it contains the two datasets, as well as the end result, in a format that can be copy-pasted; otherwise it's difficult to answer your question. – User2321 Nov 09 '16 at 10:24
  • DF1 is the 1st data frame and DF2 is the 2nd data frame, and the output should be like this: if the 1st row of DF2 is "Customer satisfaction index improvement", then it shows yes – roshan Nov 09 '16 at 10:29
  • Yes yes I understand that, but it is not in a format that somebody can easily copy and paste into his R session to look for an answer. You can try to put dput(DF1) or something like that in order to make it easier. For more details see here: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – User2321 Nov 09 '16 at 10:34
  • df1<- c(csi,dsi,market,share,improvement,dealers,increase) and df2<-c(Customer satisfaction index improvement,reduction in retail cycle,Improve market share,% recovery from vendor) and i hope you understood what output i want – roshan Nov 09 '16 at 10:46
  • Well see the answer and tell me – User2321 Nov 09 '16 at 10:54
  • Brother, I applied the same to a larger data set, but it only shows "yes" in the output. What can be the reason? – roshan Nov 09 '16 at 12:47
  • Can you share your email ID? I will send you sample data in Excel format – roshan Nov 10 '16 at 08:18

2 Answers

2

You could do it like this:

df <- data.frame(sentence = c("Customer satisfaction index improvement", "reduction in retail cycle", "Improve market share", "% recovery from vendor"))
words <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")

# combine the words in a regular expression and bind it as column yes
df <- cbind(df, yes = grepl(paste(words, collapse = "|"), df$sentence))


This outputs
                                 sentence   yes
1 Customer satisfaction index improvement  TRUE
2               reduction in retail cycle FALSE
3                    Improve market share  TRUE
4                  % recovery from vendor FALSE

See it working on ideone.com.
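
Note that `grepl` matches case-sensitively and on substrings by default, so a dictionary word such as "share" would also match "shareholder", and "market" would not match "Market". The following is a minimal sketch of one way to tighten this, not part of the original answer: it wraps the alternation in `\\b` word boundaries, sets `ignore.case = TRUE`, and converts the logical result to the "yes"/"no" column asked for in the question. It assumes the dictionary words contain no regex metacharacters; the names `sentences`, `pattern`, `matched` and `match` are illustrative.

```r
# Sample data from the question
sentences <- c("Customer satisfaction index improvement",
               "reduction in retail cycle",
               "Improve market share",
               "% recovery from vendor")
words <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")

# Wrap the alternation in \\b so that only whole words match
# (assumption: no word contains regex metacharacters such as "." or "(")
pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")

# ignore.case = TRUE so that "Market" would also match "market"
matched <- grepl(pattern, sentences, ignore.case = TRUE, perl = TRUE)

# Build the result with a separate yes/no column
df <- data.frame(sentence = sentences,
                 match = ifelse(matched, "yes", "no"),
                 stringsAsFactors = FALSE)
print(df)
```

On the sample data this should give the same yes/no pattern as the logical column above; whether whole-word matching is what you want depends on your dictionary.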

Jan
1

Try this:

DF1 <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")
DF2 <- c("Customer satisfaction index improvement", "reduction in retail cycle", "Improve market share", "% recovery from vendor")


result <- cbind(DF2, "word found" = ifelse(rowSums(sapply(DF1, grepl, x = DF2)) > 0, "YES", "NO"))

> result
     DF2                                       word found
[1,] "Customer satisfaction index improvement" "YES"     
[2,] "reduction in retail cycle"               "NO"      
[3,] "Improve market share"                    "YES"     
[4,] "% recovery from vendor"                  "NO"    
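
If the same approach returns "YES" for every row on a larger dataset, one possible cause (an assumption, not something confirmed in this thread) is an empty string in the dictionary, for example from a blank spreadsheet cell: the pattern `""` matches every string. The sketch below uses a hypothetical vector `dirty` with such an entry added, inspects the per-word match matrix to find entries that match every sentence, and drops empty or `NA` entries before matching. The names `dirty`, `hits`, `suspects`, `clean` and `found` are illustrative.

```r
DF1 <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")
DF2 <- c("Customer satisfaction index improvement", "reduction in retail cycle",
         "Improve market share", "% recovery from vendor")

# Hypothetical dictionary with a stray empty entry (assumption for illustration)
dirty <- c(DF1, "")

# One row per sentence, one column per dictionary entry
hits <- sapply(dirty, grepl, x = DF2)

# Entries that match every single sentence are suspicious
suspects <- dirty[colSums(hits) == length(DF2)]
print(suspects)

# Drop NA, empty and whitespace-only entries before matching
clean <- dirty[!is.na(dirty) & nzchar(trimws(dirty))]
found <- rowSums(sapply(clean, grepl, x = DF2)) > 0

result <- data.frame(sentence = DF2,
                     word_found = ifelse(found, "YES", "NO"),
                     stringsAsFactors = FALSE)
print(result)
```

Running `str(DF1)` and `str(DF2)` on the real data is still worthwhile, since a dictionary stored as one long string or as a data frame column of a different type would behave differently from the vectors used here.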
User2321
  • When I apply it to the complete data set, it only shows "yes" in the output – roshan Nov 09 '16 at 11:50
  • what do you mean by that? I guess that your complete dataset just contains more words in DF1 or more sentences in DF2 and in either case there should not be any change. – User2321 Nov 09 '16 at 12:29
  • DF1 contains more words, as it is a word dictionary, and DF2 is a description made up of sentences. I only gave a sample of it, as I cannot paste the complete data here – roshan Nov 09 '16 at 12:52
  • Yes I understand what you mean. The only reason I can think of is that the data from either the words or the sentences is not in vector format. Could it be that you have just one big string? Please paste the results of str(DF2) and str(DF1) – User2321 Nov 09 '16 at 12:56
  • Can you share your email ID? I will send you a sample data frame in an Excel file – roshan Nov 10 '16 at 07:56
  • Man I am sorry but no this is not the way to solve the problem. Use the str function as I mentioned before. For example using str(DF1) with the code I have in my answer gives the following result: `> str(DF1) chr [1:7] "csi" "dsi" "market" "share" "improvement" "dealers" "increase"` – User2321 Nov 10 '16 at 08:12