
I have a data frame of three-word and two-word phrases, together with the number of times each phrase appears in a text. Here is some dummy data:

   trig <- c("took my dog", "took my cat", "took my hat", "ate my dinner", "ate my lunch")
   trig_count <- c(3, 2, 1, 3, 1)
   big <- c("took my", "took my", "took my", "ate my", "ate my")
   big_count <- c(6,6,6,4,4)
   df <- data.frame(trig, trig_count, big, big_count)
   df$trig <- as.character(df$trig)
   df$big <- as.character(df$big)

              trig trig_count     big big_count
   1   took my dog          3 took my         6
   2   took my cat          2 took my         6
   3   took my hat          1 took my         6
   4 ate my dinner          3  ate my         4
   5  ate my lunch          1  ate my         4

I would like to write a function that takes any two-word phrase as input and returns the matching rows of df if there is a match, and the string "no match" otherwise.

I've tried variations of this:

   match_test <- function(x){
                 ifelse(x %in% df$big==T, df[df$big==x,], "no match")
                 }

It works fine for a two-word phrase that isn't in the df, for instance:

    match_test("looked for")

returns

    "no match"

But for phrases that do have a match, it doesn't work. For instance:

   match_test("took my")

returns

    "took my dog" "took my cat" "took my hat"

What I am looking for instead is this:

           trig    trig_count   big    big_count
    1  took my dog          3  took my         6
    2  took my cat          2  took my         6
    3  took my hat          1  took my         6

What is it about %in% that I am not understanding? Or is it something else? Would really appreciate your guidance.

hemalp108
carozimm
  • Maybe this? `df[grep("took my", df$big), ]` – Ronak Shah Dec 02 '16 at 11:28
  • Thanks for the lightning-quick response, Ronak. I thought about using grep() instead, but I don't know how to use that function programmatically, i.e. grep(x, df$big) wouldn't work because you need quote marks. Any ideas? – carozimm Dec 02 '16 at 11:36
  • It would. Try it out. `match_test <- function(x){ df[grep(x, df$big), ] }` – Ronak Shah Dec 02 '16 at 11:37
  • Possible duplicate of [Filtering row which contains a certain string using dplyr](http://stackoverflow.com/questions/22850026/filtering-row-which-contains-a-certain-string-using-dplyr) or [Selecting rows where a column has a string like 'hsa..' (partial string match)](http://stackoverflow.com/questions/13043928/selecting-rows-where-a-column-has-a-string-like-hsa-partial-string-match) – Ronak Shah Dec 02 '16 at 12:09
  • Thanks everyone for your input. With your help I've gotten the function to do what I need it to do, but I'd still like to understand why my code didn't work -- if anyone has any ideas... – carozimm Dec 02 '16 at 13:25

3 Answers


You don't need ifelse; you can do it just by subsetting your original df as @Ronak Shah suggests:

df[grep("took my", df$big), ]
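As for why the original ifelse() version returned only the trig column: %in% itself works fine here. ifelse() returns a value with the same shape as its test argument, which in this case is a single TRUE, so it keeps only the first element of the subset data frame (a data frame is a list of columns), i.e. the trig column. A plain if/else returns the chosen object unchanged. A small sketch that rebuilds the question's dummy data to show this:

```r
# Dummy data from the question
df <- data.frame(
  trig = c("took my dog", "took my cat", "took my hat", "ate my dinner", "ate my lunch"),
  trig_count = c(3, 2, 1, 3, 1),
  big = c("took my", "took my", "took my", "ate my", "ate my"),
  big_count = c(6, 6, 6, 4, 4),
  stringsAsFactors = FALSE
)

x <- "took my"
test <- x %in% df$big            # a single TRUE: %in% is not the problem

subset_df <- df[df$big == x, ]   # the three matching rows, as a data frame

# ifelse() returns something shaped like `test` (length 1), so only the
# first column of subset_df (trig) survives, wrapped in a list.
res <- ifelse(test, subset_df, "no match")
length(res)                      # 1
res[[1]]                         # "took my dog" "took my cat" "took my hat"

# A plain if/else returns whichever object is selected, unchanged.
res2 <- if (test) subset_df else "no match"
nrow(res2)                       # 3
```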

If you want to turn it into a function that still reports "no match" when nothing matches, you could do:

match_test <- function(match_string) {

  subset_df <- df[grep(match_string, df$big), ]

  if (nrow(subset_df) < 1) {
    warning("no match")
  } else {
    subset_df
  }  

}

match_test("took my")
#          trig trig_count     big big_count
# 1 took my dog          3 took my         6
# 2 took my cat          2 took my         6
# 3 took my hat          1 took my         6

And if there's nothing to match:

match_test("coffee")
# Warning message:
# In match_test("coffee") : no match
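One caveat: grep() treats its first argument as a regular expression and matches substrings, so a partial input such as "my" returns every row. If only exact two-word phrases should count as a match, a plain == comparison may be closer to what the question asks for. A minimal sketch; match_exact is a hypothetical name, not taken from any of the answers:

```r
# Dummy data from the question
df <- data.frame(
  trig = c("took my dog", "took my cat", "took my hat", "ate my dinner", "ate my lunch"),
  trig_count = c(3, 2, 1, 3, 1),
  big = c("took my", "took my", "took my", "ate my", "ate my"),
  big_count = c(6, 6, 6, 4, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: exact comparison instead of grep() pattern matching
match_exact <- function(x) {
  res <- df[df$big == x, ]
  if (nrow(res) == 0) return("no match")
  res
}

# grep() matches substrings, so "my" hits all five rows
nrow(df[grep("my", df$big), ])   # 5

match_exact("my")                # "no match"
nrow(match_exact("took my"))     # 3
```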
Phil
  • I do need the result in a function that returns the string "no match" (rather than a warning) if there is no match, or else I would have simply done df[df$big==x,] -- though grep works as well. Thanks, Phil! – carozimm Dec 02 '16 at 13:22
  • @carozimm In that case replace `warning()` with `return()` – Phil Dec 02 '16 at 14:27
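Applying that suggestion, a sketch of the function that returns the string "no match" instead of raising a warning (the dummy data is rebuilt so the block runs on its own):

```r
# Dummy data from the question
df <- data.frame(
  trig = c("took my dog", "took my cat", "took my hat", "ate my dinner", "ate my lunch"),
  trig_count = c(3, 2, 1, 3, 1),
  big = c("took my", "took my", "took my", "ate my", "ate my"),
  big_count = c(6, 6, 6, 4, 4),
  stringsAsFactors = FALSE
)

# Phil's function with warning() replaced by return()
match_test <- function(match_string) {
  subset_df <- df[grep(match_string, df$big), ]
  if (nrow(subset_df) < 1) {
    return("no match")   # grep() found nothing, so subset_df has 0 rows
  }
  subset_df
}

match_test("took my")   # the three "took my" rows
match_test("coffee")    # "no match"
```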

We can use `str_detect` from stringr inside dplyr's `filter`:

library(stringr)
library(dplyr)
df %>% 
     filter(str_detect(big, "took my"))
#        trig trig_count     big big_count
#1 took my dog          3 took my         6
#2 took my cat          2 took my         6
#3 took my hat          1 took my         6
akrun

We can try this too:

library(stringr)
match_test <- function(x){
  res <- df[which(!is.na(str_match(df$big,x))),]
  if(nrow(res) == 0) return('no match')
  return(res)
}
match_test("looked for")
#[1] "no match"
match_test("took my")
#         trig trig_count     big big_count
#1 took my dog          3 took my         6
#2 took my cat          2 took my         6
#3 took my hat          1 took my         6
match_test("ate my")
#           trig trig_count    big big_count
#4 ate my dinner          3 ate my         4
#5  ate my lunch          1 ate my         4
Sandipan Dey