
I am working with strings in R. My data frame DF has the following structure:

DF <- data.frame(v1 = c(1, 2, 3), v2 = c("Oranges are fruits", "Hit music", "Ferrari is red"), stringsAsFactors = FALSE)

  v1                 v2
1  1 Oranges are fruits
2  2          Hit music
3  3     Ferrari is red

And I have a vector d which contains:

d <- c("fruits","red")

I am looking for a way to test whether each string in v2 contains any of the strings in d. I have tried the following code:

DF$v3 <- grepl(d,DF$v2)

But I get this result:

  v1                 v2    v3
1  1 Oranges are fruits  TRUE
2  2          Hit music FALSE
3  3     Ferrari is red FALSE

This is not correct, because the string in the third row of v2 contains the word red, which is an element of d. Is there any way to obtain output like this:

  v1                 v2    v3
1  1 Oranges are fruits  TRUE
2  2          Hit music FALSE
3  3     Ferrari is red  TRUE

My original dataset is larger and DF is a sample of it. Many thanks for your help.

Duck

2 Answers


From ?grepl, about the pattern argument:

If a character vector of length 2 or more is supplied, the first element is used

so supplying the length-2 vector d searches only for "fruits", its first element.
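
A quick way to see this is to compare the result for the whole vector with the result for its first element alone. This is only an illustrative sketch on the sample data from the question; the names x, res_full and res_first are made up for the example, and suppressWarnings is there just to silence the warning R raises about the extra pattern elements.

```r
x <- c("Oranges are fruits", "Hit music", "Ferrari is red")
d <- c("fruits", "red")

# grepl() accepts a single pattern; with length(d) > 1 only d[1] is used
# (R warns about this, which is silenced here for the demonstration).
res_full  <- suppressWarnings(grepl(d, x))
res_first <- grepl(d[1], x)

# Both calls searched for "fruits" only, so the results are the same.
identical(res_full, res_first)
#> [1] TRUE
```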

To test whether any of the strings in d match, you can either iterate over d and combine the results with any, or collapse d with the | symbol (regex alternation) and use the result as a single pattern, as below. Note that this matches substrings, so a sentence like "He was barred" will also match "red" in this example.

DF <- data.frame(v1 = c(1, 2, 3), v2 = c("Oranges are fruits", "Hit music", "Ferrari is red"), stringsAsFactors = FALSE)
d <- c("fruits", "red")

DF$v3 <- grepl(paste0(d, collapse = "|"), DF$v2)
DF
#>   v1                 v2    v3
#> 1  1 Oranges are fruits  TRUE
#> 2  2          Hit music FALSE
#> 3  3     Ferrari is red  TRUE

Created on 2019-07-12 by the reprex package (v0.3.0)
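
If the substring behaviour noted above ("barred" matching "red") is not wanted, one possible variation is to wrap the alternation in word boundaries. This is a sketch, not part of the original answer: it assumes the elements of d are plain words with no regex metacharacters, and the names x and pattern are made up for the example.

```r
d <- c("fruits", "red")
x <- c("Ferrari is red", "He was barred", "Hit music")

# \\b marks a word boundary, so each element of d must match as a whole word.
# Assumption: the elements of d contain no regex metacharacters.
pattern <- paste0("\\b(", paste(d, collapse = "|"), ")\\b")

res <- grepl(pattern, x, perl = TRUE)
res
#> [1]  TRUE FALSE FALSE
```

Here perl = TRUE selects the PCRE engine, where \b is a standard word-boundary assertion.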

Calum You

One approach is to nest two sapply calls around grepl, which is essentially a double for loop: for each element of v2, grepl is run once per element of d, and any reports whether at least one of them matched.

DF$v3 <- sapply(DF$v2, FUN = function(s) any(sapply(d, FUN = grepl, s)))
DF
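
The same double loop can also be sketched with vapply, which checks that every call returns a single logical value. This is a variation under stated assumptions rather than the code above: fixed = TRUE treats each element of d as a literal string instead of a regular expression, and USE.NAMES = FALSE drops the names that sapply would otherwise attach to the result.

```r
DF <- data.frame(v1 = c(1, 2, 3),
                 v2 = c("Oranges are fruits", "Hit music", "Ferrari is red"),
                 stringsAsFactors = FALSE)
d <- c("fruits", "red")

# Outer loop: one string s from v2 at a time.
# Inner loop: test s against each element p of d, then combine with any().
DF$v3 <- vapply(
  DF$v2,
  function(s) any(vapply(d, function(p) grepl(p, s, fixed = TRUE), logical(1))),
  logical(1),
  USE.NAMES = FALSE
)
DF$v3
#> [1]  TRUE FALSE  TRUE
```
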
Emer