
There are a number of solutions that use grepl(), but none that I have come across so far solves my problem. I have two data frames. The first, x, contains a set of combinations associated with each letter:

structure(list(variable = c("A", "B", "C", "D"), combinations = c("16, 17, 18", 
"17,18", "16,18", "12,3")), class = "data.frame", row.names = c(NA, 
-4L))

> x
  variable combinations
1        A   16, 17, 18
2        B        17,18
3        C        16,18
4        D         12,3

The second data frame is the results. It is a set of observations showing the letters that a species interacted with. Below is just one set of observations:

structure(list(variable = c("A, C", NA, NA), species = c("16", 
"17", "18"), active = c("16", NA, NA)), class = "data.frame", row.names = c(NA, 
-3L))

> y
  variable species active
1     A, C      16     16
2     <NA>      17   <NA>
3     <NA>      18   <NA>

This was the original structure of y:

> y
  variable species.active species.present
1     A, C             16           17,18

The structure was changed so that more columns could be associated with each species (each species now has its own row), so the structure serves a specific purpose.

What I want is a binary column (T/F or 0/1) showing whether or not each species is in the combinations associated with its variable.

This is what I have managed so far:

library(zoo)
library(dplyr)
#carry locf so that each species are assigned the same variables 
y <- y %>% 
  mutate(variable = zoo::na.locf(variable))

#separate each row to separate combinations 
library(tidyr)
y <- separate_rows(y, variable)

#match x$variable by y$variable to add associated combinations in a new column in y
y$combinations <- ifelse(y$variable %in% x$variable, x$combinations)

#return true or false if each species is in the combination
y$type <- grepl(y$species, y$combinations);y

> y
  variable species active combinations type 
  <chr>    <chr>   <chr>  <chr>        <lgl>
1 A        16      16     16, 17, 18   TRUE 
2 C        16      16     17,18        FALSE
3 A        17      NA     16,18        TRUE 
4 C        17      NA     12,3         FALSE
5 A        18      NA     16, 17, 18   TRUE 
6 C        18      NA     17,18        FALSE

As you can see, the combinations are wrong and grepl() returns incorrect T/F values (see row 3, where it says TRUE even though '17' is not in the combination).

If anyone could help, that would be greatly appreciated.

oguz ismail
kpm
  • Please specify your actual expected output, what should `type` really be? I'm thinking T,F,F,F,T,T? – r2evans Jun 30 '21 at 12:05
  • The expected output is shown in the last data frame above, where y has two new columns: combinations and type. Type should be returning the correct true or false (it can be any type of binary output) based on the combinations column. But both columns are incorrect. – kpm Jun 30 '21 at 12:09
  • "Warning message: In grepl(y$species, y$combinations) : argument 'pattern' has length > 1 and only the first element will be used" – iod Jun 30 '21 at 12:14
  • Your `ifelse` is broken, it needs a `no=` argument (even if it isn't used in this sample). – r2evans Jun 30 '21 at 12:36

1 Answer


Try this, choosing one of type1 or type2 (same result), whichever you prefer.

library(dplyr)
left_join(y, x, by = "variable") %>%
  mutate(
    type1 = mapply(`%in%`, species, strsplit(combinations, "\\D+")),
    type2 = mapply(grepl, paste0("\\b", species, "\\b"), combinations)
  )
# # A tibble: 6 x 6
#   variable species active combinations type1 type2
#   <chr>    <chr>   <chr>  <chr>        <lgl> <lgl>
# 1 A        16      16     16, 17, 18   TRUE  TRUE 
# 2 C        16      16     16,18        TRUE  TRUE 
# 3 A        17      <NA>   16, 17, 18   TRUE  TRUE 
# 4 C        17      <NA>   16,18        FALSE FALSE
# 5 A        18      <NA>   16, 17, 18   TRUE  TRUE 
# 6 C        18      <NA>   16,18        TRUE  TRUE 
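To see why type1 and type2 agree, and why a plain substring grepl would not be enough, here is a minimal sketch on standalone vectors. The `species` and `combinations` values below are made up for illustration (they are not the question's data), and the variable names `in_set`, `in_regex` and `naive` are just labels for this example.

```r
# Hypothetical illustration data: one species ID per combination string
species      <- c("16", "17", "1")
combinations <- c("16, 17, 18", "16,18", "16,18")

# Split each combination on runs of non-digits, then test membership pairwise
parts  <- strsplit(combinations, "\\D+")
in_set <- mapply(`%in%`, species, parts, USE.NAMES = FALSE)

# Same test with a regex: the \\b anchors stop "1" from matching inside "16"
in_regex <- mapply(grepl, paste0("\\b", species, "\\b"), combinations,
                   USE.NAMES = FALSE)

# A naive substring match, for comparison
naive <- mapply(grepl, species, combinations, USE.NAMES = FALSE)

in_set    # TRUE FALSE FALSE
in_regex  # TRUE FALSE FALSE
naive     # TRUE FALSE TRUE
```

The mapply call is what pairs the i-th species with the i-th combination; calling grepl directly with a vector of patterns only uses the first pattern.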

Or starting with the original y:

y
#   variable species active
# 1     A, C      16     16
# 2     <NA>      17   <NA>
# 3     <NA>      18   <NA>

y %>%
  mutate(variable = zoo::na.locf(variable)) %>%
  tidyr::separate_rows(variable) %>%
  left_join(., x, by = "variable") %>%
  mutate(
    type1 = mapply(`%in%`, species, strsplit(combinations, "\\D+")),
    type2 = mapply(grepl, paste0("\\b", species, "\\b"), combinations)
  )
# # A tibble: 6 x 6
#   variable species active combinations type1 type2
#   <chr>    <chr>   <chr>  <chr>        <lgl> <lgl>
# 1 A        16      16     16, 17, 18   TRUE  TRUE 
# 2 C        16      16     16,18        TRUE  TRUE 
# 3 A        17      <NA>   16, 17, 18   TRUE  TRUE 
# 4 C        17      <NA>   16,18        FALSE FALSE
# 5 A        18      <NA>   16, 17, 18   TRUE  TRUE 
# 6 C        18      <NA>   16,18        TRUE  TRUE 

FYI, a few things are wrong with your question:

  1. When a question involves warnings or errors, you need to include them. In this case, grepl's first argument (pattern) must be length 1; anything longer triggers the warning below, which it appears you are ignoring:

    grepl(y$species, y$combinations)
    # Warning in grepl(y$species, y$combinations) :
    #   argument 'pattern' has length > 1 and only the first element will be used
    
  2. ifelse in your code appears to work, but you are using it incorrectly: it requires a no= argument as well, so there needs to be something as its third argument. It does not error here only because every element of the test is TRUE (which is a problem in itself), so it never attempts to evaluate no=.

    ifelse(c(T,T), 1:2)
    # [1] 1 2
    ifelse(c(T,F), 1:2)
    # Error in ifelse(c(T, F), 1:2) : argument "no" is missing, with no default
    ifelse(c(T,F), 1:2, 11:12)
    # [1]  1 12
    
  3. What you're attempting to do is merge/join x and y, so the tools you want are among base::merge and dplyr::*_join (for starters, others exist). To better understand what's going on in a join, I suggest you see How to join (merge) data frames (inner, outer, left, right), https://stackoverflow.com/a/6188334/3358272.
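For reference, the join in point 3 can also be sketched in base R. This is only an illustration under a few assumptions: `y` is rebuilt by hand in the shape it has after `na.locf()` and `separate_rows()`, `recycled` is a made-up name showing what the original `ifelse` call effectively produced, and `match()` is used for the lookup because `merge()` sorts by the key by default while `match()` keeps the row order of `y`.

```r
# x as given in the question
x <- data.frame(
  variable     = c("A", "B", "C", "D"),
  combinations = c("16, 17, 18", "17,18", "16,18", "12,3"),
  stringsAsFactors = FALSE
)

# y rebuilt by hand in its expanded form (one row per variable/species pair)
y <- data.frame(
  variable = rep(c("A", "C"), times = 3),
  species  = rep(c("16", "17", "18"), each = 2),
  active   = rep(c("16", NA, NA), each = 2),
  stringsAsFactors = FALSE
)

# What the original ifelse() amounted to: x$combinations recycled by position,
# with no regard to which variable each row of y belongs to
recycled <- rep(x$combinations, length.out = nrow(y))

# Base-R left join; rows come back sorted by `variable`
merged <- merge(y, x, by = "variable", all.x = TRUE)

# Key-based lookup that keeps y's row order
y$combinations <- x$combinations[match(y$variable, x$variable)]

# Pairwise membership test, as in type1 above
y$type <- mapply(`%in%`, y$species, strsplit(y$combinations, "\\D+"),
                 USE.NAMES = FALSE)

y$type  # TRUE TRUE TRUE FALSE TRUE TRUE
```

Comparing `recycled` with `y$combinations` shows where the combinations column in the question went wrong: rows for C were given the combinations of B and D.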

r2evans
  • Thank you very much for the solution. The combination column is incorrect, is there a solution to fix as well? – kpm Jun 30 '21 at 12:28
  • Sorry, to clarify, my combinations are wrong to begin with, but I don't know how to fix it. – kpm Jun 30 '21 at 12:37
  • See my edit, I think I've resolved the issues. – r2evans Jun 30 '21 at 12:44
  • Thanks for the update. I originally had it written in a similar way to the first option, but struggled, so I tried to break down each problem I had. Glad that I'm very slowly heading in the right direction! Cheers – kpm Jul 01 '21 at 06:31