Subset data based on a specific condition

Question

I have a dataset with several columns, and I want to split it into two datasets based on a condition. For example:

 x       y        z                      m
001   20.19.0    86    16.30.45, 17.55.65, 18.23.21, 19.30.92
001   19.30.92   42    16.30.45, 17.55.65, 18.23.21, 19.30.92
001   22.42.42   52    16.30.45, 17.55.65, 18.23.21, 19.30.92
002   19.30.92   33    16.30.45, 17.55.65, 18.23.21, 19.30.92
002   21.30.22   65    16.30.45, 17.55.65, 18.23.21, 19.30.92
002   18.23.21   56    16.30.45, 17.55.65, 18.23.21, 19.30.92
002   25.63.54   85    16.30.45, 17.55.65, 18.23.21, 19.30.92

I want to subset based on the values in m: if the value in y is the same as one of the values in m, that row should go into one dataset, and the remaining rows should go into another. Any help would be appreciated. Thanks.

Please add expected output and your attempt. Also make sure that your example is [reproducible](http://stackoverflow.com/questions/5963269) — Sotos, Jul 13 '18 at 14:25
What are the entries in column `m`? A `list`? A `character` string? Please provide sample data with `dput`. — Maurits Evers, Jul 13 '18 at 14:26

score 0 · Answer 1 · answered Jul 13 '18 at 15:02

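If m is a character column, you can use grepl with mapply to check, row by row, whether the value of y occurs in m. Setting fixed = TRUE makes grepl treat y as a literal string rather than a regular expression, so the dots are not interpreted as wildcards: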

# rows where y occurs in m
df1 = subset(df, mapply(grepl, y, m, fixed = TRUE))
# all remaining rows
df2 = subset(df, !mapply(grepl, y, m, fixed = TRUE))

or, equivalently, with bracket indexing:

df1 = df[mapply(grepl, df$y, df$m, fixed = TRUE),]
df2 = df[!mapply(grepl, df$y, df$m, fixed = TRUE),]

Result:

> df1
  x        y  z                                      m
2 1 19.30.92 42 16.30.45, 17.55.65, 18.23.21, 19.30.92
4 2 19.30.92 33 16.30.45, 17.55.65, 18.23.21, 19.30.92
6 2 18.23.21 56 16.30.45, 17.55.65, 18.23.21, 19.30.92

> df2
  x        y  z                                      m
1 1  20.19.0 86 16.30.45, 17.55.65, 18.23.21, 19.30.92
3 1 22.42.42 52 16.30.45, 17.55.65, 18.23.21, 19.30.92
5 2 21.30.22 65 16.30.45, 17.55.65, 18.23.21, 19.30.92
7 2 25.63.54 85 16.30.45, 17.55.65, 18.23.21, 19.30.92

Data:

df = structure(list(x = c(1L, 1L, 1L, 2L, 2L, 2L, 2L), y = c("20.19.0", 
"19.30.92", "22.42.42", "19.30.92", "21.30.22", "18.23.21", "25.63.54"
), z = c(86L, 42L, 52L, 33L, 65L, 56L, 85L), m = c("16.30.45, 17.55.65, 18.23.21, 19.30.92", 
"16.30.45, 17.55.65, 18.23.21, 19.30.92", "16.30.45, 17.55.65, 18.23.21, 19.30.92", 
"16.30.45, 17.55.65, 18.23.21, 19.30.92", "16.30.45, 17.55.65, 18.23.21, 19.30.92", 
"16.30.45, 17.55.65, 18.23.21, 19.30.92", "16.30.45, 17.55.65, 18.23.21, 19.30.92"
)), .Names = c("x", "y", "z", "m"), class = "data.frame", row.names = c(NA, 
-7L))
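
One caveat: grepl with fixed = TRUE matches substrings, so a y value such as "9.30.92" (a hypothetical value, not in the sample data) would also count as present in m because it is contained in "19.30.92". If exact matches are needed, a minimal sketch is to split m into its individual values and compare with %in%. This assumes the values in m are always separated by ", "; the names m_all, m_values and in_m are made up for illustration.

```r
# Same sample data as above, rebuilt compactly so the example is self-contained.
m_all <- "16.30.45, 17.55.65, 18.23.21, 19.30.92"
df <- data.frame(
  x = c(1L, 1L, 1L, 2L, 2L, 2L, 2L),
  y = c("20.19.0", "19.30.92", "22.42.42", "19.30.92",
        "21.30.22", "18.23.21", "25.63.54"),
  z = c(86L, 42L, 52L, 33L, 65L, 56L, 85L),
  m = m_all,                      # recycled to every row
  stringsAsFactors = FALSE
)

# Split each m string into its individual values (assumed separator: ", ").
m_values <- strsplit(df$m, ", ", fixed = TRUE)

# TRUE when y is exactly equal to one of the values listed in m on the same row.
in_m <- mapply(function(y, vals) y %in% vals, df$y, m_values, USE.NAMES = FALSE)

df1 <- df[in_m, ]    # rows where y appears in m
df2 <- df[!in_m, ]   # all remaining rows
```

On the sample data this selects the same rows as the grepl version; the two approaches only differ when a y value is a partial piece of one of the values in m.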
