Subset in R with specific values for specific columns identified by their index number

Question

If I have a data frame like this:

df = data.frame(A = sample(1:5, 10, replace=T), B = sample(1:5, 10, replace=T), C = sample(1:5, 10, replace=T), D = sample(1:5, 10, replace=T), E = sample(1:5, 10, replace=T))

Giving me this:

   A B C D E
1  1 5 1 4 3
2  2 3 5 4 3
3  4 2 2 4 4
4  2 1 2 5 2
5  3 3 4 4 5
6  3 2 3 1 5
7  1 5 4 2 3
8  1 3 5 5 1
9  3 1 1 3 5
10 5 3 1 2 4

How do I get a subset that includes all the rows where the values for certain columns (B and D, say) are equal to 1, with the columns identified by their index numbers (2 and 4) rather than their names? In this case:

   A B C D E
4  2 1 2 5 2
6  3 2 3 1 5
9  3 1 1 3 5

Does this answer your question? [Filter data.frame rows by a logical condition](https://stackoverflow.com/questions/1686569/filter-data-frame-rows-by-a-logical-condition) — ekoam, Jan 23 '22 at 10:10
Not quite. This gets me what I want: df[df[, which(names(df)=="B")] %in% 1 | df[, which(names(df)=="D")] %in% 1, ] But I need for it to accept a varying number of column values, and this will only work if I know there will be two. So I really want to be able to identify the columns with a vector. Something like this: df[df[, c(2, 4)] %in% 1, ] Except that this gives me nothing -- is that because it only returns values where *both* columns have values of 1? — Sergei Walankov, Jan 23 '22 at 10:34
You may want to take a look at [this](https://www.tidyverse.org/blog/2021/02/dplyr-1-0-4-if-any/) @SergeiWalankov — ekoam, Jan 23 '22 at 10:38

score 3 · Accepted Answer · answered Jan 23 '22 at 10:40

df[rowSums(df[c(2,4)] == 1) > 0,]
#   A B C D E
# 4 2 1 2 5 2
# 6 3 2 3 1 5
# 9 3 1 1 3 5

You said to compare values by column index, so df[c(2,4)] or (or df[,c(2,4)]).
df[c(2,4)] == 1 returns a matrix of logicals, whether the cell's value is equal to 1.
rowSums(.) > 0 finds those rows with at least one 1.
df[rowSums(.)>0,] selects just those rows.

Data

df <- structure(list(A = c(1L, 2L, 4L, 2L, 3L, 3L, 1L, 1L, 3L, 5L), B = c(5L, 3L, 2L, 1L, 3L, 2L, 5L, 3L, 1L, 3L), C = c(1L, 5L, 2L, 2L, 4L, 3L, 4L, 5L, 1L, 1L), D = c(4L, 4L, 4L, 5L, 4L, 1L, 2L, 5L, 3L, 2L), E = c(3L, 3L, 4L, 2L, 5L, 5L, 3L, 1L, 5L, 4L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))

score 0 · Answer 2 · answered Jan 23 '22 at 13:01

tidyverse

df <-
  structure(
    list(
      A = c(1L, 2L, 4L, 2L, 3L, 3L, 1L, 1L, 3L, 5L),
      B = c(5L, 3L, 2L, 1L, 3L, 2L, 5L, 3L, 1L, 3L),
      C = c(1L, 5L, 2L, 2L, 4L, 3L, 4L, 5L, 1L, 1L),
      D = c(4L, 4L, 4L, 5L, 4L, 1L, 2L, 5L, 3L, 2L),
      E = c(3L, 3L, 4L, 2L, 5L, 5L, 3L, 1L, 5L, 4L)
    ),
    class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10")
  )

library(tidyverse)
df %>% 
  filter(B == 1 | D == 1)
#>   A B C D E
#> 4 2 1 2 5 2
#> 6 3 2 3 1 5
#> 9 3 1 1 3 5

^{Created on 2022-01-23 by the reprex package (v2.0.1)}

data.table

library(data.table)

setDT(df)[B == 1 | D == 1, ]
#>    A B C D E
#> 1: 2 1 2 5 2
#> 2: 3 2 3 1 5
#> 3: 3 1 1 3 5

^{Created on 2022-01-23 by the reprex package (v2.0.1)}

Subset in R with specific values for specific columns identified by their index number

2 Answers2