subsetting a data frame based on a condition of one column

Question

I have a big data frame. I want to make a subset based on the condition of the values of one column. Say:

a<-data.frame(x=rep(1:5,5),y=rnorm(25),z=runif(25))

I want to make a subset based on the values of column x. For instance, take the rows where x is one of c(2,3,5) and create another data frame from them.

akrun · Accepted Answer · 2016-06-25T15:38:36.177

3

We can use %in%, with the values to keep stored in a vector x:

x <- c(2, 3, 5)
a1 <- a[a$x %in% x, ]

To keep only the column 'x' (still as a data frame):

a1 <- a[a$x %in% x, "x", drop=FALSE]

If we need the matching values of column 'x' as a plain vector rather than a data frame:

v1 <- a$x[a$x %in% x]
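
A minimal, self-contained sketch of the three variants above, for anyone who wants to run them end to end. The seed and the name `keep` are assumptions added for illustration; they are not part of the original answer.

```r
# Rebuild the question's data frame; the seed is an assumption, used only
# so that repeated runs give the same random columns.
set.seed(42)
a <- data.frame(x = rep(1:5, 5), y = rnorm(25), z = runif(25))

# Values of column x to retain (hypothetical name, taken from the question's example).
keep <- c(2, 3, 5)

a1   <- a[a$x %in% keep, ]                     # matching rows, all columns
x_df <- a[a$x %in% keep, "x", drop = FALSE]    # matching rows, one-column data frame
v1   <- a$x[a$x %in% keep]                     # matching values as a plain vector

nrow(a1)     # 15: three kept values, each occurring five times in rep(1:5, 5)
class(x_df)  # "data.frame", because drop = FALSE prevents simplification
```

Without `drop = FALSE`, selecting a single column with `[` returns a vector, which is why the second and third forms differ only in the class of the result.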

edited Jun 25 '16 at 15:38

answered Jun 25 '16 at 15:26


Do we create another vector? Say x=c(2,3,5) – G1124E Jun 25 '16 at 15:29
@G1124E I understand the question as subsetting 'a' based on the values in the `x` vector – akrun Jun 25 '16 at 15:30
yes. I want to have another df based on the values on the column x. For example in the above df `a`, take only values of column x which is equal to 2,3 and 5. – G1124E Jun 25 '16 at 15:32
@G1124E That is what the first one did. If you need only the column 'x' `a1 <- a[a$x %in% x,'x', drop= FALSE]` – akrun Jun 25 '16 at 15:34
@ZheyuanLi I am not sure about the expected output. It could be just the column 'x'. – akrun Jun 25 '16 at 15:36

Thank you for the explanation @akrun. Got it. – G1124E Jun 25 '16 at 15:45

score 2 · Answer 2 · edited Jun 25 '16 at 16:08

Or you could use subset:

filter <- c(2,5)
subset(a, x %in% filter)

Or equivalently:

subset(a, match(x, filter, nomatch = 0)>0)

Or, with plain bracket indexing:

a[match(a$x, filter, nomatch = 0)>0,]

#    x           y         z
# 2  2  0.76230930 0.9704342
# 5  5 -1.61846247 0.5786633
# 7  2  0.94024182 0.2805524
# 10 5 -0.08851427 0.6426568
# 12 2  0.78745436 0.1129637
# 15 5 -2.41274754 0.4826690
# 17 2 -0.37616238 0.9518877
# 20 5  1.18745381 0.8110062
# 22 2  0.03233245 0.4599623
# 25 5 -2.28360189 0.4836900
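
The three forms in this answer should select the same rows. Here is a small sketch that checks this on a reproducible copy of the data. The seed and the name `filter_vals` are assumptions for illustration; `filter_vals` is used instead of `filter` only to avoid reusing the name of an existing function.

```r
# Reproducible copy of the question's data frame (seed is an assumption).
set.seed(1)
a <- data.frame(x = rep(1:5, 5), y = rnorm(25), z = runif(25))

filter_vals <- c(2, 5)   # hypothetical name for the values to keep

r1 <- subset(a, x %in% filter_vals)
r2 <- subset(a, match(x, filter_vals, nomatch = 0) > 0)
r3 <- a[match(a$x, filter_vals, nomatch = 0) > 0, ]

# Compare the results pairwise; all.equal() reports any differences it finds.
all.equal(r1, r2)
all.equal(r1, r3)

rownames(r1)   # original row numbers of the retained rows are preserved
```

The row names in the result are the positions of the retained rows in `a`, which is why the printed output above is numbered 2, 5, 7, 10 and so on rather than 1 to 10.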

HNSKD · Answer 3 · 2017-07-08T13:53:01.463

We can use the value matching function %in% and the filter verb in the dplyr package (a great package for data manipulation).

library(dplyr)
a1 <- data.frame(x = rep(1:5,5), y=rnorm(25), z=runif(25))
a2 <- filter(a1, x %in% c(2,3,5))

> a2
   x           y         z
1  2  0.28184946 0.3564756
2  3  0.05634123 0.9826746
3  5 -0.58611510 0.8119334
4  2  0.45211282 0.6267487
5  3 -0.64741961 0.7600619
6  5 -0.28781978 0.3216957
7  2  0.51440342 0.5165707
8  3  1.41958340 0.2328647
9  5 -0.27751501 0.5400576
10 2 -0.74835287 0.7976089
11 3  2.42364991 0.4141980
12 5  0.22175161 0.1051387
13 2  1.54876157 0.6408956
14 3  0.54940989 0.3968186
15 5 -1.16333440 0.9359615
