How to subset a dataframe with a conditional statement based on multiple column values

Question

I'm trying to subset a dataframe on the basis of conditions from multiple columns. Here is my dataframe.

var1 <- c("x","x","x","y","y","z","z","z","z")
var2 <- c("a","b","c","a","b","a","b","c","d")
var3 <- c(2,4,1,4,1,6,2,5,8)
data1 <- data.frame(var1,var2,var3)
# -------------------------------------------------------------------------
#     var1 var2 var3
# 1    x    a    2
# 2    x    b    4
# 3    x    c    1
# 4    y    a    4
# 5    y    b    1
# 6    z    a    6
# 7    z    b    2
# 8    z    c    5
# 9    z    d    8

Output

The output I expect is:

#     var1
# 1    y
# 2    z

Condition

The following are the conditions leading to the output:

The output is a dataframe containing only the selected values of var1.

A value of var1 is selected when, within that var1 group, the value of var3 where var2 equals a is greater than the value of var3 where var2 equals b.

I'm unable to write code for this condition, which spans multiple columns.

Thank you.

Please add the expected output, and your attempts to solve this problem. — yarnabrina, Sep 21 '19 at 09:11
https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Bulat, Sep 21 '19 at 09:11

deepseefan · Answer 1 · 2019-09-21T10:46:00.073

1

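This returns the matching var1 values (as a factor when strings are stored as factors, which was the data.frame() default before R 4.0.0). It compares the a rows and the b rows by position, so it assumes every var1 group has both an a row and a b row, in the same order: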

subset(data1, (var2=="a"))[subset(data1, (var2=="a"))$var3 > subset(data1, (var2=="b"))$var3, "var1"]

# [1] y z
# Levels: x y z

You can use data.frame to get what you want as follows:

data.frame(var1 = subset(data1, (var2=="a"))[subset(data1, (var2=="a"))$var3 > subset(data1, (var2=="b"))$var3, "var1"])
#   var1
# 1    y
# 2    z
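
If some var1 group might lack an a row or a b row, the positional comparison above could pair rows from different groups. A minimal base R sketch that pairs the rows explicitly by var1 instead, assuming the question's data with the values quoted as strings (the names rows_a, rows_b, paired and result are only illustrative):

```r
# Sample data from the question, with the values quoted as strings.
data1 <- data.frame(
  var1 = c("x", "x", "x", "y", "y", "z", "z", "z", "z"),
  var2 = c("a", "b", "c", "a", "b", "a", "b", "c", "d"),
  var3 = c(2, 4, 1, 4, 1, 6, 2, 5, 8)
)

# Split out the "a" rows and the "b" rows, keeping only var1 and var3.
rows_a <- data1[data1$var2 == "a", c("var1", "var3")]
rows_b <- data1[data1$var2 == "b", c("var1", "var3")]

# Pair them by var1; groups missing either row drop out of the merge.
paired <- merge(rows_a, rows_b, by = "var1", suffixes = c("_a", "_b"))

# Keep the groups where the "a" value exceeds the "b" value.
result <- data.frame(var1 = paired$var1[paired$var3_a > paired$var3_b])
result
```

On the sample data this should select y and z, the same groups as the one-liner above.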

edited Sep 21 '19 at 10:46

answered Sep 21 '19 at 09:18


Teun Siebers · Answer 2 · 2019-09-21T10:52:09.217

1

The most intuitive solution might be to use a for-loop. Probably, there are shorter and more elegant ways to solve this problem, but this should work:

library(dplyr)

selection <- c()

# Check each var1 group separately.
for (i in unique(data1$var1)) {
  # Keep only the "a" and "b" rows of the current group.
  var_store <- data1 %>%
    filter(var1 == i, var2 == "a" | var2 == "b")

  # Compare var3 for the "a" row with var3 for the "b" row.
  if (filter(var_store, var2 == "a") %>% pull(var3) >
      filter(var_store, var2 == "b") %>% pull(var3)) {
    selection <- c(selection, i)
  }
}

data1 %>%
  filter(var1 %in% selection)


# # A tibble: 6 x 3
#   var1  var2   var3
#   <chr> <chr> <dbl>
# 1 y     a         4
# 2 y     b         1
# 3 z     a         6
# 4 z     b         2
# 5 z     c         5
# 6 z     d         8
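
One of the shorter ways alluded to above might be a grouped summary instead of a loop. This is only a sketch, assuming dplyr is installed and that each var1 group has at most one a row and one b row; a_value and b_value are hypothetical helper names:

```r
library(dplyr)

# Sample data from the question, with the values quoted as strings.
data1 <- data.frame(
  var1 = c("x", "x", "x", "y", "y", "z", "z", "z", "z"),
  var2 = c("a", "b", "c", "a", "b", "a", "b", "c", "d"),
  var3 = c(2, 4, 1, 4, 1, 6, 2, 5, 8)
)

result <- data1 %>%
  group_by(var1) %>%
  # Take the first "a" and first "b" value per group; [1] yields NA if absent.
  summarise(
    a_value = var3[var2 == "a"][1],
    b_value = var3[var2 == "b"][1]
  ) %>%
  # filter() drops rows where the comparison is FALSE or NA.
  filter(a_value > b_value) %>%
  select(var1)

result
```

Unlike the loop, this returns only the var1 column, which is the shape the question asks for.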

edited Sep 21 '19 at 10:52

answered Sep 21 '19 at 10:40


I have been able to get the desired answer by transposing the dataframe using dcast() – sayan de sarkar Sep 21 '19 at 11:06
@sayandesarkar, in that case, you can answer your own question and accept it as an answer. – deepseefan Sep 21 '19 at 11:12

score 0 · Answer 3 · answered Sep 22 '19 at 08:05

0

I found that reshaping the dataframe solves my problem. I transposed the values of var2 into columns using dcast() to get the desired result.
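
The answer does not include the code itself, so the following is only a guess at what the reshaping approach could look like. It assumes dcast() from the reshape2 package (data.table provides a dcast() as well) and the question's data with quoted strings; wide and result are illustrative names:

```r
library(reshape2)  # assumption: the answer does not say which package's dcast() was used

# Sample data from the question, with the values quoted as strings.
data1 <- data.frame(
  var1 = c("x", "x", "x", "y", "y", "z", "z", "z", "z"),
  var2 = c("a", "b", "c", "a", "b", "a", "b", "c", "d"),
  var3 = c(2, 4, 1, 4, 1, 6, 2, 5, 8)
)

# One row per var1, one column per var2 value; missing combinations become NA.
wide <- dcast(data1, var1 ~ var2, value.var = "var3")

# Compare the "a" and "b" columns; which() skips rows where either is NA.
result <- data.frame(var1 = wide$var1[which(wide$a > wide$b)])
result
```

Once var2 is spread into columns, the condition becomes a plain column comparison, which is probably why the reshape makes the problem easier.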

answered Sep 22 '19 at 08:05

sayan de sarkar
