
I'm trying to subset a dataframe on the basis of conditions from multiple columns. Here is my dataframe.

var1 <- c("x","x","x","y","y","z","z","z","z")
var2 <- c("a","b","c","a","b","a","b","c","d")
var3 <- c(2,4,1,4,1,6,2,5,8)
data1 <- data.frame(var1,var2,var3)
# -------------------------------------------------------------------------
#     var1 var2 var3
# 1    x    a    2
# 2    x    b    4
# 3    x    c    1
# 4    y    a    4
# 5    y    b    1
# 6    z    a    6
# 7    z    b    2
# 8    z    c    5
# 9    z    d    8

Output

The output I expect is:

#     var1
# 1    y
# 2    z

Condition

The following are the conditions leading to the output:

  1. The output is a data frame containing only the var1 column.
  2. A var1 value is kept when, within its group, the var3 value in the row where var2 is "a" is greater than the var3 value in the row where var2 is "b".

I'm unable to write code for this condition because it involves comparing values across rows and multiple columns.

Thank you.

sayan de sarkar

3 Answers


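In R versions before 4.0.0, where data.frame() converts strings to factors by default, this returns a factor (in R 4.0.0 and later it returns a character vector):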

subset(data1, (var2=="a"))[subset(data1, (var2=="a"))$var3 > subset(data1, (var2=="b"))$var3, "var1"]

# [1] y z
# Levels: x y z

Wrapping the same expression in data.frame() gives the expected output:

data.frame(var1 = subset(data1, (var2=="a"))[subset(data1, (var2=="a"))$var3 > subset(data1, (var2=="b"))$var3, "var1"])
#   var1
# 1    y
# 2    z
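The expression above pairs the "a" rows with the "b" rows purely by position, so it only works when both subsets list the var1 groups in the same order. A minimal base-R sketch that pairs them by var1 with match() instead, assuming each group has at most one "a" row and one "b" row (the names a_rows, b_rows, b_var3 and result are illustrative, not from the answer):

```r
# Data from the question, with the letters quoted as strings
data1 <- data.frame(
  var1 = c("x", "x", "x", "y", "y", "z", "z", "z", "z"),
  var2 = c("a", "b", "c", "a", "b", "a", "b", "c", "d"),
  var3 = c(2, 4, 1, 4, 1, 6, 2, 5, 8)
)

a_rows <- subset(data1, var2 == "a")
b_rows <- subset(data1, var2 == "b")

# For each "a" row, look up the "b" value of var3 in the same var1 group
# (NA when that group has no "b" row)
b_var3 <- b_rows$var3[match(a_rows$var1, b_rows$var1)]

# which() drops NA comparisons, so groups without a "b" row are excluded
result <- data.frame(var1 = a_rows$var1[which(a_rows$var3 > b_var3)])
result
```

With the question's data this keeps y and z, matching the expected output, and it no longer depends on the row order of the two subsets.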
deepseefan

The most intuitive solution might be a for loop. There are probably shorter and more elegant ways to solve this problem, but this should work (it requires dplyr):

library(dplyr)

selection <- c()

for (i in unique(data1$var1)) {
  # Keep only the "a" and "b" rows of the current var1 group
  var_store <- data1 %>%
    filter(var1 == i, var2 == "a" | var2 == "b")

  # Select the group if its var3 value for "a" exceeds its var3 value for "b"
  if (filter(var_store, var2 == "a") %>% pull(var3) >
      filter(var_store, var2 == "b") %>% pull(var3)) {
    selection <- c(selection, i)
  }
}

data1 %>%
  filter(var1 %in% selection)


# # A tibble: 6 x 3
#   var1  var2   var3
#   <chr> <chr> <dbl>
# 1 y     a         4
# 2 y     b         1
# 3 z     a         6
# 4 z     b         2
# 5 z     c         5
# 6 z     d         8
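As the answer notes, shorter versions are possible. One loop-free sketch swaps base R's split() and sapply() in for the dplyr loop; it evaluates the same per-group comparison and returns only var1, as the question asks (the names keep and result are illustrative):

```r
# Data from the question, with the letters quoted as strings
data1 <- data.frame(
  var1 = c("x", "x", "x", "y", "y", "z", "z", "z", "z"),
  var2 = c("a", "b", "c", "a", "b", "a", "b", "c", "d"),
  var3 = c(2, 4, 1, 4, 1, 6, 2, 5, 8)
)

# One logical value per var1 group: TRUE when var3 for "a" exceeds var3 for "b"
keep <- sapply(split(data1, data1$var1), function(g) {
  a <- g$var3[g$var2 == "a"]
  b <- g$var3[g$var2 == "b"]
  # Treat groups without exactly one "a" and one "b" row as non-matching
  length(a) == 1 && length(b) == 1 && a > b
})

result <- data.frame(var1 = names(keep)[keep])
result
```

Unlike the loop, this does not stop with an error when a group lacks an "a" or "b" row; such groups are simply not selected.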

I found that reshaping the data frame solves my problem: I used dcast() to spread the values of var2 into separate columns, which gives the desired result.
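The answer gives no code, so here is a sketch of the reshaping idea. It uses base R's reshape() as a stand-in for dcast(); with reshape2, dcast(data1, var1 ~ var2, value.var = "var3") builds a similar wide table whose columns are named a, b, c, d instead of var3.a, var3.b, and so on. The names wide and result are illustrative:

```r
# Data from the question, with the letters quoted as strings
data1 <- data.frame(
  var1 = c("x", "x", "x", "y", "y", "z", "z", "z", "z"),
  var2 = c("a", "b", "c", "a", "b", "a", "b", "c", "d"),
  var3 = c(2, 4, 1, 4, 1, 6, 2, 5, 8)
)

# One row per var1; var3 spread into columns var3.a, var3.b, var3.c, var3.d
# (NA where a var1 group has no row for that var2 value)
wide <- reshape(data1, idvar = "var1", timevar = "var2",
                v.names = "var3", direction = "wide")

# Keep var1 where the "a" value exceeds the "b" value; which() drops NAs
result <- data.frame(var1 = wide$var1[which(wide$var3.a > wide$var3.b)])
result
```

Once the data is wide, the condition becomes an ordinary column-wise comparison, which is why reshaping makes this problem straightforward.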

sayan de sarkar