I have a data set like this:

df <- data.frame(
  Business = c('HR','HR','Finance','Finance','Legal','Legal','Research'),
  Country  = c('Iceland','Iceland','Norway','Norway','US','US','France'),
  Gender   = c('Female','Male','Female','Male','Female','Male','Male'),
  Value    = c(10,5,20,40,10,20,50)
)

I need to keep only the Business/Country groups where both the male value and the female value are >= 10. For example, HR in Iceland should be removed (the male value is 5), as should Research in France (it has no female row).

I've tried `df %>% group_by(Business, Country) %>% filter(Value >= 10)`, but this only drops the individual rows with a value below 10 rather than the whole group. Any ideas?

Ted Mosby

2 Answers

Maybe this can help:

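library(dplyr)     # for %>%, mutate() and filter()
library(reshape2)  # for melt()
# Go wide with base R's reshape(): one row per Business/Country, one Value column per gender
df2 <- reshape(df, idvar = c('Business','Country'), timevar = 'Gender', direction = 'wide')
# Flag the pairs where both values are >= 10; a missing gender gives NA, which filter() drops
df2 %>% mutate(Index = ifelse(Value.Female >= 10 & Value.Male >= 10, 1, 0)) %>%
  filter(Index == 1) -> df3
# Drop the Index column (column 5) and melt back to long format
df4 <- reshape2::melt(df3[,-5], id.vars = c('Business','Country'))
df4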

  Business Country     variable value
1  Finance  Norway Value.Female    20
2    Legal      US Value.Female    10
3  Finance  Norway   Value.Male    40
4    Legal      US   Value.Male    20
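
A shorter route, without reshaping, may be to stay with the grouped `filter()` from the question and make the condition a group-level one. The sketch below is an illustration rather than part of the answer above; it assumes `dplyr` is installed, and the name `kept` is made up for the example.

library(dplyr)

df <- data.frame(
  Business = c('HR','HR','Finance','Finance','Legal','Legal','Research'),
  Country  = c('Iceland','Iceland','Norway','Norway','US','US','France'),
  Gender   = c('Female','Male','Female','Male','Female','Male','Male'),
  Value    = c(10, 5, 20, 40, 10, 20, 50)
)

kept <- df %>%
  group_by(Business, Country) %>%
  # Each condition collapses to a single TRUE/FALSE per group,
  # so a group is kept or dropped as a whole.
  filter(
    all(c('Female', 'Male') %in% Gender),  # both genders are present
    all(Value >= 10)                       # no value in the group is below 10
  ) %>%
  ungroup()

kept

Wrapping the comparison in `all()` is what turns the row-wise test from the question into a per-group test.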
Duck

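You could just use two `ave` steps: one with `length` to keep only the countries that have two rows, and one with `min` to keep only those whose smallest value is at least 10.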

df <- df[with(df, ave(Value, Country, FUN=length)) == 2, ]
df[with(df, ave(Value, Country, FUN=min)) >= 10, ]
#   Business Country Gender Value
# 3  Finance  Norway Female    20
# 4  Finance  Norway   Male    40
# 5    Legal      US Female    10
# 6    Legal      US   Male    20

Note that this also works when the rows of the data frame are shuffled.

set.seed(42)
df2 <- df[sample(1:nrow(df)), ]

df2 <- df2[with(df2, ave(Value, Country, FUN=length)) == 2, ]
df2[with(df2, ave(Value, Country, FUN=min)) >= 10, ]
#   Business Country Gender Value
# 5    Legal      US Female    10
# 6    Legal      US   Male    20
# 3  Finance  Norway Female    20
# 4  Finance  Norway   Male    40
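
The calls above group on Country alone, which is enough for this sample data. If the same country could appear under more than one business, a variant that groups on the Business/Country pair and checks for both genders explicitly might look like the sketch below. The helper names `key`, `has_female`, `has_male`, `min_ok` and `kept` are assumptions made for illustration.

df <- data.frame(
  Business = c('HR','HR','Finance','Finance','Legal','Legal','Research'),
  Country  = c('Iceland','Iceland','Norway','Norway','US','US','France'),
  Gender   = c('Female','Male','Female','Male','Female','Male','Male'),
  Value    = c(10, 5, 20, 40, 10, 20, 50)
)

# One grouping key per Business/Country pair
key <- paste(df$Business, df$Country, sep = "|")

# ave() repeats each group's result on every row of that group
has_female <- ave(df$Gender == "Female", key, FUN = any)
has_male   <- ave(df$Gender == "Male",   key, FUN = any)
min_ok     <- ave(df$Value, key, FUN = min) >= 10

kept <- df[has_female & has_male & min_ok, ]
kept

Pasting the two columns into a single key avoids handing `ave` several grouping vectors and leaves the original `df` untouched.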
jay.sf
  • This is a nice solution, but I need to make sure each Business and Country pair has both a male and a female row, and that both of those values are at least 10. – Ted Mosby Jul 02 '20 at 15:08
  • @TedMosby `ave` is also capable of this. Just updated my answer! – jay.sf Jul 02 '20 at 15:14