24

I have a data.frame where I'd like to remove entire groups if any of their members meets a condition.

In this first example, if the values are numbers and the condition is NA the code below works.

df <- structure(list(world = c(1, 2, 3, 3, 2, NA, 1, 2, 3, 2), place = c(1, 
1, 2, 2, 3, 3, 1, 2, 3, 1), group = c(1, 1, 1, 2, 2, 2, 3, 
3, 3, 3)), .Names = c("world", "place", "group"), row.names = c(NA, 
-10L), class = "data.frame")

ans <- ddply(df, . (group), summarize, code=mean(world))
ans$code[is.na(ans$code)] <- 0
ans2 <- merge(df,ans)
final.ans <- ans2[ans2$code !=0,]

However, this ddply maneuver with the NA values will not work if the condition is something other than "NA", or if the value are non-numeric.

For example, if I wanted to remove groups which have one or more rows with a world value of AF (as in the data frame below) this ddply trick would not work.

df2 <-structure(list(world = structure(c(1L, 2L, 3L, 3L, 3L, 5L, 1L, 
4L, 2L, 4L), .Label = c("AB", "AC", "AD", "AE", "AF"), class = "factor"), 
    place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1), group = c(1, 
    1, 1, 2, 2, 2, 3, 3, 3, 3)), .Names = c("world", "place", 
"group"), row.names = c(NA, -10L), class = "data.frame")

I can envision a for-loop where for each group the value of each member is checked, and if the condition is met a code column could be populated, and then a subset could me made based on that code.

But, perhaps there is a vectorized, r way to do this?

Henrik
  • 65,555
  • 14
  • 143
  • 159
nofunsally
  • 2,051
  • 6
  • 35
  • 53

4 Answers4

32

Try

library(dplyr)
df2 %>%
  group_by(group) %>%
  filter(!any(world == "AF"))

Or as per metionned by @akrun:

setDT(df2)[, if(!any(world == "AF")) .SD, group]

Or

setDT(df2)[, if(all(world != "AF")) .SD, group]

Which gives:

#Source: local data frame [7 x 3]
#Groups: group
#
#  world place group
#1    AB     1     1
#2    AC     1     1
#3    AD     2     1
#4    AB     1     3
#5    AE     2     3
#6    AC     3     3
#7    AE     1     3
Steven Beaupré
  • 21,343
  • 7
  • 57
  • 77
  • 5
    Or `setDT(df2)[, if(!any(world=='AF')) .SD, group]` using data.table or `setDT(df2)[, if(all(world!='AF')) .SD, group]` – akrun Jul 27 '15 at 19:44
  • For some reasons ```dplyr::filter``` is very slow, particularly if group by multiple columns – June Nov 29 '19 at 01:57
11

Alternate data.table solution:

setDT(df2)
df2[!(group %in% df2[world == "AF",group])]

gives:

   world place group
1:    AB     1     1
2:    AC     1     1
3:    AD     2     1
4:    AB     1     3
5:    AE     2     3
6:    AC     3     3
7:    AE     1     3

Using keys we can be a bit faster:

setkey(df2,group) 
df2[!J((df2[world == "AF",group]))]
zx8754
  • 52,746
  • 12
  • 114
  • 209
Chris
  • 6,302
  • 1
  • 27
  • 54
  • 1
    @Frank thanks, I'm much more used to the second option and forget about the weird behaviour of `DT[group != c(1,2)]` – Chris Jul 27 '15 at 19:56
  • 1
    Yeah, I've gotten so used to joins that I use them all the time instead of old-fashioned subsetting. – Frank Jul 27 '15 at 19:57
5

base package:

df2[ df2$group != df2[ df2$world == 'AF', "group" ], ]

Output:

   world place group
1     AB     1     1
2     AC     1     1
3     AD     2     1
7     AB     1     3
8     AE     2     3
9     AC     3     3
10    AE     1     3

Using sqldf:

library(sqldf)
sqldf("SELECT df2.world, df2.place, [group] FROM df2 
      LEFT JOIN
      (SELECT  * FROM df2 WHERE world LIKE 'AF') AS t
      USING([group])
      WHERE t.world IS NULL")

Output:

  world place group
1    AB     1     1
2    AC     1     1
3    AD     2     1
4    AB     1     3
5    AE     2     3
6    AC     3     3
7    AE     1     3
zx8754
  • 52,746
  • 12
  • 114
  • 209
mpalanco
  • 12,960
  • 2
  • 59
  • 67
2

Base R option using ave

df2[with(df2, ave(world != "AF", group, FUN = all)),]

#   world place group
#1     AB     1     1
#2     AC     1     1
#3     AD     2     1
#7     AB     1     3
#8     AE     2     3
#9     AC     3     3
#10    AE     1     3

Or we can also use subset

subset(df2, ave(world != "AF", group, FUN = all))

The above can also be written as

df2[with(df2, !ave(world == "AF", group, FUN = any)),]

and

subset(df2, !ave(world == "AF", group, FUN = any))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213