0

Background

I've got a dataframe:

d <- data.frame(ID = c("a","a","b","b", "c","c"),
                event = c(0,1,0,0,1,1),
                event_date = as.Date(c("2011-01-01","2012-08-21","2011-12-23","2011-12-31","2013-03-14","2015-07-12")),
                stringsAsFactors=FALSE)

It's got some people (ID), a binary indicator of an event, and an event_date for that event.

The Problem

I want to subset this dataframe to return the rows for those ID's who have never had the event -- in other words, for ID's whose every event cell, no matter how many they have, equals zero. No exceptions.

In other words, I'd like to get something that looks like this:

ID event event_date
 b     0 2011-12-23
 b     0 2011-12-31

What I've tried

I've got a couple of ways of trying to do this, one in dplyr and the other in data.table (I'm more familiar with dplyr; the latter example is adapted from an answer to an older question I posted). But both get me the same result: the inclusion of ID=b, which is good, but also the inclusion of ID=a, who has one row of event=1 -- which is not good.

Attempt 1:

d_test <- subset(d, ID %in% unique(ID[event == 0])) %>% 
  arrange(ID, event_date)

  ID event event_date
1  a     0 2011-01-01
2  a     1 2012-08-21
3  b     0 2011-12-23
4  b     0 2011-12-31

Attempt 2:

setDT(d)
d_test <- d[ID %in% d[,event==0, by=ID][V1==TRUE, ID]]

   ID event event_date
1:  a     0 2011-01-01
2:  a     1 2012-08-21
3:  b     0 2011-12-23
4:  b     0 2011-12-31

Bottom line: I can't express NEVER or NONE in my code.

logjammin
  • 1,121
  • 6
  • 21
  • 2
    `subset(d, ave(as.logical(event), ID, FUN = Negate(any)))` or `subset(d, ave(event, ID, FUN = sum) == 0)`. – Ritchie Sacramento May 30 '22 at 23:32
  • 1
    Think I've answered this before, but `d[, if (all(event == 0)) .SD, by=ID]` . Essentially the same `if` logic here allowing the dropping of groups meeting a criteria - https://stackoverflow.com/questions/18302610/remove-ids-that-occur-x-times-r/18304851 – thelatemail May 30 '22 at 23:40
  • 1
    On further reflection, `all` and `any` may be what you're essentially looking to know about. Negating `any` (`!any(...)` or `Negate(any)(...)`) is then a `none` essentially. So maybe a better duplicate might also be: https://stackoverflow.com/questions/40825037/select-groups-which-have-at-least-one-of-a-certain-value if you make the edit to change `any(...)` to one of the above. – thelatemail May 30 '22 at 23:51
  • 3
    `subset(d, ave(!event, ID, FUN = all))` or even `subset(d, !ave(event, ID, FUN = sum))` – Onyambu May 31 '22 at 00:08
  • Thank you everyone for these answers. They all work on my toy `df` used here; I'm running them now on my real-world dataset to see how they apply to a few million rows and ~30 columns. It seems like my question may be a duplicate; I'm fine if that's the case. If not, please feel free to write up your answers as proper answers and I'll mark them correct. – logjammin May 31 '22 at 00:30
  • 1
    With `dplyr`, `d |> group_by(ID) |> filter(...)` where `...` could be (a) `!any(event == 1)`, (b) `all(event != 1)` (c) `! 1 %in% event`, (d) `sum(event) == 0` (since it's binary), (e) `sum(event == 1) == 0`, (f) simplifying (a) since it's binary `!any(event)`, (g) codegolfing (b) since it's binary `all(!event)` – Gregor Thomas May 31 '22 at 01:12

1 Answers1

2

You can use data.table and any to create a column to indicate if ID has at least one event :

library(data.table)
d <- data.table(ID = c("a","a","b","b", "c","c"),
                event = c(0,1,0,0,1,1),
                event_date = as.Date(c("2011-01-01","2012-08-21","2011-12-23","2011-12-31","2013-03-14","2015-07-12")),
                stringsAsFactors=FALSE)

d[, any.event := any(event == 1), by = "ID"]

   ID event event_date any.event
1:  a     0 2011-01-01      TRUE
2:  a     1 2012-08-21      TRUE
3:  b     0 2011-12-23     FALSE
4:  b     0 2011-12-31     FALSE
5:  c     1 2013-03-14      TRUE
6:  c     1 2015-07-12      TRUE

any checks if the condition is fulfilled for at least one element of event. by applies this function for each ID.

Then you can filter the data.table using column any.event :

d[!(any.event)]

   ID event event_date any.event
1:  b     0 2011-12-23     FALSE
2:  b     0 2011-12-31     FALSE

Coline
  • 136
  • 6