I have not found anything remotely similar on SO (or elsewhere) and am therefore hoping for your help. I am not yet very familiar with finding vectorised approaches and my initial attempt feels quite clumsy.

I currently have a data frame similar to this:

df <- data.frame(c(1,1,1,2,2,2,3,3,3),c(TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE))
colnames(df) <- c("ID", "Status")

I would now like to simplify my observations, showing TRUE only if every single status for the particular ID is given as TRUE, i.e. a final table like

ID    Status
1     FALSE
2     FALSE
3     TRUE

I have managed to do it in a loop (again, even for a loop it might be quite clumsy):

NrID <- df$ID[!duplicated(df$ID)]

for (i in NrID) {
  rows <- df$ID == i
  # Compare the TRUE count with this ID's own row count, not with max(NrID)
  x <- sum(df$Status[rows])
  df$Status[rows] <- x == sum(rows)
}

finaldf <- df[!duplicated(df$ID), ]

I would appreciate any advice or functions for vectorising this approach, since my final dataset is quite large and I would also like cleaner code.

Thanks in advance!

WillyWonka
  • Possible duplicate of [Aggregate / summarize multiple variables per group (e.g. sum, mean)](https://stackoverflow.com/questions/9723208/aggregate-summarize-multiple-variables-per-group-e-g-sum-mean) – markus Feb 21 '19 at 09:27

2 Answers

A dplyr solution can be:

library(dplyr)

# all() is TRUE only if every Status within the ID is TRUE
df %>%
  group_by(ID) %>%
  summarise(Status = all(Status))

     ID Status
  <dbl> <lgl> 
1    1. FALSE 
2    2. FALSE 
3    3. TRUE 

Or with base R:

aggregate(df$Status, list(df$ID), function(x) all(x))

  Group.1     x
1       1 FALSE
2       2 FALSE
3       3  TRUE
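
If the generic `Group.1` and `x` column names are a nuisance, the formula interface of `aggregate()` keeps the original names. This is only a minimal sketch that rebuilds the example data from the question; the object name `res` is an assumption made for illustration.

```r
# Rebuild the example data from the question
df <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Status = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# Status ~ ID reads as "summarise Status within each ID";
# all() returns TRUE only when every value in the group is TRUE
res <- aggregate(Status ~ ID, data = df, FUN = all)
print(res)
```

The result should have one row per ID, with columns named `ID` and `Status`.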
tmfmnk
  • Thanks for the quick and easy answer. I will mark it as correct as soon as SO allows (10 minutes). One question though: If I have a third variable, is there an easy solution to keep that as well? The third variable has the same value for each ID. – WillyWonka Feb 21 '19 at 09:27
  • With `dplyr` you can do `df %>% group_by(ID) %>% summarise(Status = all(Status), third_variable = first(third_variable))`. – tmfmnk Feb 21 '19 at 09:36
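
The suggestion in the comment above can be tried end to end with a small sketch. The values in `third_variable` are made up for illustration (the question does not show them), and `result` is a hypothetical name; the only assumption carried over from the comment is that the third variable is constant within each ID.

```r
library(dplyr)

# Example data from the question plus an invented third column
df <- data.frame(
  ID             = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Status         = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
  third_variable = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(ID) %>%
  summarise(
    Status         = all(Status),           # TRUE only if every row is TRUE
    third_variable = first(third_variable)  # constant per ID, so take the first
  )
print(result)
```

Taking `first()` is only safe because the value is the same for every row of an ID; if it could differ, it would need its own aggregation rule.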

If speed and concision are what you are after, you might like data.table:

Setup:

library(data.table)
setDT(df) # Convert to data.table

Calculations:

df[, .(Status = all(Status)), by = ID]

#    ID Status
# 1:  1  FALSE
# 2:  2  FALSE
# 3:  3   TRUE
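
For the follow-up about keeping a third column that is constant within each ID, one possible data.table sketch is to add that column to `by`. The column name `third_variable` follows the comment on the other answer, while its values and the names `dt` and `result` are assumptions made for illustration.

```r
library(data.table)

# Example data from the question plus an invented third column
dt <- data.table(
  ID             = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Status         = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
  third_variable = c("a", "a", "a", "b", "b", "b", "c", "c", "c")
)

# Grouping by both columns keeps third_variable in the output;
# because it is constant per ID, this does not create extra groups
result <- dt[, .(Status = all(Status)), by = .(ID, third_variable)]
print(result)
```

If the third column were not constant within an ID, grouping by it would split that ID into several rows, so this only fits the situation described in the comment.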
s_baldur