How to group_by only if the group contains a certain value in a specific column

Question

I have a dataset as follows

Pt    EVENT
123    GGG
123    Nor
123    tre
144    GGG
1667   tre
1667   Nor
1667   tre

I want to group_by and then the data as per my previous question according to Pt but I am only interested in creating groups which contain Nor so the final dataset should be

Pt    
123   GGG      Nor   tre
1667  tre      Nor   tre

I have the answer from the previous question but I can't seem to only select specific groups. This is what I have tried but it gives me

Sankey<-EndoSubset %>%
  group_by(Pt) %>% 
  filter(EVENT == "Nor")

but obviously this just gives me rows containing Nor whereas I want all the rows for Pt that contains Nor

Isn't this the same as previous output. I am not finding any difference in the expected output — akrun, Jul 10 '16 at 19:54

akrun · Answer 1 · 2016-07-10T20:41:34.413

We can try with data.table. Convert the 'data.frame' to 'data.table' (setDT(df1), reshape from 'long' to 'wide' with dcast, then we check for columns 2:4 for "Nor" value, compare elementwise with Reduce and use the logical vector to subset the rows.

library(data.table)
res <- dcast(setDT(df1), Pt~rowid(Pt), value.var="EVENT", fill="")
res[res[, Reduce(`|`, lapply(.SD,`==`, "Nor" )) , .SDcols = 2:4]]
#     Pt   1   2   3
#1:  123 GGG Nor tre
#2: 1667 tre Nor tre

A compact option would be to subset the "Nor" rows before the dcast step

dcast(setDT(df1)[, if(any(EVENT=="Nor")) .SD, Pt], Pt~rowid(Pt), value.var ="EVENT")
#    Pt   1   2   3
#1:  123 GGG Nor tre
#2: 1667 tre Nor tre

If we are using dplyr, similar method as showed in the second option in data.table would be

library(dplyr)
library(tidyr)
df1 %>%
   group_by(Pt) %>%
   filter(any(EVENT=="Nor")) %>% 
   mutate(n = row_number())  %>%
   ungroup() %>%
   spread(n, EVENT)
#    Pt     1     2     3
#  <int> <chr> <chr> <chr>
#1   123   GGG   Nor   tre
#2  1667   tre   Nor   tre

Or using only base R methods

reshape(transform(df1[with(df1, ave(EVENT=="Nor", Pt, FUN=any)),],
   time = ave(seq_along(EVENT), Pt, FUN = seq_along)), direction="wide", idvar="Pt")
#    Pt EVENT.1 EVENT.2 EVENT.3
#1  123     GGG     Nor     tre
#5 1667     tre     Nor     tre

@SebastianZeki That should be the case with `grep`. But, here I guess you can use `%in%` and `c("Nor", "GGG")` — akrun, Jul 10 '16 at 20:10

lmo · Answer 2 · 2016-07-11T12:05:41.167

Depending on how your data looks, it might make sense to store the result as a list. Here is a base R a method to achieve that:

# split data by EVENT and check EVENT is in desired set 
# if yes, collect Pt into character vectors
myList <- lapply(split(df, df$Pt), function(i) {
                                  if("Nor" %in% as.character(i$EVENT)) {
                                    unlist(as.character(i$EVENT))}})
# drop list elements with length 0
myList <- myList[lengths(myList) > 0]

This returns a named list:

myList
$`123`
[1] "GGG" "Nor" "tre"

$`1667`
[1] "tre" "Nor" "tre"

If you want a data.frame and the lengths of the vectors are equal, as in the example, you could use rbind and do.call:

data.frame(do.call(rbind,temp))
      X1  X2  X3
123  GGG Nor tre
1667 tre Nor tre

Note that Pt is now contained in the rownames.

Thanks to @RichardScriven for reminding my about lengths.

`sapply(myList, length)` is the same as `lengths(myList)` in newer R versions, only slower. — Rich Scriven, Jul 10 '16 at 20:40

score 0 · Answer 3 · answered Jul 11 '16 at 08:54

0

Another option using base R

summary_df <- aggregate(data=df,EVENT~.,FUN = paste,collapse=",")
summary_df[grep("Nor",summary_df$EVENT),]

answered Jul 11 '16 at 08:54

user2100721

3,557
2
20
29

How to group_by only if the group contains a certain value in a specific column

3 Answers3