2

I have a dataset as follows

Pt    EVENT
123    GGG
123    Nor
123    tre
144    GGG
1667   tre
1667   Nor
1667   tre

I want to group_by and then the data as per my previous question according to Pt but I am only interested in creating groups which contain Nor so the final dataset should be

Pt    
123   GGG      Nor   tre
1667  tre      Nor   tre

I have the answer from the previous question but I can't seem to only select specific groups. This is what I have tried but it gives me

Sankey<-EndoSubset %>%
  group_by(Pt) %>% 
  filter(EVENT == "Nor")

but obviously this just gives me rows containing Nor whereas I want all the rows for Pt that contains Nor

Community
  • 1
  • 1
Sebastian Zeki
  • 6,690
  • 11
  • 60
  • 125

3 Answers3

4

We can try with data.table. Convert the 'data.frame' to 'data.table' (setDT(df1), reshape from 'long' to 'wide' with dcast, then we check for columns 2:4 for "Nor" value, compare elementwise with Reduce and use the logical vector to subset the rows.

library(data.table)
res <- dcast(setDT(df1), Pt~rowid(Pt), value.var="EVENT", fill="")
res[res[, Reduce(`|`, lapply(.SD,`==`, "Nor" )) , .SDcols = 2:4]]
#     Pt   1   2   3
#1:  123 GGG Nor tre
#2: 1667 tre Nor tre

A compact option would be to subset the "Nor" rows before the dcast step

dcast(setDT(df1)[, if(any(EVENT=="Nor")) .SD, Pt], Pt~rowid(Pt), value.var ="EVENT")
#    Pt   1   2   3
#1:  123 GGG Nor tre
#2: 1667 tre Nor tre

If we are using dplyr, similar method as showed in the second option in data.table would be

library(dplyr)
library(tidyr)
df1 %>%
   group_by(Pt) %>%
   filter(any(EVENT=="Nor")) %>% 
   mutate(n = row_number())  %>%
   ungroup() %>%
   spread(n, EVENT)
#    Pt     1     2     3
#  <int> <chr> <chr> <chr>
#1   123   GGG   Nor   tre
#2  1667   tre   Nor   tre

Or using only base R methods

reshape(transform(df1[with(df1, ave(EVENT=="Nor", Pt, FUN=any)),],
   time = ave(seq_along(EVENT), Pt, FUN = seq_along)), direction="wide", idvar="Pt")
#    Pt EVENT.1 EVENT.2 EVENT.3
#1  123     GGG     Nor     tre
#5 1667     tre     Nor     tre
akrun
  • 874,273
  • 37
  • 540
  • 662
0

Depending on how your data looks, it might make sense to store the result as a list. Here is a base R a method to achieve that:

# split data by EVENT and check EVENT is in desired set 
# if yes, collect Pt into character vectors
myList <- lapply(split(df, df$Pt), function(i) {
                                  if("Nor" %in% as.character(i$EVENT)) {
                                    unlist(as.character(i$EVENT))}})
# drop list elements with length 0
myList <- myList[lengths(myList) > 0]

This returns a named list:

myList
$`123`
[1] "GGG" "Nor" "tre"

$`1667`
[1] "tre" "Nor" "tre"

If you want a data.frame and the lengths of the vectors are equal, as in the example, you could use rbind and do.call:

data.frame(do.call(rbind,temp))
      X1  X2  X3
123  GGG Nor tre
1667 tre Nor tre

Note that Pt is now contained in the rownames.

Thanks to @RichardScriven for reminding my about lengths.

lmo
  • 37,904
  • 9
  • 56
  • 69
0

Another option using base R

summary_df <- aggregate(data=df,EVENT~.,FUN = paste,collapse=",")
summary_df[grep("Nor",summary_df$EVENT),]
user2100721
  • 3,557
  • 2
  • 20
  • 29