1

This is for using aggregate function in R at two levels. For the below data frame,

df <- data.frame( Event  = c('A1','A1','A1','A1','A1'),
                    Time = c(10,11,21,17,12),
                    Type = c('New','New','Repeat','Repeat','Repeat'))

At one level, the following is working perfectly

aggregate(df$Time,
          by=list(df$Event), 
          FUN = function(x) c(mintime = min(x),endtime = max(x)))

I am looking for the count of each Type with a condition check.

## CODE NOT WORKING
aggregate(df,by=list(df$Event), 
          FUN = function(x) c(mintime = min(df$time),endtime = max(df$time), New=length(df$time=='New'),Repeat=length(df$time=='Repeat')))

so that I will get

Group.1 x.mintime x.maxtime x.New x.Repeat
A1      10        21        2     3
Arun GK
  • 73
  • 10

2 Answers2

2

With data.table:

library(data.table)
setDT(df)[, .(mintime = min(Time), 
              maxtime = max(Time), 
              New = sum(Type=="New"), 
              Repeat = sum(Type=="Repeat")),
          by = Event] 

With sqldf:

library(sqldf)
sqldf("select Event,
              min(Time) as mintime,
              max(Time) as maxtime,
              sum(Type=='New') as New,
              sum(Type=='Repeat') as Repeat
      from df
      group by Event")

Result:

   Event mintime maxtime New Repeat
1:    A1      10      21   2      3
acylam
  • 18,231
  • 5
  • 36
  • 45
1

Using group_by() and summarise() from dplyr make this really easy. See the following:

library(dplyr)

df <- data.frame( Event  = c('A1','A1','A1','A1','A1'),
                  Time = c(10,11,21,17,12),
                  Type = c('New','New','Repeat','Repeat','Repeat'))

df %>% 
  group_by(Event) %>% 
  summarise(mintime  = min(Time),
            endtime  = max(Time),
            x.New    = sum(Type == "New"),
            x.Repeat = sum(Type == "Repeat"))

# A tibble: 1 x 5
   Event mintime endtime x.New x.Repeat
  <fctr>   <dbl>   <dbl> <int>    <int>
1     A1      10      21     2        3
cody_stinson
  • 390
  • 1
  • 3
  • 12