There are several different ways to do this. Depending upon your ultimate goals, different approaches offer different advantages.
Here's a comparison of three approaches, using a reproducible example with dummy data:
## Create data
d <- data.frame(CARRIER   = as.factor(c("a", "b", "a", "c", "b", "a", "c")),
                DEP_DELAY = as.factor(c("Y", "N", "N", NA, "Y", "N", "Y")),
                ARR_DELAY = as.factor(c("N", "N", "Y", "N", "Y", "N", "Y")),
                CANCELLED = as.factor(c("N", "N", "N", "N", NA, "Y", "Y")))
1) The aggregate function in base R is perhaps the simplest way to get what you want, and I would recommend it if this is all you want to do:
aggregate(DEP_DELAY ~ CARRIER, d, summary)
# CARRIER DEP_DELAY.N DEP_DELAY.Y
# 1 a 2 1
# 2 b 1 1
# 3 c 0 1
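One thing to be aware of: the formula interface of aggregate drops rows with missing values by default (na.action = na.omit), which is why carrier c shows only one flight here. If you want the missing values counted as well, a cross-tabulation with table is one option. This is a minimal sketch that rebuilds only the two relevant columns of the dummy data so it runs on its own; the name tab is just for illustration.

```r
# Same dummy data as above (two columns only), rebuilt so the snippet is self-contained
d <- data.frame(CARRIER   = factor(c("a", "b", "a", "c", "b", "a", "c")),
                DEP_DELAY = factor(c("Y", "N", "N", NA, "Y", "N", "Y")))

# Cross-tabulate carrier against departure delay.
# useNA = "ifany" adds a column for missing values instead of dropping them.
tab <- table(d$CARRIER, d$DEP_DELAY, useNA = "ifany")
print(tab)
```

The rows are carriers and the columns are the delay levels, with an extra column for NA when any are present.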
2) The plyr package uses a different syntax than base R, but is very powerful. It was written by Hadley Wickham, who also wrote the ggplot2 plotting package. The benefits of plyr are its expressive syntax (base R can become clunky when you start to do complicated summaries) and its usefulness in manipulating data for ggplot2 (because Wickham wrote both of them and they complement each other nicely).
library(plyr) # you will need to install this package
ddply(d, .(CARRIER, DEP_DELAY), summary)
# CARRIER DEP_DELAY ARR_DELAY CANCELLED
#1 a:2 N:2 N:1 N:1
#2 b:0 Y:0 Y:1 Y:1
#3 c:0 <NA> <NA> <NA>
#4 a:1 N:0 N:1 N:1
# I clipped the output to save space
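Because summary is applied to each whole sub-data-frame, that output is hard to read. If all you need is the number of flights per carrier and delay status, plyr's count function may be a tidier fit. This is a sketch assuming plyr is installed; it rebuilds the relevant columns of the dummy data so it runs on its own, and res is just an illustrative name.

```r
library(plyr)  # assumes the plyr package is installed

# Same dummy data as above (two columns only), rebuilt so the snippet is self-contained
d <- data.frame(CARRIER   = factor(c("a", "b", "a", "c", "b", "a", "c")),
                DEP_DELAY = factor(c("Y", "N", "N", NA, "Y", "N", "Y")))

# count() tallies the rows for each combination of the listed variables
# and returns a data frame with the counts in a column named freq
res <- count(d, c("CARRIER", "DEP_DELAY"))
print(res)
```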
3) The data.table package uses a third syntax. Like plyr, it is a powerful library with its own syntax. Its benefit over plyr is that it can handle much larger data sets because of how it manages memory.
library(data.table) # You'll also need to install this package
DT = data.table(d) # Convert data.frame to data.table
DT[,summary(DEP_DELAY), by = CARRIER]
# CARRIER V1
#1: a 2
#2: a 1
#3: b 1
#4: b 1
#5: c 0
#6: c 1
#7: c 1
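The V1 column above loses the N / Y / NA labels that summary attaches to each count. One way to keep them is to group by both columns and use data.table's built-in .N (the number of rows in each group). This is a sketch assuming data.table is installed; it rebuilds the relevant columns of the dummy data so it runs on its own, and counts is just an illustrative name.

```r
library(data.table)  # assumes the data.table package is installed

# Same dummy data as above (two columns only), rebuilt so the snippet is self-contained
d <- data.frame(CARRIER   = factor(c("a", "b", "a", "c", "b", "a", "c")),
                DEP_DELAY = factor(c("Y", "N", "N", NA, "Y", "N", "Y")))
DT <- as.data.table(d)

# .N is the row count per group; grouping by both columns gives one
# labelled row for each carrier/delay combination that occurs in the data
counts <- DT[, .N, by = .(CARRIER, DEP_DELAY)]
print(counts)
```

Note that combinations that never occur (for example carrier c with DEP_DELAY "N") do not appear as zero rows here, unlike in the summary-based output.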
If you're just learning R, I would suggest method 1. If you use R more, I suggest learning both plyr and data.table, because each can be advantageous to have in your toolbox. If you're using larger data sets (hundreds of MB or larger), I would learn data.table first. If you want to learn ggplot2, I would learn plyr first.