Best method for averaging across rows

Question

I have data with multiple observations per day, and I want to construct a table of daily averages. My instinctive approach (from other programming languages) is to sort the data by date and write a for loop to go through and average it out. But every time I see an R question involving for loops, there tends to be a strong response that R handles vector-type approaches much better. What would a smarter approach be to this problem?

For reference, my data looks something like

date       observation
2017-4-4   17
2017-4-4   412
2017-4-4   9
2017-4-3   96
2017-4-3   14
2017-4-2   8

And I would like the output to be a new data frame that looks like

date       average
2017-4-4   146
2017-4-3   55
2017-4-2   8

Thanks for your help! That question is actually quite different from mine though. It's asking for the average across multiple variables in the same row. For clarification, I want to average a single variable across potentially multiple rows. So the output will be a new data frame with a list of dates and the average of the observations from each date. — muahdeb, Apr 04 '17 at 10:33
`tapply(df$observation, df$date, FUN=mean)` http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega — jogo, Apr 04 '17 at 10:35
Please "search and research" before posting questions like this and show what you have tried so far. This will get you more help. — micstr, Apr 04 '17 at 10:41
@jogo The response in the question you linked was very enlightening, and I can see that the method outputs each day followed by the average. Your links refer to this as being a "ragged" array, and I can't figure out a clean way to extract the date or the mean by itself (in order to, say, plot them). — muahdeb, Apr 04 '17 at 10:48

Codutie · Answer 1 · 2017-04-04T10:44:22.563


library(dplyr)

# Sample data from the question
df <- data.frame(date = c('2017-4-4', '2017-4-4', '2017-4-4', '2017-4-3', '2017-4-3', '2017-4-2'),
                 observation = c(17, 412, 9, 96, 14, 8))

# Group the rows by date, then average the observations within each group
df %>%
  group_by(date) %>%
  summarise(average = mean(observation)) %>%
  as.data.frame()

edited Apr 04 '17 at 10:44

answered Apr 04 '17 at 10:36


jogo · Accepted Answer · 2017-04-04T11:32:25.480

tapply() can do that:

df <- read.table(header=TRUE, text=
'date       observation
2017-4-4   17
2017-4-4   412
2017-4-4   9
2017-4-3   96
2017-4-3   14
2017-4-2   8')
df$date <- as.Date(df$date, format="%Y-%m-%d")
m <- tapply(df$observation, df$date, FUN=mean)
d.result <- data.frame(date=as.Date(names(m), format="%Y-%m-%d"), m)
# > d.result
#                  date   m
# 2017-04-02 2017-04-02   8
# 2017-04-03 2017-04-03  55
# 2017-04-04 2017-04-04 146

or

aggregate(observation ~ date, data=df, FUN=mean)

or with data.table

library("data.table")

dt <- fread(
'date       observation
2017-4-4   17
2017-4-4   412
2017-4-4   9
2017-4-3   96
2017-4-3   14
2017-4-2   8')
dt[ , .(observation = mean(observation)), by=date]
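Regarding the follow-up in the comments about pulling out the dates or the means on their own (for plotting, say): the result of tapply() is a named vector, so the group labels live in names(m) and the averages are the values. A minimal sketch in base R, assuming the sample data from the question; the variable names dates and means are only illustrative:

```r
# Sample data from the question, with the date column parsed as Date
df <- data.frame(
  date = as.Date(c("2017-4-4", "2017-4-4", "2017-4-4",
                   "2017-4-3", "2017-4-3", "2017-4-2"),
                 format = "%Y-%m-%d"),
  observation = c(17, 412, 9, 96, 14, 8)
)

# One mean per date; the dates become the names of the result
m <- tapply(df$observation, df$date, FUN = mean)

dates <- as.Date(names(m))  # group labels are stored as character names
means <- as.vector(m)       # plain numeric vector without the names

# Either piece can now be used by itself, e.g. for a plot
plot(dates, means, type = "b", xlab = "date", ylab = "average")
```

If you use aggregate() or the data.table version instead, the result is already a table, so the two columns can be accessed directly with $date and $observation.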
