
I have data with multiple observations per day, and I want to construct a table of daily averages. My instinctive approach (from other programming languages) is to sort the data by date and write a for loop to go through and average it out. But every time I see an R question involving for loops, there tends to be a strong response that R handles vector-type approaches much better. What would a smarter approach be to this problem?

For reference, my data looks something like

date       observation
2017-4-4   17
2017-4-4   412
2017-4-4   9
2017-4-3   96
2017-4-3   14
2017-4-2   8

And I would like the output to be a new data frame that looks like

date       average
2017-4-4   146
2017-4-3   55
2017-4-2   8
muahdeb
  • Thanks for your help! That question is actually quite different from mine though. It's asking for the average across multiple variables in the same row. For clarification, I want to average a single variable across potentially multiple rows. So the output will be a new data frame with a list of dates and the average of the observations from each date. – muahdeb Apr 04 '17 at 10:33
  • `tapply(df$observation, df$date, FUN=mean)` http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega – jogo Apr 04 '17 at 10:35
  • Please "search and research" before posting questions like this, and show what you have tried so far. This will get you more help. – micstr Apr 04 '17 at 10:41
  • @jogo The response in the question you linked was very enlightening, and I can see that the method outputs each day followed by the average. Your links refer to this as being a "ragged" array, and I can't figure out a clean way to extract the date or the mean by itself (in order to, say, plot them). – muahdeb Apr 04 '17 at 10:48

2 Answers

library(dplyr)

# Example data from the question
df <- data.frame(date = c('2017-4-4', '2017-4-4', '2017-4-4', '2017-4-3', '2017-4-3', '2017-4-2'),
                 observation = c(17, 412, 9, 96, 14, 8))

df %>% 
  group_by(date) %>% 
  summarise(average = mean(observation)) %>%
  as.data.frame()
Codutie

tapply() can do that:

df <- read.table(header=TRUE, text=
'date       observation
2017-4-4   17
2017-4-4   412
2017-4-4   9
2017-4-3   96
2017-4-3   14
2017-4-2   8')
df$date <- as.Date(df$date, format="%Y-%m-%d")
m <- tapply(df$observation, df$date, FUN=mean)
d.result <- data.frame(date=as.Date(names(m), format="%Y-%m-%d"), m)
# > d.result
#                  date   m
# 2017-04-02 2017-04-02   8
# 2017-04-03 2017-04-03  55
# 2017-04-04 2017-04-04 146
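
To pull the dates and the means out separately (for plotting, say), one option is to read the dates from the names of the `tapply()` result and drop the names from the values. This is a minimal sketch rather than the only way to do it; the variable names `dates` and `means` are my own and not from the question.

```r
# Rebuild the example data from the question.
df <- read.table(header=TRUE, text=
'date       observation
2017-4-4   17
2017-4-4   412
2017-4-4   9
2017-4-3   96
2017-4-3   14
2017-4-2   8')
df$date <- as.Date(df$date, format="%Y-%m-%d")

# tapply() returns a named one-dimensional array: one mean per date.
m <- tapply(df$observation, df$date, FUN=mean)

# The group labels are stored as names; convert them back to Date.
dates <- as.Date(names(m), format="%Y-%m-%d")
# as.vector() drops the names and leaves a plain numeric vector.
means <- as.vector(m)

# Plot the daily averages against the dates.
plot(dates, means, type="b", xlab="date", ylab="average")
```

Both vectors are in the same (ascending date) order, so they can be passed straight to `plot()` or recombined with `data.frame()`.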

or

aggregate(observation ~ date, data=df, FUN=mean)
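
Note that `aggregate()` keeps the original column name `observation` for the means. If the result should match the layout asked for in the question (a column called `average`, newest date first), something like the following should work. It is a sketch in base R; the name `res` is made up for the example.

```r
# Rebuild the example data from the question.
df <- read.table(header=TRUE, text=
'date       observation
2017-4-4   17
2017-4-4   412
2017-4-4   9
2017-4-3   96
2017-4-3   14
2017-4-2   8')
df$date <- as.Date(df$date, format="%Y-%m-%d")

# One row per date, with the mean of the observations for that date.
res <- aggregate(observation ~ date, data=df, FUN=mean)

# Rename the value column to "average".
names(res)[names(res) == "observation"] <- "average"

# Sort newest date first and reset the row names.
res <- res[order(res$date, decreasing=TRUE), ]
rownames(res) <- NULL

print(res)
```

Because the result is an ordinary data frame, the columns are available as `res$date` and `res$average`.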

or with data.table

library("data.table")

dt <- fread(
'date       observation
2017-4-4   17
2017-4-4   412
2017-4-4   9
2017-4-3   96
2017-4-3   14
2017-4-2   8')
dt[ , .(observation = mean(observation)), by=date]
jogo