Average column in daily information at every n-th row

Question

I am very new to R. I have daily observations of temperature and PP for a 12-year period (6574 rows, 6 columns, some NA). I want to calculate, for example, the average from the 1st to the 10th day of January 2001, then from the 11th to the 20th, and finally from the 21st to the 31st, and so on for every month through December, for each year in the period mentioned above.

I also have a problem with February, which has 28 or 29 days depending on whether it is a leap year.

This is how I read my file, which is a CSV, using read.table:

# READ CSV
setwd ("C:\\Users\\GVASQUEZ\\Documents\\ESTUDIO_PAMPAS\\R_sheet")

huancavelica<-read.table("huancavelica.csv",header = TRUE, sep = ",",
                         dec = ".", fileEncoding = "latin1", nrows = 6574 )

This is what the first rows of my CSV file look like:

     Año Mes Dia PT101 TM102 TM103    
1   1998  1   1   6.0  15.6   3.4
2   1998  1   2   8.0  14.4   3.2
3   1998  1   3   8.6  13.8   4.4
4   1998  1   4   5.6  14.6   4.6
5   1998  1   5   0.4  17.4   3.6
6   1998  1   6   3.4  17.4   4.4
7   1998  1   7   9.2  14.6   3.2
8   1998  1   8   2.2  16.8   2.8
9   1998  1   9   8.6  18.4   4.4
10  1998  1  10   6.2  15.0   3.6 
 .   .    .   .    .     .     .

Welcome to Stack Overflow. People appreciate it if you post your code as text rather than as an image; this makes it much easier to examine. — lmo, Apr 22 '16 at 16:06
I guess an easy way would be to create a new column that is 1 for days 1 to 10, then 2 for 11 to 20, and 3 for > 20. Call the column `x`, then try something like `aggregate(TM102 ~ Mes + x, data = huancavelica, mean)`. There are probably better ways, but this one is kind of straightforward. See also `?aggregate` or questions like [this one](http://stackoverflow.com/questions/21982987/mean-per-group-in-a-data-frame). — slamballais, Apr 22 '16 at 16:53
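
The suggestion in the comment above can be sketched roughly as follows. This is only an illustration on made-up data: the data frame `toy`, its values, and the ASCII column name `Anio` (standing in for `Año`) are assumptions, not part of the original question. The year is added to the grouping so that the same month in different years is not pooled.

```r
# Toy data standing in for the real file (values are made up for illustration)
dates <- seq(as.Date("2000-01-01"), as.Date("2000-03-31"), by = "day")
toy <- data.frame(
  Anio  = as.integer(format(dates, "%Y")),   # hypothetical stand-in for Año
  Mes   = as.integer(format(dates, "%m")),
  Dia   = as.integer(format(dates, "%d")),
  TM102 = as.numeric(format(dates, "%d"))    # value = day of month, easy to check
)
toy$TM102[5] <- NA                           # one missing observation

# x = 1 for days 1-10, 2 for days 11-20, 3 for day 21 to the end of the month
toy$x <- pmin((toy$Dia - 1) %/% 10 + 1, 3)

# The formula interface drops rows with NA by default (na.action = na.omit)
res <- aggregate(TM102 ~ Anio + Mes + x, data = toy, FUN = mean)
res <- res[order(res$Anio, res$Mes, res$x), ]
print(res)
```

Because the third group simply collects every day after the 20th, February needs no special handling: in a leap year it averages days 21 to 29, otherwise days 21 to 28.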

score 1 · Answer 1 · answered Apr 22 '16 at 16:08

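We can try:

library(data.table)
# df1 is the data frame; Ano stands for the year column (Año in the question's data)
# Grp is 1 for days 1-10, 2 for days 11-20; day 31 would give 4, so it is capped at 3
setDT(df1)[, Grp := (Dia - 1) %/% 10 + 1, by = .(Ano, Mes)
       ][Grp > 3, Grp := 3][, lapply(.SD, mean, na.rm = TRUE), by = .(Ano, Mes, Grp)]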

akrun


lmo · Accepted Answer · 2016-04-23T11:28:41.673

With the data setup that you have, a fairly tried and true method should work:

# add 0 in front of single digit month variable to account for 1 and 10 sorting
huancavelica$MesChar <- ifelse(nchar(huancavelica$Mes)==1, 
                    paste0("0",huancavelica$Mes), as.character(huancavelica$Mes))

# get time of month ID
huancavelica$timeMonth <- ifelse(huancavelica$Dia < 11, 1,
                          ifelse(huancavelica$Dia > 20, 3, 2))
# get final ID
huancavelica$ID <- paste(huancavelica$Año, huancavelica$MesChar, huancavelica$timeMonth, sep=".")
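# average stat; na.rm goes inside FUN because ave() treats extra arguments as grouping variables
huancavelica$myStat <- ave(huancavelica$PT101, huancavelica$ID,
                           FUN = function(x) mean(x, na.rm = TRUE))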

Thanks a lot, it is working. One last question: if I want the sum instead of the average, is there another function, along the lines of "Group sum over level combinations of factors", that I can use? — Guisseppe, Apr 22 '16 at 17:43
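
For the sum, one possible sketch reuses the same ID and swaps the function. The data frame `toy`, its made-up values, and the column names `Anio` (an ASCII stand-in for `Año`) and `mySum` are assumptions for illustration only.

```r
# Toy data: the 31 days of one January, 1 unit per day, one value missing
toy <- data.frame(Anio = 1998, Mes = 1, Dia = 1:31, PT101 = 1)
toy$PT101[3] <- NA

# Same period ID as in the answer: year.month.third
toy$timeMonth <- ifelse(toy$Dia < 11, 1, ifelse(toy$Dia > 20, 3, 2))
toy$ID <- paste(toy$Anio, sprintf("%02d", toy$Mes), toy$timeMonth, sep = ".")

# Group sum repeated on every row of the group. na.rm is set inside the
# function because ave() treats anything passed in `...` as a grouping variable.
toy$mySum <- ave(toy$PT101, toy$ID, FUN = function(x) sum(x, na.rm = TRUE))

# Alternatively, one row per period; the formula interface drops NA rows by default
sums <- aggregate(PT101 ~ ID, data = toy, FUN = sum)
print(sums)
```

`ave()` keeps the original number of rows, which is convenient for adding a column, while `aggregate()` collapses the data to one row per 10-day period.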

score 0 · Answer 3 · answered Apr 22 '16 at 17:13

It adds a bit more complexity, but you could cut each month into thirds and get the average for each third. For example:

library(dplyr)
library(lubridate)

# Fake data
set.seed(10)
df = data.frame(date=seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by="1 day"), 
                value=rnorm(365))

# Cut months into thirds
df = df %>% 
  mutate(mon_yr = paste0(month(date, label=TRUE, abbr=TRUE) , " ", year(date))) %>%
  group_by(mon_yr) %>%
  mutate(cutMonth = cut(day(date), 
                        breaks=c(0, round(1/3*n()), round(2/3*n()), n()),
                        labels=c("1st third","2nd third","3rd third")),
         cutMonth = paste0(mon_yr, ", ", cutMonth)) %>%
  ungroup %>%
  mutate(cutMonth = factor(cutMonth, levels=unique(cutMonth)))

          date       value            cutMonth
  1 2015-01-01  0.01874617 Jan 2015, 1st third
  2 2015-01-02 -0.18425254 Jan 2015, 1st third
  3 2015-01-03 -1.37133055 Jan 2015, 1st third
...
363 2015-12-29  -1.3996571 Dec 2015, 3rd third
364 2015-12-30  -1.2877952 Dec 2015, 3rd third
365 2015-12-31  -0.9684155 Dec 2015, 3rd third

# Summarise to get average value for each 1/3 of a month  
df.summary = df %>%  
  group_by(cutMonth) %>%
  summarise(average.value = mean(value))

              cutMonth average.value
1  Jan 2015, 1st third   -0.49065685
2  Jan 2015, 2nd third    0.28178222
3  Jan 2015, 3rd third   -1.03870698
4  Feb 2015, 1st third   -0.45700203
5  Feb 2015, 2nd third   -0.07577199
6  Feb 2015, 3rd third    0.33860882
7  Mar 2015, 1st third    0.12067388
...
