6

I'm having trouble calculating the average temperature by hour.

I have a data frame with date, time (hh:mm:ss p.m./a.m.) and temperature. I need to extract the mean temperature by hour in order to plot the daily variation in temperature.

I'm new to R, but I gave it a try with what I know: I transformed the hours into numbers, extracted the first two characters, and then calculated the mean, but it didn't work very well. Moreover, I have so many files to analyze that it would be much better to have something more automated and cleaner than the "solution" I found.

I believe there must be a better way to calculate averages by hour in R, so I've been looking for the answer in other posts here. Unfortunately, I couldn't find a clear answer about extracting statistics from time data.

My data looks like this

          date     hour temperature
1   28/12/2013 13:03:01      41.572
2   28/12/2013 13:08:01      46.059
3   28/12/2013 13:13:01       48.55
4   28/12/2013 13:18:01      49.546
5   28/12/2013 13:23:01      49.546
6   28/12/2013 13:28:01      49.546
7   28/12/2013 13:33:01      50.044
8   28/12/2013 13:38:01      50.542
9   28/12/2013 13:43:01      50.542
10  28/12/2013 13:48:01       51.04
11  28/12/2013 13:53:01      51.538
12  28/12/2013 13:58:01      51.538
13  28/12/2013 14:03:01      50.542
14  28/12/2013 14:08:01       51.04
15  28/12/2013 14:13:01       51.04
16  28/12/2013 14:18:01      52.534
17  28/12/2013 14:23:01      53.031
18  28/12/2013 14:28:01      53.031
19  28/12/2013 14:33:01      53.031
20  28/12/2013 14:38:01      51.538
21  28/12/2013 14:43:01      53.031
22  28/12/2013 14:48:01      53.529
etc. (24 hours of data)

I would like R to calculate the average per hour (ignoring differences in minutes or seconds, just by hour).

Any suggestions? Thank you very much in advance!

Regards, Maria

Mark Rajcok
Maria

3 Answers

14

Combine the date and hour columns into a POSIXct column and cut() by hourly breaks:

df <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
date hour temperature
28/12/2013 13:03:01  41.572
28/12/2013 13:08:01  46.059
28/12/2013 13:13:01  48.55
28/12/2013 13:18:01  49.546
28/12/2013 13:23:01  49.546
28/12/2013 13:28:01  49.546
28/12/2013 13:33:01  50.044
28/12/2013 13:38:01  50.542
28/12/2013 13:43:01  50.542
28/12/2013 13:48:01  51.04
28/12/2013 13:53:01  51.538
28/12/2013 13:58:01  51.538
28/12/2013 14:03:01  50.542
28/12/2013 14:08:01  51.04
28/12/2013 14:13:01  51.04
28/12/2013 14:18:01  52.534
28/12/2013 14:23:01  53.031
28/12/2013 14:28:01  53.031
28/12/2013 14:33:01  53.031
28/12/2013 14:38:01  51.538
28/12/2013 14:43:01  53.031
28/12/2013 14:48:01  53.529
28/12/2013 15:01:01  50.77")

df$datehour <- cut(as.POSIXct(paste(df$date, df$hour),
   format="%d/%m/%Y %H:%M:%S"), breaks="hour") 
head(df)
        date     hour temperature            datehour
1 28/12/2013 13:03:01      41.572 2013-12-28 13:00:00
2 28/12/2013 13:08:01      46.059 2013-12-28 13:00:00
3 28/12/2013 13:13:01      48.550 2013-12-28 13:00:00
4 28/12/2013 13:18:01      49.546 2013-12-28 13:00:00
5 28/12/2013 13:23:01      49.546 2013-12-28 13:00:00
6 28/12/2013 13:28:01      49.546 2013-12-28 13:00:00

Now aggregate by that hourly column:

means <- aggregate(temperature ~ datehour, df, mean)
head(means)
             datehour temperature
1 2013-12-28 13:00:00    49.17192
2 2013-12-28 14:00:00    52.23470
3 2013-12-28 15:00:00    50.77000

plot(as.POSIXct(means$datehour), means$temperature, type="l", las=1,
     main="Hourly Avg Temperatures", xlab="Hour", ylab="")
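If what you want is the daily cycle (one mean per hour of day, pooled across all dates) rather than one mean per calendar hour, a minimal base R sketch is to group on the hour extracted with format(). The tiny data frame and the names ts, hour_of_day and daily_cycle are made up for illustration; they are not taken from the question's files.

```r
# Hypothetical readings spread over two days (illustration only)
df <- data.frame(
  date = c("28/12/2013", "28/12/2013", "29/12/2013", "29/12/2013"),
  hour = c("13:03:01", "13:58:01", "13:10:00", "14:05:00"),
  temperature = c(40, 50, 60, 52),
  stringsAsFactors = FALSE
)

# Parse date + time; a fixed time zone avoids daylight-saving surprises
ts <- as.POSIXct(paste(df$date, df$hour),
                 format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Keep only the hour of day (0-23), dropping minutes and seconds
df$hour_of_day <- as.integer(format(ts, "%H"))

# One mean per hour of day, pooled over every date in the data
daily_cycle <- aggregate(temperature ~ hour_of_day, df, mean)
print(daily_cycle)
```

Grouping on datehour, as above, keeps the days separate; grouping on hour_of_day pools them, which is usually what a daily-variation plot needs.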

But, for time series data, I like to use package xts:

require(xts)
df.xts <- xts(df$temperature, as.POSIXct(paste(df$date, df$hour),
      format="%d/%m/%Y %H:%M:%S"))
head(df.xts)
                      [,1]
2013-12-28 13:03:01 41.572
2013-12-28 13:08:01 46.059
2013-12-28 13:13:01 48.550
2013-12-28 13:18:01 49.546
2013-12-28 13:23:01 49.546
2013-12-28 13:28:01 49.546

means <- period.apply(df.xts, endpoints(df.xts, "hours"), mean)
head(means)
                        [,1]
2013-12-28 13:58:01 49.17192
2013-12-28 14:48:01 52.23470
2013-12-28 15:01:01 50.77000

Notice how the timestamps are the last entry of each hour. We can align the timestamps (down) to the beginning of the hour with this function:

align.time.down = function(x,n){ index(x) = index(x)-n; align.time(x,n) }
means.rounded <- align.time.down(means, 60*60)  
         # 2nd argument is the number of seconds to adjust/round to,
         # just like function align.time()

head(means.rounded)
                        [,1]
2013-12-28 13:00:00 49.17192
2013-12-28 14:00:00 52.23470
2013-12-28 15:00:00 50.77000

plot(means.rounded, las=1, main="Hourly Avg Temperatures")
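For readers without xts, the same rounding down to the start of the hour can be sketched in base R by flooring the epoch seconds. The helper name floor_to_hour and the UTC time zone are assumptions for this illustration, not part of xts.

```r
# Last timestamp of each hour, as in the period.apply() output above
ts <- as.POSIXct(c("2013-12-28 13:58:01",
                   "2013-12-28 14:48:01",
                   "2013-12-28 15:01:01"), tz = "UTC")

# Hypothetical helper: round timestamps down to a multiple of n seconds
floor_to_hour <- function(x, n = 60 * 60) {
  as.POSIXct(floor(as.numeric(x) / n) * n, origin = "1970-01-01", tz = "UTC")
}

floored <- floor_to_hour(ts)
print(format(floored, "%Y-%m-%d %H:%M:%S"))
```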


Mark Rajcok
  • When I run the aggregate step on that hourly column, I get this error message: Error in get(as.character(FUN), mode = "function", envir = envir) : object 'FUN' of mode 'function' was not found. How can I solve it? – milawithict May 18 '18 at 12:22
3

It is always easier if sample data and expected output are given in the question.

Solution with the data.table package

require(data.table)
data <- fread('temp.csv',sep=',') #Assuming your data is in temp.csv
#if above step not executed, convert the data frame to data.table 
data <- data.table(data)
> str(data)
Classes ‘data.table’ and 'data.frame':  12 obs. of  3 variables:
$ date       : chr  "28/12/2013" "28/12/2013" "28/12/2013" "28/12/2013" ...
$ hour       : chr  "13:03:01" "13:08:01" "13:13:01" "13:18:01" ...
$ temperature: num  41.6 46.1 48.5 49.5 49.5 ...

> data
      date     hour    temperature      avg
1: 27/12/2013 13:00:00       42.99 35.78455
2: 27/12/2013 14:00:00       65.97 35.78455
3: 27/12/2013 15:00:00       63.57 35.78455 

  data[,list(avg=mean(temperature)),by=hour] #grouped by the hour column (the full hh:mm:ss value)
    hour   avg
1: 13:00:00 42.99
2: 14:00:00 65.97
3: 15:00:00 63.57
  data[,list(avg=mean(temperature)),by="date,hour"] #data set is grouped by date,then hour
        date     hour   avg
1: 27/12/2013 13:00:00 42.99
2: 27/12/2013 14:00:00 65.97
3: 27/12/2013 15:00:00 63.57

data[,list(avg=mean(temperature)),by=list(date,hour(as.POSIXct(data$hour, format = "%H:%M:%S")))] # to group by date and hour of day only (minutes and seconds ignored)
     date     hour    avg
1: 27/12/2013    1 29.530
2: 27/12/2013    4 65.970
  • I'm sorry for not being more specific. I'm still learning how to post my questions too. Thank you for your suggestions. I'll edit my question according to your answer. – Maria Jul 09 '14 at 05:24
  • I'm still confused by your question. Does my answer solve your problem? Do you only want to group by the hour (excluding minutes and seconds)? – on_the_shores_of_linux_sea Jul 09 '14 at 05:32
  • Exactly. I tried your script but I got this error message: Error in `[.data.frame`(data, , list(avg = mean(temperature)), by = hour) : unused argument (by = hour). I'm looking into it more deeply to find out why I'm getting that error message. – Maria Jul 09 '14 at 05:41
  • You are getting that error because it's a data frame. I assume you did not use fread to read the data; in that case, run data <- data.table(data) – on_the_shores_of_linux_sea Jul 09 '14 at 05:44
  • You were right! I did as you told me, but I do not trust those values. For instance, I got an average temperature of 65 degrees for 13:00, which can't be possible given the sample values (which range from 41.572 to 51.538 degrees). Did you get the same results? Weird. – Maria Jul 09 '14 at 05:55
  • If you group only by hour, for example 13:xx:xx, it will return the mean of all the 13:00 readings across all days. From the example data mentioned above, I'm getting 49.17192 as the average for 13. – on_the_shores_of_linux_sea Jul 09 '14 at 06:06
  • I have two days per table (from 13h of one day to 12:59 of the next one). I grouped by hour and date: data[,list(avg=mean(temperature)),by="date,hour"] and by hour only: data[,list(avg=mean(temperature)),by=list(date,hour(as.POSIXct(data$hour, format = "%H:%M:%S")))] But I'm still getting the same "weird averages" that I can see in the example in your answer (averages around 65 degrees, which doesn't make any sense). I have never used this kind of function before. I'm so sorry to bother you, but I'm stuck at that step. – Maria Jul 09 '14 at 06:17
  • data[,list(avg=mean(temperature),sum = sum(temperature), len = length(temperature), max = max(temperature), min = min(temperature)),by=list(date,hour(as.POSIXct(data$hour, format = "%H:%M:%S")))]. Here we are finding the sum, length, max and min. You can cross-check your answer and see if anything is wrong. From the sample data I don't find any errors. – on_the_shores_of_linux_sea Jul 09 '14 at 06:26
  • Another error message. Error in Summary.factor(c(56L, 63L, 64L, 65L, 65L, 65L, 66L, 67L, 67L, : sum not meaningful for factors In addition: Warning message: In `[.data.table`(data, , list(avg = mean(temperature), sum = sum(temperature), : argument is not numeric or logical: returning NA Wow, this is getting more complicated. I checked my data and it looks fine. I then ran your script from top to bottom, step by step, but I'm not even close to getting the real averages. And the worst part is that I cannot understand why. – Maria Jul 09 '14 at 06:38
  • Run str(data); it will show the type of each field. Check whether temperature is numeric. – on_the_shores_of_linux_sea Jul 09 '14 at 06:45
  • Classes ‘data.table’ and 'data.frame': 12 obs. of 3 variables: $ date : chr "28/12/2013" "28/12/2013" "28/12/2013" "28/12/2013" ... $ hour : chr "13:03:01" "13:08:01" "13:13:01" "13:18:01" ... $ temperature: num 41.6 46.1 48.5 49.5 49.5 ... – on_the_shores_of_linux_sea Jul 09 '14 at 06:47
  • I have updated the answer with the str output; check whether your dataset returns the same. – on_the_shores_of_linux_sea Jul 09 '14 at 06:49
  • It wasn't! It was a factor... I'll change it to numeric and come back in a couple of minutes to tell you how it went. – Maria Jul 09 '14 at 07:10
  • Same result. Weird averages, I'll edit my answer so you can see in more detail the output that I'm getting. Again, I'm deeply sorry for taking so much of your time. – Maria Jul 09 '14 at 07:20
  • I'm so ashamed. It was that... and I also made a mistake setting sep and dec when opening my data. This is so embarrassing. I'm so sorry, I promise to check this kind of thing before asking any question. After fixing that, and converting temperature from "factor" into "numeric", your script worked like a charm. I believe other people with the same question as me will find this EXTREMELY HELPFUL. Thank you so much for your clear explanation, your willingness and your patience! – Maria Jul 09 '14 at 07:44
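The "weird averages" in this thread are consistent with temperature having been read as a factor: as.numeric() on a factor returns the integer level codes, not the values, and the c(56L, 63L, 64L, ...) in the error message above looks like such codes. Here is a minimal sketch of the pitfall and two ways around it. The file layout (";" as separator, "," as decimal mark) is an assumption for illustration; the thread only says that sep and dec were set wrongly.

```r
# Hypothetical file contents: ';' as field separator, ',' as decimal mark
raw <- "date;hour;temperature
28/12/2013;13:03:01;41,572
28/12/2013;13:08:01;46,059
28/12/2013;13:13:01;48,55"

# Wrong: with the default dec = "." the column is not parsed as numeric;
# stringsAsFactors = TRUE (the default before R 4.0) turns it into a factor
bad <- read.table(text = raw, header = TRUE, sep = ";",
                  stringsAsFactors = TRUE)
codes <- as.numeric(bad$temperature)  # level codes, not degrees

# Right: declare the decimal mark when reading the file
good <- read.table(text = raw, header = TRUE, sep = ";", dec = ",",
                   stringsAsFactors = FALSE)

# Repairing an already-loaded factor: convert to character first
fixed <- as.numeric(sub(",", ".", as.character(bad$temperature),
                        fixed = TRUE))

print(codes)
print(mean(good$temperature))
```

Running str() right after reading the file, as suggested in the comments, catches this before any averaging is done.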
0

Try this example:

library(foqat)
met2=avri(met[,c(1,2)], bkip="1 hour", mode="ncycle", value=24)

#plot it
geom_avri(
    met2,
    cave=2, csd=3,
    alpha=0.5, lcc="#0050b3", rff="#40a9ff", 
    xlab="Hour of day", ylab=bquote(Temp~" "~(degree*C))
)


RiveN
TichPi