5

Following up my previous question about aggregating hourly data into daily data, I want to continue with (a) monthly aggregate and (b) merging the monthly aggregate into the original dataframe.

My original dataframe looks like this:

Lines <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"

The daily aggregates have been answered in my previous question, and then I can find my way to produce the monthly aggregates from there, to something like this:

Lines <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"

Where the OutdoorAVE is the monthly average of the daily minimum and maximum outdoor temperature. What I want to have in the end is something like this:

Lines <- "Date,Outdoor,Indoor,Month,OutdoorAVE
01/01/2000 01:00,30,25,Jan,31.33
01/01/2000 02:00,31,26,Jan,31.33
01/01/2000 03:00,33,24,Jan,31.33
02/01/2000 01:00,29,25,Feb,31.67
02/01/2000 02:00,27,26,Feb,31.67
02/01/2000 03:00,39,24,Feb,31.67
12/01/2000 02:00,27,26,Dec,31.33
12/01/2000 03:00,39,24,Dec,31.33
12/31/2000 23:00,28,25,Dec,31.33"

I do not know enough R on how to do that. Any help is greatly appreciated.

Community
  • 1
  • 1
ery
  • 992
  • 3
  • 14
  • 25

3 Answers3

2

Try ave and eg POSIXlt to extract the month:

zz <- textConnection(Lines)
Data <- read.table(zz,header=T,sep=",",stringsAsFactors=F)
close(zz)

Data$Month <- strftime(
     as.POSIXlt(Data$Date,format="%m/%d/%Y %H:%M"),
     format='%b')
Data$outdoor_ave <- ave(Data$Outdoor,Data$Month,FUN=mean)

Gives :

> Data
              Date Outdoor Indoor Month outdoor_ave
1 01/01/2000 01:00      30     25   Jan    31.33333
2 01/01/2000 02:00      31     26   Jan    31.33333
3 01/01/2000 03:00      33     24   Jan    31.33333
4 02/01/2000 01:00      29     25   Feb    31.66667
5 02/01/2000 02:00      27     26   Feb    31.66667
6 02/01/2000 03:00      39     24   Feb    31.66667
7 12/01/2000 02:00      27     26   Dec    31.33333
8 12/01/2000 03:00      39     24   Dec    31.33333
9 12/31/2000 23:00      28     25   Dec    31.33333

Edit : Then just calcualte Month in Data as shown above and use merge :

zz <- textConnection(Lines2) # Lines2 is the aggregated data
Data2 <- read.table(zz,header=T,sep=",",stringsAsFactors=F)
close(zz)

> merge(Data,Data2[-1],all=T)
  Month             Date Outdoor Indoor OutdoorAVE
1   Dec 12/01/2000 02:00      27     26      31.33
2   Dec 12/01/2000 03:00      39     24      31.33
3   Dec 12/31/2000 23:00      28     25      31.33
4   Feb 02/01/2000 01:00      29     25      31.67
5   Feb 02/01/2000 02:00      27     26      31.67
6   Feb 02/01/2000 03:00      39     24      31.67
7   Jan 01/01/2000 01:00      30     25      31.33
8   Jan 01/01/2000 02:00      31     26      31.33
9   Jan 01/01/2000 03:00      33     24      31.33
Joris Meys
  • 106,551
  • 31
  • 221
  • 263
  • @Joris Meys: My problem is that the outdoorAVE is on another dataframe (say Data.Month) that has only the month and the average columns, and there are only 12 rows of it for a year. The outdoorAVE is not calculated with as you write above, but from the aggregates (daily then monthly) of annual hourly data. So what I want is to add a column in the original dataframe (i.e. Data in the example above) where the column is taken from another data set (e.g Data.Monthly). – ery Mar 28 '11 at 22:52
  • @ery: In your comment, you say `Data.Monthly` only has 2 columns (month and average), but it has 3 columns in your original question. – Joshua Ulrich Mar 28 '11 at 23:02
  • @Joshua: Oops yes, it should be 3, although I am only interested in pasting the outdoor column back to the original data frame. – ery Mar 28 '11 at 23:15
  • @Joris: Yes this works, thank you. My next problem is apparently my timestamp for hour is not 00-23, but 01-24. So I have an NA entry every 24th row. Any good solution for this? – ery Mar 29 '11 at 17:41
  • @ery : just run `Data$Date <- gsub("24:00","00:00",Data$Date)` before you calculate the month. – Joris Meys Mar 30 '11 at 08:22
  • @Joris: Nope, this is not correct. I believe the above code simply changes 24:00 to 00:00, which is not enough. We need to move the date by one day forward. 24:00 is 00:00 for the following day. So far I will just ignore the NA's, especially if I can increase the time resolution. But in many cases I cannot ignore this. – ery Mar 30 '11 at 20:21
  • @ery : true. Post it as a seperate question, the code cannot simply be pasted in a comment. – Joris Meys Mar 30 '11 at 22:16
2

This is tangential to your question, but you may want to use RSQLite and a separate tables for various aggregate values instead, and join the tables with simple SQL commands. If you use many kinds of aggregations your data frame can easily get large and ugly.

GaBorgulya
  • 617
  • 1
  • 5
  • 11
  • This is one good solution, but I do not even know how to import the date/time in SQLite, much less to group it by month or day. Any help? – ery Mar 29 '11 at 21:36
  • 1
    Your whole problem can be solved in SQL. I don't say you should use SQL for the aggregation, but it's useful to know how it works (rather easy). I suggest that you do a bit of reading and ask your questions in separate threads. – GaBorgulya Mar 30 '11 at 00:32
0

Here's a zoo/xts solution. Note that Month is numeric here because you can't mix types in zoo/xts objects.

require(xts) # loads zoo too
Lines1 <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
con <- textConnection(Lines1)
z <- read.zoo(con, header=TRUE, sep=",",
    format="%m/%d/%Y %H:%M", FUN=as.POSIXct)
close(con)

zz <- merge(z, Month=.indexmon(z),
    OutdoorAVE=ave(z[,1], .indexmon(z), FUN=mean))
zz
#                     Outdoor Indoor Month OutdoorAVE
# 2000-01-01 01:00:00      30     25     0   31.33333
# 2000-01-01 02:00:00      31     26     0   31.33333
# 2000-01-01 03:00:00      33     24     0   31.33333
# 2000-02-01 01:00:00      29     25     1   31.66667
# 2000-02-01 02:00:00      27     26     1   31.66667
# 2000-02-01 03:00:00      39     24     1   31.66667
# 2000-12-01 02:00:00      27     26    11   31.33333
# 2000-12-01 03:00:00      39     24    11   31.33333
# 2000-12-31 23:00:00      28     25    11   31.33333

Update: How do get the above result using two different data sets.

Lines2 <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"
con <- textConnection(Lines2)
z2 <- read.zoo(con, header=TRUE, sep=",", format="%m/%d/%Y",
    FUN=as.POSIXct, colClasses=c("character","NULL","numeric"))
close(con)

zz2 <- na.locf(merge(z1, Month=.indexmon(z1), OutdoorAVE=z2))[index(z1)]
# same output as zz (above)
Joshua Ulrich
  • 173,410
  • 32
  • 338
  • 418
  • @ery: please see my edit. I'm not sure why Month would always be 11... perhaps your example data and actual data differ? – Joshua Ulrich Mar 29 '11 at 13:36