0

I have an excel file (.xlsx) that contains three columns, with headers:'datetime'(examples: 10/1/2008 0:10, 10/1/2008 0:20 etc), 'RH'(Example: 0.46) and 'wind_mps'(Example: 3.71). I wish to convert the 10 minute interval data into hourly average data for both RH and wind_mps columns.

I was unable to insert the Excel data into this question. Sorry about that. I can edit my question if someone tells me how to.

A similar question has been answered at how to convert by the minute data to hourly average data in R, but I'm new to R and unable to use the same technique for my data. I also tried to use the 'zoo', 'chron' and 'xts' packages to do this, as in http://rpubs.com/hrbrmstr/time-series-machinations but they seem to not work in R 3.02.

I tried to do this in Excel, but couldn't find a reasonably easy technique.

I was able to achieve a similar task of converting hourly data to daily average for another data set using an Excel macro, but I'm unable to do this for 10 minute data. The Macro is given below:

Global year As Integer
Sub Calculate()
    Dim start_year As Integer
    Dim end_year As Integer
    Dim cell_count As Integer
    start_year = Cells(19, "M").Value
    end_year = Cells(48, "M").Value
    year = start_year
    cell_count = 19
    Do While year < (end_year + 1)
        Dim row As Integer
        Dim sum As Double
        Dim count As Integer
        Dim init_row As Integer
        init_row = 6
        sum = 0
        count = 0
        Dim cv As Integer
        cv = 3
        Do Until cv = year
            cv = Cells(init_row, "C").Value
            init_row = init_row + 1
        Loop
        row = init_row - 1
        Worksheets("Sheet1").Activate
        Dim cv1 As Integer
        cv1 = Cells(row, "C").Value
        Do While cv1 = year
            sum = sum + Cells(row, "F").Value
            count = count + 1
            row = row + 1
            cv1 = Cells(row, "C").Value
        Loop
        Cells(cell_count, "N").Value = sum
        Cells(cell_count, "O").Value = count
        Cells(cell_count, "P").Value = sum / count
        cell_count = cell_count + 1
        year = year + 1
    Loop
End Sub

It doesn't really matter to me whether I use R, Excel function, Macros or any other technique. It'd be great if someone can tell me how to convert this data set of about 50000 values for RH and wind_mps each to hourly average.

Thanks in advance.

Community
  • 1
  • 1
small_world
  • 139
  • 5
  • 13
  • This is easy with xts/zoo, and I assure you they work with R-3.0.2. – Joshua Ulrich Mar 27 '14 at 17:11
  • So you want, next to each value it's hourly average (in other words, it would repeat 6 times - once for each 10 minutes)? – John Bustos Mar 27 '14 at 19:36
  • @JoshuaUlrich I tried xts/zoo. Is there a way to convert a dataframe to a matrix supported by xts? I tried this: matrix_rhwind<-as.xts(rh_wind2,dateFormat='POSIXct') But get:'Error in as.POSIXlt.character(x, tz, ...) : character string is not in a standard unambiguous format' Do you know what's happening? – small_world Mar 27 '14 at 20:46
  • @JohnBustos: I want the 10 min values to be aggregated into hourly averages. So there would be one value every hour, that is an average of the six ten-minute values. – small_world Mar 27 '14 at 20:48
  • That being the case, look at the solution I posted - It should give you exactly what you're looking for. – John Bustos Mar 27 '14 at 20:51
  • 1
    `as.xts.data.frame` assumes the date-times are the data.frame rownames. If they're not the rownames, then you need to call the `xts` constructor explicitly: `xts(rh_wind2[,c("RH","wind_mps")], rh_wind2$datetime)`. Then you can call `period.apply(x, endpoints(x, "hours"), mean)`. – Joshua Ulrich Mar 27 '14 at 21:10
  • @JoshuaUlrich: I'm assuming 'x' there means the name of my dataframe(rh_wind2). When I try that, the explicit call goes fine, but I get 'Error in try.xts(x, error = "must be either xts-coercible or timeBased") : must be either xts-coercible or timeBased' when I do 'period.apply(rh_wind2, endpoints(rh_wind2, "hours"), mean)'. Did I do anything wrong? Thanks for the reply. – small_world Mar 27 '14 at 21:33
  • The xts constructor does not modify your original data.frame, `rh_wind2`. I assumed you would assign it to a new object, named `x`. – Joshua Ulrich Mar 27 '14 at 21:37
  • @JoshuaUlrich: I somehow end up with a bunch of 'NA' values when I do that, and the number of values does not add up to what it should be i.e I should have 8761 hours from Oct 1 2008 to Sep 30 2009, but I end up with 8541 values only. I don't understand why I'm losing data in the process. – small_world Mar 27 '14 at 23:25
  • It would be a lot easier to help if you provided a minimal [reproducible example](http://stackoverflow.com/q/5963269/271616). If your data have `NA`, then you need to specify `na.rm=TRUE` in your call to `mean`. You won't have an observation for a given hour if there is not at least one observation in that given hour. But I'm just guessing about the causes because I don't know what your actual data are. – Joshua Ulrich Mar 28 '14 at 10:21

1 Answers1

1

This can be VERY easily done via Excel Pivot Tables...

  1. Select your data and, in the top menu ribbon, go to Insert > Pivot Table

In the Pivot Table designer:

  1. Select your datetime as your Row Labels
  2. Select RH & wind_mps as your Values
  3. For both your values, click on them and select Value Field Settings > Average

In the Pivot Table itself:

  1. Now, go to your Pivot Table itself and right click on any of the dates in the first column
  2. Select Group from the context menu that shows up and select Hours from the list that comes up

That should give you what you're looking to do with no programming at all

Hope this does what you wanted

John Bustos
  • 19,036
  • 17
  • 89
  • 151
  • Thank you. I tried this, but it seems to aggregate values in a strange manner. I wish I knew how to copy Excel data into this comment without losing formatting so that I can show you what's happening. – small_world Mar 27 '14 at 20:55
  • Describe it? It's probably an issue with the pivot table that can be fixed pretty easily... – John Bustos Mar 27 '14 at 21:06
  • Everything goes fine until I 'Group' by 'Hours'. I end up getting data aggregated with labels '12AM' '1AM' etc with about 88 values under each such label. The best result for me would be data aggregated by one row for each hour of the day. Is there a way to do this? – small_world Mar 27 '14 at 21:14
  • You can select more than one item to group by - Group by Hours AND Days, Months, etc... – John Bustos Mar 27 '14 at 21:20