1

I am trying to calculate geometric mean of a column every 10 min.

My sample data is..

TimeDate    diam    ratio
2016-05-11 8:25 134.491 1.83074
2016-05-11 8:25 117.777 1.34712
2016-05-11 8:25 104.27  0.927635
2016-05-11 8:25 204.085 1.43079
2016-05-11 8:25 96.8011 0.991716
2016-05-11 8:25 119.152 1.09884
2016-05-11 8:25 113.871 0.932493
2016-05-11 8:26 150.468 0.710525
2016-05-11 8:26 116.576 1.11207
2016-05-11 8:26 192.257 1.61558
2016-05-11 8:26 128.071 0.756608
2016-05-11 8:26 177.667 0.73309
2016-05-11 8:27 97.7377 0.862858
2016-05-11 8:27 98.3195 1.00681
2016-05-11 8:27 91.3603 0.95051
2016-05-11 8:27 152.95  0.842145
2016-05-11 8:27 133.125 1.28365
2016-05-11 8:27 95.2516 0.573588

I've tried it using dplyr function, but the code below does not result in values every 10 min, but one geometric mean value and one geometric sd value.

mydata$TimeDate <- as.POSIXct(strptime(mydata$TimeDate, format = "%Y-%m-%d %H:%M","GMT"))

mydata %>%
    group_by(by10 = cut(TimeDate, breaks="10 min")) %>%
   summarize(Geo.Mean=exp(mean(log(diam))),
          Geo.SD=exp(sd(log(diam)))) 

Data format itself is ok, as aggregate function as below makes nicely, although it does not create geometric mean.

aggregate(mydata["diam"], 
               list(TimeDate=cut(mydata$TimeDate, "10 mins")),
                median, na.rm=T)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
user2928318
  • 549
  • 3
  • 7
  • 20
  • All your data is within `10mins`. Hence everything will grouped in single group. – MKR Apr 11 '18 at 06:44

1 Answers1

2

One option is use lubridate::floor_date function to create group for every 10 mins round the clock. All data between 20-30 mins will be grouped as 20th mins and so on.

library(dplyr)
library(lubridate)

mydata %>% mutate(TimeDate = as.POSIXct(TimeDate, format = "%Y-%m-%d %H:%M")) %>%
  group_by(Diff_10 = floor_date(TimeDate, "10minute")) %>%
  summarise(Geo.Mean=exp(mean(log(diam))),
            Geo.SD=exp(sd(log(diam))))

# # A tibble: 1 x 3
#     Diff_10             Geo.Mean Geo.SD
#     <dttm>                 <dbl>  <dbl>
#   1 2016-05-11 08:20:00      125   1.28

#Result with modified data
# # A tibble: 6 x 3
#   Diff_10             Geo.Mean Geo.SD
#   <dttm>                 <dbl>  <dbl>
# 1 2016-05-11 08:20:00    118     1.14
# 2 2016-05-11 08:30:00    141     1.69
# 3 2016-05-11 08:40:00    127     1.16
# 4 2016-05-11 08:50:00    150     1.28
# 5 2016-05-11 09:10:00     98.0   1.00
# 6 2016-05-11 09:20:00    115     1.29

cut can be used if groups data to be grouped every 10 mins from starting time. In OP, groups will be as 2016-05-11 08:25, 2016-05-11 08:35 and so on.

Modified OP's data:

mydata <- read.table(text = 
"TimeDate    diam    ratio
'2016-05-11 8:25' 134.491 1.83074
'2016-05-11 8:25' 117.777 1.34712
'2016-05-11 8:25' 104.27  0.927635
'2016-05-11 8:35' 204.085 1.43079
'2016-05-11 8:35' 96.8011 0.991716
'2016-05-11 8:42' 119.152 1.09884
'2016-05-11 8:45' 113.871 0.932493
'2016-05-11 8:46' 150.468 0.710525
'2016-05-11 8:56' 116.576 1.11207
'2016-05-11 8:56' 192.257 1.61558
'2016-05-11 8:56' 128.071 0.756608
'2016-05-11 8:59' 177.667 0.73309
'2016-05-11 9:17' 97.7377 0.862858
'2016-05-11 9:17' 98.3195 1.00681
'2016-05-11 9:27' 91.3603 0.95051
'2016-05-11 9:27' 152.95  0.842145
'2016-05-11 9:27' 133.125 1.28365
'2016-05-11 9:27' 95.2516 0.573588",
header = TRUE, stringsAsFactors = FALSE)
MKR
  • 19,739
  • 4
  • 23
  • 33
  • Thanks.., but it gives me the same result. One Geo.Mean and one Geo.SD. The result is the same even when I averaged at 1 min interval. The data shown above is just a sample, and My data.frame actually has 57987 obs, in which TimeDate is POSIXct and diam is numeric. – user2928318 Apr 11 '18 at 07:15
  • @user2928318 I have modified your data to include time from different ranges. Also, I edited the modified result and you can notice there are `6` rows in result. May be you can first try it on modified data and get it working then drill down why its not working on your actual data. – MKR Apr 11 '18 at 07:34
  • @user2928318 Since in your actual data frame `TimeDate` is in `POSIXct` format hence you don't need `mutate` part from my solution. I'm sure you are aware about it still thought to point you. – MKR Apr 11 '18 at 07:37
  • Great! It works nicely, after closing and restart R. I realize such issue happens sometimes. Many thanks! :) – user2928318 Apr 11 '18 at 08:06