I have some data from different dates and want to know the average (median or mean) hour at which events occur. The problem is that ordinary averages don't work here because time is circular (e.g. hour 1 comes after hour 24). For example, the average of 11pm and 1am should be midnight, but the ordinary mean gives midday. However, I can't find any functions built to do this. Is there a way to do this in R?

Example data:

hours <- c(20, 21, 22, 23 , 0, 1, 2, 3, 4)

Expected result: mean = 0, median = 0
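
To make the problem concrete, here is a small base-R illustration of what the ordinary functions return on clock hours (the variable names are only for illustration):

```r
# 11pm and 1am: the ordinary mean lands on midday instead of midnight
late_and_early <- c(23, 1)
naive_mean <- mean(late_and_early)
naive_mean  # 12

# The example data from above
hours <- c(20, 21, 22, 23, 0, 1, 2, 3, 4)
mean(hours)    # about 10.67, nowhere near midnight
median(hours)  # 4, because the values are sorted as plain numbers
```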

unknown
  • Maybe this will point you in the right direction: https://stackoverflow.com/questions/42281134/average-time-in-a-column-in-hrminsec-format – Kuba Do Aug 19 '19 at 11:41
  • You can use modular arithmetic `sum(hours)%%24` gives 0 – maydin Aug 19 '19 at 11:42
  • @maydin would that also work for the median? – Dunois Aug 19 '19 at 11:43
  • @KubaDo I already had a look at those. They don't give the correct answer – unknown Aug 19 '19 at 11:44
  • You are possibly looking for: https://stackoverflow.com/questions/32404222/circular-mean-in-r – tmfmnk Aug 19 '19 at 11:47
  • I don't understand the idea behind the median. Why is it 0 ? – maydin Aug 19 '19 at 11:49
  • Is your hours vector sorted from start to end? Is it a time series? So if we first observe 20 and then 4, we know there are 8 hours between those two, and not 16? – Arcoutte Aug 19 '19 at 11:56
  • The median is 0 because it is the middle value (i.e. 20-23 are before and 1-4 are after) – unknown Aug 19 '19 at 11:57
  • @Arcoutte the day is not important here, as the information I want to know is the time of day at which most events occur. Obviously this doesn't really hold up when you have only two times (the mean of midday and midnight could be either 6am or 6pm), but I have thousands of values. – unknown Aug 19 '19 at 11:59
  • If your data is **always** in order, you can just find the center point and pick it as the median. If it is not, you need to explain which day each hour belongs to. For example, if you have data like `c(9,10,11,12,13,14,15,16,17,18,20, 21, 22, 23 , 0, 1, 2, 3, 4,5,6,7,8,9)`, what do you expect the median to be? – maydin Aug 19 '19 at 12:00
  • @maydin yes I understand that it is not perfect to get averages on time. I guess in that case then there is no median. However, it is very unlikely that every hour comes up an equal number of times. – unknown Aug 19 '19 at 12:06

2 Answers

1) Non-decreasing times. Assuming the times are non-decreasing and each time is less than 24 hours after the prior time, we can determine the day of each time by adding 1 every time we encounter an hour that is less than the prior hour. Adding 24 times the day to the hour gives hours2, the total number of hours since hour 0 of the first day. Finally, take the mean or median modulo 24 so that the result lies in the interval [0, 24).

hours <- c(20, 21, 22, 23 , 0, 1, 2, 3, 4)

day <- cumsum(c(0, diff(hours) < 0))
hours2 <- hours + 24 * day

mean(hours2) %% 24
## [1] 0

median(hours2) %% 24
## [1] 0
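
Because this approach relies on the order of the observations, it can help to see what happens when the same values arrive in a different order. The sketch below wraps the two lines above in a helper (the name `unwrap_hours` is made up for this illustration) and compares the original order with a numerically sorted copy:

```r
# Hypothetical helper: same logic as above, packaged as a function
unwrap_hours <- function(hours) {
  day <- cumsum(c(0, diff(hours) < 0))  # new day whenever the hour drops
  hours + 24 * day
}

in_order <- c(20, 21, 22, 23, 0, 1, 2, 3, 4)
sorted   <- sort(in_order)  # 0, 1, 2, 3, 4, 20, 21, 22, 23

mean(unwrap_hours(in_order)) %% 24  # 0
mean(unwrap_hours(sorted)) %% 24    # about 10.67: no hour ever drops, so nothing is unwrapped
```

If the chronological order is not known, approach 2 below does not depend on it.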

2) Circular. In this alternative we map the times to a circle and use mean.circular and median.circular from the circular package. More information on that package is available in its help files as well as in the article "Answering biological questions using circular data and analysis in R".

library(circular)

hours <- c(20, 21, 22, 23 , 0, 1, 2, 3, 4)

hours.circ <- circular(hours, template = "clock24", units = "hours")

mean.circ <- mean(hours.circ)
as.numeric(mean.circ) %% 24
## [1] 0

median.circ <- median(hours.circ)
as.numeric(median.circ) %% 24
## [1] 0

plot(hours.circ)
points(mean.circ, col = "red", cex = 3)
points(median.circ, col = "blue", cex = 2)

[Plot: the hours on a 24-hour clock face, with the mean marked in red and the median in blue]

Note

You may also find it useful to try the above with a more asymmetric input.

hours <- c(20, 21, 22, 23 , 12)
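
As a rough sketch of what to expect, here is approach 1 applied to that input using only base R. The mean and median no longer coincide, and the result again assumes the values are in chronological order:

```r
hours <- c(20, 21, 22, 23, 12)

day <- cumsum(c(0, diff(hours) < 0))  # 0 0 0 0 1
hours2 <- hours + 24 * day            # 20 21 22 23 36

mean(hours2) %% 24    # approximately 0.4
median(hours2) %% 24  # 22
```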
G. Grothendieck

For the circular average, you do the following:

  1. Map the hours to a 24H circle by multiplying them by (2*pi/24).
  2. Calculate the mean x and y coordinates respectively.
  3. Transform these average circle coordinates back to hours.

I don't know if there exists a well-accepted definition of the circular median.

average_time <- function(x) {

  circle_hours <- x*(2*pi/24)

  mean_x <- mean(cos(circle_hours))
  mean_y <- mean(sin(circle_hours))

  atan2(mean_y, mean_x) / (2*pi) * 24
}

hours <- c(20, 21, 22, 23 , 0, 1, 2, 3, 4)
average_time(hours)
## [1] -1.078441e-15
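
The tiny negative number is floating-point noise around 0, and for other inputs atan2 returns values in (-12, 12] rather than [0, 24). The following is a sketch, not a definitive implementation, that rounds and wraps the result, and also tries one possible definition of a circular median: the observed value with the smallest total circular distance to all other observations. The function names and the `digits` argument are assumptions made for this illustration.

```r
# Circular distance between clock times in hours, always in [0, 12]
circ_dist <- function(a, b) {
  d <- abs(a - b) %% 24
  pmin(d, 24 - d)
}

# Same trigonometric mean as above, rounded and wrapped into [0, 24)
circular_mean_hours <- function(x, digits = 8) {
  angles <- x * (2 * pi / 24)
  m <- atan2(mean(sin(angles)), mean(cos(angles))) / (2 * pi) * 24
  round(m, digits) %% 24
}

# Assumed median definition: the data point minimising the summed
# circular distance to all points (first one wins on ties)
circular_median_hours <- function(x) {
  cost <- vapply(x, function(h) sum(circ_dist(x, h)), numeric(1))
  x[which.min(cost)]
}

hours <- c(20, 21, 22, 23, 0, 1, 2, 3, 4)
circular_mean_hours(hours)    # 0
circular_median_hours(hours)  # 0
```

Rounding before `%% 24` avoids depending on how the modulo operator treats a value a hair below zero. Neither function is meaningful when the hours are spread evenly around the clock, since no single direction stands out in that case.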
Aron Strandberg