Calculate daily mode of time series in R

Question

I am trying to calculate the daily mode of this time series. In the example data below, I would like to see the mode of the windDir.c column per day.

Don't know how to use the apply.daily() wrapper given the fact there is no "colMode" argument. So, I tried using a custom function in period.apply(), but to no avail. The code I tried, along with the dput follows.

ep <- endpoints(wind.d,'days') 
modefunc <- function(x) {
  tabresult <- tabulate(x)
  themode <- which(tabresult == max(tabresult))
  if (sum(tabresult == max(tabresult))>1)
    themode <- NA
  return(themode)
}

period.apply(wind.d$windDir.c, INDEX=ep, FUN=function(x) mode(x))

Reproducible data:

wind.d <- structure(list(date = structure(c(1280635200, 1280635200, 1280635200, 
1280635200, 1280635200, 1280635200, 1280635200, 1280721600, 1280721600, 
1280721600, 1280721600, 1280721600, 1280721600, 1280721600, 1280808000, 
1280808000, 1280808000, 1280808000, 1280808000, 1280808000), class = c("POSIXct", 
"POSIXt"), tzone = ""), windDir.c = structure(c(4L, 3L, 3L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 6L, 5L, 5L, 4L, 5L, 5L
), .Label = c("15", "45", "75", "105", "135", "165", "195", "225", 
"255", "285", "315", "345"), class = "factor")), .Names = c("date", 
"windDir.c"), class = "data.frame", row.names = c(NA, -20L))

score 1 · Accepted Answer · answered Aug 19 '15 at 22:04

1

We can easily do this using dplyr:

library(dplyr)
wind.d %>% group_by(date, windDir.c) %>%
           summarise(count = n()) %>%
           summarise(mode = windDir.c[which.max(count)])

answered Aug 19 '15 at 22:04

jeremycg

24,657
5
63
74

score 1 · Answer 2 · answered Aug 19 '15 at 22:15

1

Or base R:

 calMode <- function(x) {
   ux <- unique(x)
   return(ux[which.max(tabulate(match(x, ux)))])
 }
 myModes <- tapply(as.character(windDir.c), INDEX = date, FUN = calMode)

answered Aug 19 '15 at 22:15

Michael Kaiser

133
1
9

Thanks, Michael, for the input. When I ran your code, however, I got an error saying "Error in unique.default(x, nmax = nmax) : unique() applies only to vectors" – SilvanD Aug 20 '15 at 01:10

score 1 · Answer 3 · answered Aug 20 '15 at 05:17

Note that the code you tried and the output of dput you provide aren't consistent. The dput output isn't an xts object, and the code you supplied will only work with an xts object (endpoints fails on the data.frame you provided).

Assuming that wind.d is indeed an xts object, you can do this easily with xts:

wind.d <- structure(c(105, 75, 75, 105, 105, 105, 105, 105, 105, 105, 105, 
  105, 135, 135, 165, 135, 135, 105, 135, 135), .Dim = c(20L, 1L),
  index = structure(c(1280635200, 1280635200, 1280635200, 1280635200, 
  1280635200, 1280635200, 1280635200, 1280721600, 1280721600, 1280721600, 
  1280721600, 1280721600, 1280721600, 1280721600, 1280808000, 1280808000, 
  1280808000, 1280808000, 1280808000, 1280808000), tzone = "",
  tclass = c("POSIXct", "POSIXt")), .indexCLASS = c("POSIXct", "POSIXt"),
  tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "",
  .Dimnames = list(NULL, "windDir.c"), class = c("xts", "zoo"))
apply.daily(x, function(x) which.max(tabulate(x)))
#                     windDir.c
# 2010-07-31 23:00:00       105
# 2010-08-01 23:00:00       105
# 2010-08-02 23:00:00       135

Thanks for finding an easy solution using the function I originally wanted to use. Apologies for the inconsistency -- am just now learning how to meddle with xts time series objects. — SilvanD, Aug 20 '15 at 12:25

score 1 · Answer 4 · edited May 23 '17 at 12:07

We can load the package modeest to use the function mfv (Most Frequent Value)

library(dplyr)
library(modeest)
wind.d %>% group_by(date) %>% summarise(mode = mfv(windDir.c))

Output:

                 date mode
1 2010-08-01 06:00:00  105
2 2010-08-02 06:00:00  105
3 2010-08-03 06:00:00  135

If there are multiple modes we need to specify the element we would like to retrieve. Otherwise it will return an error. For instance, the first element:

mfv(iris[iris$Species=="setosa", 1])
[1] 5.0 5.1
# dplyr
iris %>% group_by(Species) %>% summarise(mode = mfv(Sepal.Length)[1]) 
     Species mode
1     setosa  5.0
2 versicolor  5.5
3  virginica  6.3

sqldf

For those interested in sqldf, using this approach:

library(sqldf)
sqldf("SELECT date, 
            (SELECT [windDir.c]
            FROM [wind.d] 
            WHERE date = tbl.date
            GROUP BY [windDir.c] 
            ORDER BY count(*) DESC
            LIMIT 1) AS mode
      FROM (SELECT DISTINCT date
            FROM [wind.d]) AS tbl")

Calculate daily mode of time series in R

4 Answers4

sqldf