2

I am trying to calculate the daily mode of this time series. In the example data below, I would like to see the mode of the windDir.c column per day.

Don't know how to use the apply.daily() wrapper given the fact there is no "colMode" argument. So, I tried using a custom function in period.apply(), but to no avail. The code I tried, along with the dput follows.

ep <- endpoints(wind.d,'days') 
modefunc <- function(x) {
  tabresult <- tabulate(x)
  themode <- which(tabresult == max(tabresult))
  if (sum(tabresult == max(tabresult))>1)
    themode <- NA
  return(themode)
}

period.apply(wind.d$windDir.c, INDEX=ep, FUN=function(x) mode(x))

Reproducible data:

wind.d <- structure(list(date = structure(c(1280635200, 1280635200, 1280635200, 
1280635200, 1280635200, 1280635200, 1280635200, 1280721600, 1280721600, 
1280721600, 1280721600, 1280721600, 1280721600, 1280721600, 1280808000, 
1280808000, 1280808000, 1280808000, 1280808000, 1280808000), class = c("POSIXct", 
"POSIXt"), tzone = ""), windDir.c = structure(c(4L, 3L, 3L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 6L, 5L, 5L, 4L, 5L, 5L
), .Label = c("15", "45", "75", "105", "135", "165", "195", "225", 
"255", "285", "315", "345"), class = "factor")), .Names = c("date", 
"windDir.c"), class = "data.frame", row.names = c(NA, -20L))
Joshua Ulrich
  • 173,410
  • 32
  • 338
  • 418
SilvanD
  • 325
  • 2
  • 14

4 Answers4

1

We can easily do this using dplyr:

library(dplyr)
wind.d %>% group_by(date, windDir.c) %>%
           summarise(count = n()) %>%
           summarise(mode = windDir.c[which.max(count)])
jeremycg
  • 24,657
  • 5
  • 63
  • 74
1

Or base R:

 calMode <- function(x) {
   ux <- unique(x)
   return(ux[which.max(tabulate(match(x, ux)))])
 }
 myModes <- tapply(as.character(windDir.c), INDEX = date, FUN = calMode)
Michael Kaiser
  • 133
  • 1
  • 9
  • Thanks, Michael, for the input. When I ran your code, however, I got an error saying "Error in unique.default(x, nmax = nmax) : unique() applies only to vectors" – SilvanD Aug 20 '15 at 01:10
1

Note that the code you tried and the output of dput you provide aren't consistent. The dput output isn't an xts object, and the code you supplied will only work with an xts object (endpoints fails on the data.frame you provided).

Assuming that wind.d is indeed an xts object, you can do this easily with xts:

wind.d <- structure(c(105, 75, 75, 105, 105, 105, 105, 105, 105, 105, 105, 
  105, 135, 135, 165, 135, 135, 105, 135, 135), .Dim = c(20L, 1L),
  index = structure(c(1280635200, 1280635200, 1280635200, 1280635200, 
  1280635200, 1280635200, 1280635200, 1280721600, 1280721600, 1280721600, 
  1280721600, 1280721600, 1280721600, 1280721600, 1280808000, 1280808000, 
  1280808000, 1280808000, 1280808000, 1280808000), tzone = "",
  tclass = c("POSIXct", "POSIXt")), .indexCLASS = c("POSIXct", "POSIXt"),
  tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "",
  .Dimnames = list(NULL, "windDir.c"), class = c("xts", "zoo"))
apply.daily(x, function(x) which.max(tabulate(x)))
#                     windDir.c
# 2010-07-31 23:00:00       105
# 2010-08-01 23:00:00       105
# 2010-08-02 23:00:00       135
Joshua Ulrich
  • 173,410
  • 32
  • 338
  • 418
  • Thanks for finding an easy solution using the function I originally wanted to use. Apologies for the inconsistency -- am just now learning how to meddle with xts time series objects. – SilvanD Aug 20 '15 at 12:25
1

We can load the package modeest to use the function mfv (Most Frequent Value)

library(dplyr)
library(modeest)
wind.d %>% group_by(date) %>% summarise(mode = mfv(windDir.c)) 

Output:

                 date mode
1 2010-08-01 06:00:00  105
2 2010-08-02 06:00:00  105
3 2010-08-03 06:00:00  135

If there are multiple modes we need to specify the element we would like to retrieve. Otherwise it will return an error. For instance, the first element:

mfv(iris[iris$Species=="setosa", 1])
[1] 5.0 5.1
# dplyr
iris %>% group_by(Species) %>% summarise(mode = mfv(Sepal.Length)[1]) 
     Species mode
1     setosa  5.0
2 versicolor  5.5
3  virginica  6.3

sqldf

For those interested in sqldf, using this approach:

library(sqldf)
sqldf("SELECT date, 
            (SELECT [windDir.c]
            FROM [wind.d] 
            WHERE date = tbl.date
            GROUP BY [windDir.c] 
            ORDER BY count(*) DESC
            LIMIT 1) AS mode
      FROM (SELECT DISTINCT date
            FROM [wind.d]) AS tbl")
Community
  • 1
  • 1
mpalanco
  • 12,960
  • 2
  • 59
  • 67