1

I'm surprised I couldn't find a question on this website that would answer mine.

I want to create 24 dummy variables, one for each hour of the day (the value is 1 if the timestamp falls in that hour and 0 otherwise). A (really) small part of the data looks like this:

       df <- as.POSIXct(c("08-01-2018 19:46", "08-01-2018 19:50", "08-01-2018 20:46",
                          "09-01-2018 21:17"), format = "%d-%m-%Y %H:%M")

       [1] "2018-01-08 19:46:00 CET" "2018-01-08 19:50:00 CET"
       [3] "2018-01-08 20:46:00 CET" "2018-01-09 21:17:00 CET"

I want the output to be like this:

           19 20 21
        1:  1  0  0
        2:  1  0  0
        3:  0  1  0
        4:  0  0  1

I have already looked at this question: Creating a dummy variable for certain hours of the day

The only problem with applying that answer to my case is that I would have to write 24 ifelse statements, one per hour.

I was wondering if there is a more elegant way to get this output without having to write 24 ifelse statements.

If this question is a duplicate, please let me know!

Thanks in advance,

RC

r c

4 Answers

3

Is this OK? You can call as.data.frame() on the output if you need a data frame rather than a matrix.

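library(lubridate)
# extract the hour of each timestamp and treat it as a categorical variable
hours <- as.factor(lubridate::hour(df))

# with intercept: the first hour becomes the reference level and gets no column of its own
model.matrix(~hours)

# without intercept (+0): one 0/1 column per hour present in the data
model.matrix(~hours+0)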

Further reading:

Generate a dummy-variable

https://stats.stackexchange.com/questions/174976/why-does-the-intercept-column-in-model-matrix-replace-the-first-factor

Jonny Phelps
  • 2
    If it is possible that all hours are not represented, but you want to guarantee all 24 dummy variables then add `, levels=0:23` to the call to `factor`. – Greg Snow Oct 02 '19 at 15:50
  • Thanks @Jonny Phelps, it works like a charm. Also thanks to Greg Snow for the great addition! – r c Oct 03 '19 at 08:17
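Greg Snow's suggestion could be sketched roughly as follows. Note that `levels` is an argument of `factor()`, not `as.factor()`, so the call has to be switched. This sketch uses base R's `as.POSIXlt()` instead of lubridate to stay free of dependencies; `tz = "UTC"` and the `h` column prefix are assumptions made for illustration, not part of the answer.

```r
# Sample timestamps from the question; tz = "UTC" is an assumption made here
# so that the result does not depend on the local time zone.
df <- as.POSIXct(c("08-01-2018 19:46", "08-01-2018 19:50",
                   "08-01-2018 20:46", "09-01-2018 21:17"),
                 format = "%d-%m-%Y %H:%M", tz = "UTC")

# Hour of each timestamp as an integer between 0 and 23.
hour_int <- as.POSIXlt(df)$hour

# factor() accepts `levels`, so hours that never occur in the data
# still get a level and therefore a dummy column.
hours <- factor(hour_int, levels = 0:23)

# No intercept (+ 0): one column per level, 24 in total.
dummies <- model.matrix(~ hours + 0)
colnames(dummies) <- paste0("h", 0:23)  # hypothetical column names

dim(dummies)  # 4 rows, 24 columns
```

Columns for hours that never appear are simply all zeros, which keeps the layout stable when the same code is run on different subsets of the data.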
2

Using base R, you could do:

model.matrix(~a-1,data.frame(a=factor(as.POSIXlt(df)$hour)))

  a19 a20 a21
1   1   0   0
2   1   0   0
3   0   1   0
4   0   0   1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"
Onyambu
1

Using tidyverse (edited so that missing combinations are filled with 0 instead of NA):

df <- tibble::tibble(
        time = as.POSIXct(c("08-01-2018 19:46", "08-01-2018 19:50", "08-01-2018 20:46", "09-01-2018 21:17"), format = "%d-%m-%Y %H:%M")
)

suppressPackageStartupMessages(library(dplyr))
df_dummy <- df %>% 
        mutate(
                hours = lubridate::hour(time),
                dummy = 1)

tidyr::pivot_wider(data = df_dummy, names_from = hours, values_from = dummy, values_fill = list(dummy = 0))
#> # A tibble: 4 x 4
#>   time                 `19`  `20`  `21`
#>   <dttm>              <dbl> <dbl> <dbl>
#> 1 2018-01-08 19:46:00     1     0     0
#> 2 2018-01-08 19:50:00     1     0     0
#> 3 2018-01-08 20:46:00     0     1     0
#> 4 2018-01-09 21:17:00     0     0     1
cbo
1

This problem can be solved using the package lubridate.

Solution using a for loop

hour() returns the hour of a POSIXct object. With a vector of the hours of interest, you can loop over the points in time you supplied and compare each one against every hour:

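library(lubridate)

# hours of interest and a list collecting one row of dummies per timestamp
# (named rows so that it does not mask base::list)
hourv <- 19:21
rows <- vector("list", length(df))
# loop over the points in time and the desired hours
for (k in seq_along(df)) {
  storage <- numeric(length(hourv))
  for (i in seq_along(hourv)) {
    if (hourv[i] == hour(df[k])) {
      storage[i] <- 1
    } else {
      storage[i] <- 0
    }
  }
  rows[[k]] <- storage
}

Result

# bind the per-timestamp rows into one data frame
df1 <- as.data.frame(do.call(rbind, rows))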

  V1 V2 V3
1  1  0  0
2  1  0  0
3  0  1  0
4  0  0  1

Data

df <- as.POSIXct(c("08-01-2018 19:46", "08-01-2018 19:50", "08-01-2018 20:46", "09-01-2018 21:17"), format = "%d-%m-%Y %H:%M")
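
The double loop in this answer compares the hour of every timestamp with every hour of interest, which can also be written as a single `outer()` call. The following is a minimal base R sketch and not part of the original answer; `tz = "UTC"`, the `h` column prefix, and the choice of all 24 hours are assumptions made for illustration.

```r
# Sample timestamps from the question; tz = "UTC" is an assumption made here
# so that the result does not depend on the local time zone.
df <- as.POSIXct(c("08-01-2018 19:46", "08-01-2018 19:50",
                   "08-01-2018 20:46", "09-01-2018 21:17"),
                 format = "%d-%m-%Y %H:%M", tz = "UTC")

hourv <- 0:23                      # all hours of the day
obs_hour <- as.POSIXlt(df)$hour    # hour of each timestamp (integer)

# outer() builds a logical matrix: entry [k, i] is TRUE when timestamp k
# falls in hour hourv[i]. Adding 0L converts TRUE/FALSE to 1/0.
dummies <- outer(obs_hour, hourv, "==") + 0L
colnames(dummies) <- paste0("h", hourv)  # hypothetical column names

df1 <- as.data.frame(dummies)
df1[, c("h19", "h20", "h21")]
```

Because `hourv` drives the columns, restricting it to `19:21` would give the same three columns as the loop, without hard-coding the number of rows or hours.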

fabla