
I want to split a time series data set (which already contains only nighttime data) into separate nights so that I can apply a missing-value imputation method to each night separately. To do that, I need to create a new variable "night" that labels each night.

Any ideas on how to create the variable "night" correctly with the dplyr::if_else() function (e.g., by using the "day" or "time" variable in the conditions)?

This is the SAMPLE DATA:

# Sample Data
timestamp <- c("2020-05-26 04:15:33","2020-05-26 06:15:33","2020-05-26 22:15:33", "2020-05-26 23:15:33", "2020-05-27 00:15:33", "2020-05-27 04:15:33", "2020-05-27 22:15:33","2020-05-28 00:15:33", "2020-05-28 04:15:33", "2020-05-28 22:15:33", "2020-05-29 00:15:33")
time <- c("04:15:33","06:15:33","22:15:33", "23:15:33", "00:15:33", "04:15:33", "22:15:33","00:15:33", "04:15:33", "22:15:33", "00:15:33")
day <- c(1,1,1,1,2,2,2,3,3,3,4)
df <- data.frame(timestamp, time, day, stringsAsFactors = FALSE)  # keep strings as character, not factor
 
df
#              timestamp     time day  
# 1  2020-05-26 04:15:33 04:15:33   1 
# 2  2020-05-26 06:15:33 06:15:33   1 
# 3  2020-05-26 22:15:33 22:15:33   1 
# 4  2020-05-26 23:15:33 23:15:33   1 
# 5  2020-05-27 00:15:33 00:15:33   2 
# 6  2020-05-27 04:15:33 04:15:33   2 
# 7  2020-05-27 22:15:33 22:15:33   2 
# 8  2020-05-28 00:15:33 00:15:33   3 
# 9  2020-05-28 04:15:33 04:15:33   3 
# 10 2020-05-28 22:15:33 22:15:33   3 
# 11 2020-05-29 00:15:33 00:15:33   4 

This would be the CORRECT RESULT:

# Sample Data - CORRECT RESULT

df_result
#              timestamp     time day  night
# 1  2020-05-26 04:15:33 04:15:33   1 night0
# 2  2020-05-26 06:15:33 06:15:33   1 night0
# 3  2020-05-26 22:15:33 22:15:33   1 night1
# 4  2020-05-26 23:15:33 23:15:33   1 night1
# 5  2020-05-27 00:15:33 00:15:33   2 night1
# 6  2020-05-27 04:15:33 04:15:33   2 night1
# 7  2020-05-27 22:15:33 22:15:33   2 night2
# 8  2020-05-28 00:15:33 00:15:33   3 night2
# 9  2020-05-28 04:15:33 04:15:33   3 night2
# 10 2020-05-28 22:15:33 22:15:33   3 night3
# 11 2020-05-29 00:15:33 00:15:33   4 night3
Ana

2 Answers


Since your data only includes nighttime observations, you can use 12:00 as the cutoff for a new night in your if-statement:

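library(dplyr)  # provides %>% and mutate()
df %>% 
  # times at or before noon belong to the night that started on the previous day
  mutate(night = paste0("night", as.numeric(day) + ifelse(time <= "12:00:00", -1, 0)))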


             timestamp     time day  night
1  2020-05-26 04:15:33 04:15:33   1 night0
2  2020-05-26 06:15:33 06:15:33   1 night0
3  2020-05-26 22:15:33 22:15:33   1 night1
4  2020-05-26 23:15:33 23:15:33   1 night1
5  2020-05-27 00:15:33 00:15:33   2 night1
6  2020-05-27 04:15:33 04:15:33   2 night1
7  2020-05-27 22:15:33 22:15:33   2 night2
8  2020-05-28 00:15:33 00:15:33   3 night2
9  2020-05-28 04:15:33 04:15:33   3 night2
10 2020-05-28 22:15:33 22:15:33   3 night3
11 2020-05-29 00:15:33 00:15:33   4 night3

LRRR
  • This results in "nightNA" in the night column and the following warning: 1: Problem with `mutate()` input `night`. ℹ ‘<=’ not meaningful for factors ℹ Input `night` is `paste0(...)`. 2: In Ops.factor(time, "12:00:00") : ‘<=’ not meaningful for factors – Ana Sep 22 '20 at 13:09
  • That error suggests that one of your columns is stored as a factor, unlike the example in the post above. If it's `time`, changing `ifelse(time <= ...` to `ifelse(as.character(time) <= ...` will work. The `<=` comparison is not meaningful for factors. – LRRR Sep 22 '20 at 13:38
  • Works now, thanks! (You were right, somehow "time" variable class changed to factor when creating the sample data again...) – Ana Sep 22 '20 at 14:03
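
The factor problem discussed in these comments can be reproduced in a small sketch that uses only base R. The vector names `time_chr` and `time_fct` are made up for illustration and are not part of the original data. Comparing a factor with `<=` yields `NA` plus a warning, while converting with `as.character()` first restores the plain string comparison. Note that before R 4.0.0, `data.frame()` and `as.data.frame()` converted strings to factors by default, which is one likely way the `time` column could have ended up as a factor.

```r
# Hypothetical vectors, for illustration only
time_chr <- c("04:15:33", "22:15:33", "00:15:33")
time_fct <- factor(time_chr)

# `<=` is not defined for unordered factors: the result is NA (with a warning,
# suppressed here so the sketch runs quietly)
before_noon_fct <- suppressWarnings(time_fct <= "12:00:00")

# Converting back to character makes the comparison work as intended
before_noon_chr <- as.character(time_fct) <= "12:00:00"

print(before_noon_fct)  # NA NA NA
print(before_noon_chr)  # TRUE FALSE TRUE
```

The string comparison works here only because the times are zero-padded in `HH:MM:SS` format, so alphabetical order matches chronological order.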

I would first calculate the number of days between the starting date (2020-05-26) and the date of each row. How to do that is described here: calculating number of days between 2 columns of dates in data frame

Then add a number to that difference with an IF statement: IF the time is before 12:00 (midday) THEN add 0, IF the time is after 12:00 THEN add 1.
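
The two steps above might be sketched as follows. This is only one possible reading of the approach, not a definitive implementation: it assumes the timestamps from the question stored as text, that the earliest date in the data serves as the starting date, and that `dplyr` is installed. The helper columns `date`, `day_diff` and `after_noon` are names introduced here for illustration.

```r
library(dplyr)

# Sample timestamps from the question
df <- data.frame(
  timestamp = c("2020-05-26 04:15:33", "2020-05-26 06:15:33", "2020-05-26 22:15:33",
                "2020-05-26 23:15:33", "2020-05-27 00:15:33", "2020-05-27 04:15:33",
                "2020-05-27 22:15:33", "2020-05-28 00:15:33", "2020-05-28 04:15:33",
                "2020-05-28 22:15:33", "2020-05-29 00:15:33"),
  stringsAsFactors = FALSE
)

df_night <- df %>%
  mutate(
    # Calendar date taken from the first 10 characters of the timestamp
    date = as.Date(substr(timestamp, 1, 10)),
    # Step 1: whole days elapsed since the earliest date (assumed starting date)
    day_diff = as.numeric(date - min(date)),
    # Step 2: +1 for times after midday, +0 for times before midday
    after_noon = if_else(substr(timestamp, 12, 19) > "12:00:00", 1, 0),
    night = paste0("night", day_diff + after_noon)
  )

print(df_night$night)
```

For the sample data this reproduces the labels `night0` to `night3` from the question. Because it works from `timestamp` alone, it does not depend on the precomputed `day` column.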

Let me know if you have difficulties implementing this!

Mart Vos