0

I'm dealing with a fairly large dataset and trying to run a mixed-effects multilevel regression using the mixor function. I was advised to switch dates to a categorical variable, as the current format is taking quite some time to run. My data is stored within a tibble "arr_full", with dates as a column within this tibble stored as "rec_date" as a POSIXct data type.

I don't want to change the order of any of the rows and was thinking there might be some way to do this simply with lubridate. Basically I want any date that falls within periods of 15 days of each month stored as a categorical variable. So any date that falls within 15-30 Apr 2020 would be "1", 1-14 May 2020 as "2", and so on till 1-14 Apr 2021 as the last category "24".

I'm fairly new to R so any explanation of how to go about this and why would be extremely helpful. Thanks in advance for the help!

**edit: I've added my code below with the way I tried to do this. It's admittedly a bit roundabout to achieve it, but it's the way I could think of to do this. I made a separate column from my ordered date column, rec_date, and made rec_date1 that I want to be replaced by the specified variables. I don't get any errors when I run the code, but then when I run arr_full$rec_date1, it still outputs the original dates, not the replaced variables.

arr_full$rec_date1 <- ifelse(lubridate::month(arr_full$rec_date1) ==4 & lubridate::day(arr_full$rec_date1) >=15 , 1,
                          ifelse(lubridate::month(arr_full$rec_date1) ==5 & lubridate::day(arr_full$rec_date1) <15 , 2,
                                 ifelse(lubridate::month(arr_full$rec_date1) ==5 & lubridate::day(arr_full$rec_date1) >=15 , 3,
                                        ifelse(lubridate::month(arr_full$rec_date1) ==6 & lubridate::day(arr_full$rec_date1) <15 , 4, 
                          ifelse(lubridate::month(arr_full$rec_date1) ==6 & lubridate::day(arr_full$rec_date1) >=15 , 5,
                                  ifelse(lubridate::month(arr_full$rec_date1) ==7 & lubridate::day(arr_full$rec_date1) <15 , 6,
                                        ifelse(lubridate::month(arr_full$rec_date1) ==7 & lubridate::day(arr_full$rec_date1) >=15 , 7,
                          ifelse(lubridate::month(arr_full$rec_date1) ==8 & lubridate::day(arr_full$rec_date1) <15 , 8,
                                  ifelse(lubridate::month(arr_full$rec_date1) ==8 & lubridate::day(arr_full$rec_date1) >=15 , 9,
                                        ifelse(lubridate::month(arr_full$rec_date1) ==9 & lubridate::day(arr_full$rec_date1) <15 , 10,
                          ifelse(lubridate::month(arr_full$rec_date1) ==9 & lubridate::day(arr_full$rec_date1) >=15 , 11,
                                  ifelse(lubridate::month(arr_full$rec_date1) ==10 & lubridate::day(arr_full$rec_date1) <15 , 12,
                                        ifelse(lubridate::month(arr_full$rec_date1) ==10 & lubridate::day(arr_full$rec_date1) >=15 , 13,
                          ifelse(lubridate::month(arr_full$rec_date1) ==11 & lubridate::day(arr_full$rec_date1) <15 , 14,
                                  ifelse(lubridate::month(arr_full$rec_date1) ==11 & lubridate::day(arr_full$rec_date1) >=15 , 15,
                                        ifelse(lubridate::month(arr_full$rec_date1) ==12 & lubridate::day(arr_full$rec_date1) <15 , 16,
                          ifelse(lubridate::month(arr_full$rec_date1) ==12 & lubridate::day(arr_full$rec_date1) >=15 , 17,
                                  ifelse(lubridate::month(arr_full$rec_date1) ==1 & lubridate::day(arr_full$rec_date1) <15 , 18,
                                        ifelse(lubridate::month(arr_full$rec_date1) ==1 & lubridate::day(arr_full$rec_date1) >=15 , 19,
                          ifelse(lubridate::month(arr_full$rec_date1) ==2 & lubridate::day(arr_full$rec_date1) <15 , 20,
                                  ifelse(lubridate::month(arr_full$rec_date1) ==2 & lubridate::day(arr_full$rec_date1) >=15 , 21,
                                        ifelse(lubridate::month(arr_full$rec_date1) ==3 & lubridate::day(arr_full$rec_date1) <15 , 22,
                          ifelse(lubridate::month(arr_full$rec_date1) ==3 & lubridate::day(arr_full$rec_date1) >=15 , 23,
                                  ifelse(month(arr_full$rec_date1) ==4 & day(arr_full$rec_date1) <15 , 24, NA)))))))))))))))))))))))

> arr_full$rec_date1
   [1] "2020-06-12 GMT" "2020-07-25 GMT" "2020-09-04 GMT" "2020-12-07 GMT" "2020-06-12 GMT" "2020-07-25 GMT"
   [7] "2020-09-07 GMT" "2020-12-07 GMT" "2021-03-11 GMT" "2020-06-12 GMT" "2020-07-25 GMT" "2020-09-08 GMT"
  [13] "2020-12-07 GMT" "2021-03-08 GMT" "2020-06-20 GMT" "2020-07-26 GMT" "2020-06-20 GMT" "2020-09-11 GMT"...
AnSri
  • 37
  • 3
  • I don't see how that would make things run any faster. If that's your only goal I don't think that will help. Who gave this advice? It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jul 27 '21 at 04:47
  • Hi Mr. Flick, thanks for the feedback. I'm currently working on a dissertation so my supervisor had suggested it based on her experience working on similar datasets. My current dataset has over 22,000 rows of data and from what was explained to me, R might be having difficulty parsing out that many dates. I'll include a subsequent comment with some sample code. – AnSri Jul 27 '21 at 12:22

1 Answers1

1

If your data is called df which has a Date column in it of class Date you can use cut with breaks as 15 days.

df$group <- cut(df$Date, '15 days',labels = FALSE)

Example -

x <- Sys.Date() + 1:365
y <- cut(x, '15 days',labels = FALSE)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Hi Ronak, this seemed to do the trick. I was getting a bit confused as to where in the code to use this, but using it earlier seemed to work perfectly. Thank you so much! – AnSri Jul 27 '21 at 13:43