0

I have to create a database with a single row for every day in the interval between the two dates (date_in - date_out). I have to use R.

How can I do this?

My data:

  id           date_in          date_out days
1  1 13May2022 0:00:00 03Jul2022 0:00:00   51
2  3 10Nov2020 0:00:00 15Nov2020 0:00:00    5
3  4 25Feb2020 0:00:00 05Apr2020 0:00:00   40
> dput(df)
structure(list(id = c(1L, 3L, 4L), date_in = c("13May2022 0:00:00", 
"10Nov2020 0:00:00", "25Feb2020 0:00:00"), date_out = c("03Jul2022 0:00:00", 
"15Nov2020 0:00:00", "05Apr2020 0:00:00"), days = c(51, 5, 40
)), class = "data.frame", row.names = c(NA, -3L))
ArTu
  • 431
  • 4
  • 20
  • 1
    Please show your data using `dput()`. – Martin Gal Nov 19 '22 at 18:42
  • 1
    Does this answer your question? [Insert rows for missing dates/times](https://stackoverflow.com/questions/16787038/insert-rows-for-missing-dates-times) – Mako212 Nov 19 '22 at 18:48
  • 1
    I agree that this is a dupe in concept, though to be fair the dupe link deals with a single column of dates (where gaps between rows is undesired) and this just wants to expand between the explicit columns of the timestamp ranges. With the `dplyr` package loaded, try `quux %>% mutate(across(c(date_in, date_out), ~ as.POSIXct(., format = "%d%b%Y %H:%M:%S"))) %>% group_by(id) %>% summarize(date = do.call(c, Map(seq, date_in, date_out, list(by = "days"))), .groups = "drop")`. If there is more that you need, ArTu, please provide cleaner data (`dput(.)`) and expected output. Thanks! – r2evans Nov 19 '22 at 19:01
  • Thank you. Hope now it's ok. I've tried your code @r2evans but it doesn'y work. I'm not so expert. I obtain this error: Error in app$vspace(new_style$`margin-top` %||% 0) : attempt to apply non-function – ArTu Nov 20 '22 at 23:32

2 Answers2

0

Here is an option. First, change dates into dates (yours might already be), then we map out all the dates from the start to the end, lastly we unnest.

library(tidyverse)

#data
df <- read.csv(textConnection("id,           date_in,          date_out, days,
1, 13May2022 0:00:00, 03Jul2022 0:00:00,   51,
3, 10Nov2020 0:00:00, 15Nov2020 0:00:00,    5,
4, 25Feb2020 0:00:00, 05Apr2020 0:00:00,   40"))  |>
  select(-X) 

#solution
df|>
  mutate(across(starts_with("date"), \(x) lubridate::dmy_hms(x) |> 
                  lubridate::date()),
         full_date = map2(date_in, date_out, \(x,y) seq(x, y, by = "1 day"))) |>
  unnest_longer(full_date) |>
  select(id, date = full_date)
#> # A tibble: 99 x 2
#>       id date      
#>    <int> <date>    
#>  1     1 2022-05-13
#>  2     1 2022-05-14
#>  3     1 2022-05-15
#>  4     1 2022-05-16
#>  5     1 2022-05-17
#>  6     1 2022-05-18
#>  7     1 2022-05-19
#>  8     1 2022-05-20
#>  9     1 2022-05-21
#> 10     1 2022-05-22
#> # ... with 89 more rows
AndS.
  • 7,748
  • 2
  • 12
  • 17
  • Thank you! I've tried but I obtain these errors: Error: unexpected '>' in "df|>" Error: unexpected input in " mutate(across(starts_with("date"), \" Errore: unexpected ')' in " lubridate::date())". etc.How can I fix it? – ArTu Nov 20 '22 at 23:28
  • you need to update your version of R or change `|>` to `%>%` – AndS. Nov 20 '22 at 23:55
  • Now it works!!! Brilliant!! Thank you!! It's a quite complex code.. how can I understand more about it? Where can I study? – ArTu Nov 21 '22 at 00:32
  • try https://www.tidyverse.org/ – AndS. Nov 21 '22 at 15:47
0

Here is a similar approach to AndS.'s, but using summarize:

library(tidyverse)
library(lubridate)

# data
df <- read.csv(textConnection("id,           date_in,          date_out, days,
1, 13May2022 0:00:00, 03Jul2022 0:00:00,   51,
3, 10Nov2020 0:00:00, 15Nov2020 0:00:00,    5,
4, 25Feb2020 0:00:00, 05Apr2020 0:00:00,   40"))  |>
    select(-X) 

# answer
df |> 
    mutate(across(c(date_in, date_out), ~date(dmy_hms(.x)))) |> 
    group_by(id) |> 
    summarize(date=seq(date_in, date_out, by="1 day"))
brendaisy
  • 71
  • 5
  • Thank you! I've tried but I obtain these errors: Error: unexpected '>' in "df |>" and Errore: unexpected '>' in " group_by(id) |>" and Error in seq(date_in, date_out, by = "1 day") : object "date_in" not found. How can I fix it? – ArTu Nov 20 '22 at 19:12