
I want to automatically create multiple data frames based on date intervals in another data frame. Let's say I have this example:

df <- data.frame(Date = as.Date(c("2022-01-01", "2022-01-01", 
                                  "2022-01-02", "2022-01-02", "2022-01-02", 
                                  "2022-01-03", 
                                  "2022-01-04", "2022-01-04", 
                                  "2022-01-05", "2022-01-05", "2022-01-05")),
                 Name = c(LETTERS[1:11]),
                 Value = c(1:11))

My goal is to create 3 new data frames. df1 should contain the data from 2022-01-01 to 2022-01-04, df2 should contain the data from 2022-01-02 to 2022-01-05, and df3 should contain the data from 2022-01-03 to 2022-01-06. This is the desired output, with every object being a data frame:

df1 <- data.frame(Date = as.Date(c("2022-01-01", "2022-01-01", 
                                  "2022-01-02", "2022-01-02", "2022-01-02", 
                                  "2022-01-03")),
                 Name = c(LETTERS[1:6]),
                 Value = c(1:6))

df2 <- data.frame(Date = as.Date(c("2022-01-02", "2022-01-02", "2022-01-02", 
                                   "2022-01-03", 
                                   "2022-01-04", "2022-01-04")),
                  Name = c(LETTERS[3:8]),
                  Value = c(3:8))

df3 <- data.frame(Date = as.Date(c("2022-01-03", 
                                   "2022-01-04", "2022-01-04", 
                                   "2022-01-05", "2022-01-05", "2022-01-05")),
                  Name = c(LETTERS[6:11]),
                  Value = c(6:11))

Notice that the number of observations for each date is different. My actual data frame is much bigger than the example and it will keep growing each day, so I need to make this process automatic. Any suggestions?

  • Do you need `library(slider);library(dplyr);library(purrr);library(stringr);slide(unique(df$Date), .f = \(x) filter(df, Date %in% seq(x, length.out = 4, by = '1 day')) %>% filter(n_distinct(Date) >=3)) %>% keep(~ nrow(.) > 0) %>% setNames(str_c("df", seq_along(.))) %>% list2env(.GlobalEnv)` – akrun Aug 29 '22 at 19:44
  • Didn't really understand your answer. Do you mind elaborating a bit more on the process and structure of the code? – Artur Vidaurre de Almeida Aug 29 '22 at 20:22
  • The code loops over the `unique` 'Date' values in your data with `slide`. For each one, it builds a sequence of 4 consecutive days starting at that date and uses it to subset/filter the data. It then keeps only the subsets where the number of distinct elements in Date is greater than 2, uses `keep` to retain only the list elements where the number of rows is greater than 0, sets the names, and creates the objects df1, df2, df3 in the global env with `list2env`. – akrun Aug 29 '22 at 20:27
  • Thanks a lot! I guess I should look into `slide` to fully understand this process. Anyway, the code works perfectly! Again, thank you!!! – Artur Vidaurre de Almeida Aug 29 '22 at 21:07
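The steps described in the comments above can also be unpacked one at a time. The following is a minimal sketch that swaps `slide`, `filter`, `keep` and `str_c` for base R equivalents (`lapply`, `[`, `Filter`, `paste0`); it is not the commenter's code, and the names `starts` and `windows` are made up for illustration. It assumes the same rule as the one-liner: a 4-day window per unique date, kept only if it contains at least 3 distinct dates.

```r
# Example data from the question
df <- data.frame(
  Date = as.Date(c("2022-01-01", "2022-01-01",
                   "2022-01-02", "2022-01-02", "2022-01-02",
                   "2022-01-03",
                   "2022-01-04", "2022-01-04",
                   "2022-01-05", "2022-01-05", "2022-01-05")),
  Name = LETTERS[1:11],
  Value = 1:11
)

# Step 1: one candidate window per unique date in the data
starts <- unique(df$Date)

# Step 2: for each start date, keep the rows that fall in the
# 4 consecutive days beginning at that date
windows <- lapply(starts, function(x) {
  df[df$Date %in% seq(x, by = "day", length.out = 4), , drop = FALSE]
})

# Step 3: drop windows that cover fewer than 3 distinct dates
windows <- Filter(function(w) length(unique(w$Date)) >= 3, windows)

# Step 4: name the survivors df1, df2, ...
names(windows) <- paste0("df", seq_along(windows))

names(windows)
sapply(windows, nrow)
```

With the example data this leaves three named frames; passing the list to `list2env()` would then create them as separate objects, as in the one-liner.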

1 Answer


Here's an alternative:

dates <- seq(df$Date[1], df$Date[1]+3, by = "day")
dates
# [1] "2022-01-01" "2022-01-02" "2022-01-03" "2022-01-04"
Map(function(a, b) dplyr::filter(df, dplyr::between(Date, a, b)), dates, dates + 3)
# [[1]]
#         Date Name Value
# 1 2022-01-01    A     1
# 2 2022-01-01    B     2
# 3 2022-01-02    C     3
# 4 2022-01-02    D     4
# 5 2022-01-02    E     5
# 6 2022-01-03    F     6
# 7 2022-01-04    G     7
# 8 2022-01-04    H     8
# [[2]]
#         Date Name Value
# 1 2022-01-02    C     3
# 2 2022-01-02    D     4
# 3 2022-01-02    E     5
# 4 2022-01-03    F     6
# 5 2022-01-04    G     7
# 6 2022-01-04    H     8
# 7 2022-01-05    I     9
# 8 2022-01-05    J    10
# 9 2022-01-05    K    11
# [[3]]
#         Date Name Value
# 1 2022-01-03    F     6
# 2 2022-01-04    G     7
# 3 2022-01-04    H     8
# 4 2022-01-05    I     9
# 5 2022-01-05    J    10
# 6 2022-01-05    K    11
# [[4]]
#         Date Name Value
# 1 2022-01-04    G     7
# 2 2022-01-04    H     8
# 3 2022-01-05    I     9
# 4 2022-01-05    J    10
# 5 2022-01-05    K    11

Granted, this made four instead of three, but that can easily be controlled by the assignment to dates.
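As a rough illustration of controlling the result through the start dates, here is a base R sketch. The helper `make_windows()` and its parameters `n_windows` and `span` are hypothetical names introduced for this example, and it assumes the first window should start at the earliest date in the data.

```r
# Example data from the question
df <- data.frame(
  Date = as.Date(c("2022-01-01", "2022-01-01",
                   "2022-01-02", "2022-01-02", "2022-01-02",
                   "2022-01-03",
                   "2022-01-04", "2022-01-04",
                   "2022-01-05", "2022-01-05", "2022-01-05")),
  Name = LETTERS[1:11],
  Value = 1:11
)

# Hypothetical helper: one frame per start date, each covering
# the closed interval [start, start + span] in days
make_windows <- function(data, n_windows, span) {
  starts <- seq(min(data$Date), by = "day", length.out = n_windows)
  Map(function(a, b) data[data$Date >= a & data$Date <= b, , drop = FALSE],
      starts, starts + span)
}

# Three windows, each running from a start date to start + 3 days
frames <- make_windows(df, n_windows = 3, span = 3)
sapply(frames, nrow)

# span = 2 gives the three-date frames written out in the question
frames_short <- make_windows(df, n_windows = 3, span = 2)
sapply(frames_short, nrow)
```

Because the start dates are derived from the data, rerunning this after new rows arrive only requires changing `n_windows` (or computing it from the date range).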

This produces a list of frames, not three independent frames. I think you'll find that when you have multiple identically-structured frames (same column names and intent), it's best to keep them in a list. That way, when you intend to do something to each of them, you can easily use lapply. See https://stackoverflow.com/a/24376207/3358227 for more discussion on this.
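To make the list-of-frames idea concrete, here is a small sketch. The two tiny frames stand in for the output of `Map()` above, and the names `totals`, `window_means` and `env` are placeholders chosen for this example.

```r
# Stand-in for the list returned by Map(); values are illustrative only
frames <- list(
  data.frame(Date = as.Date(c("2022-01-01", "2022-01-02")), Value = c(1, 3)),
  data.frame(Date = as.Date(c("2022-01-02", "2022-01-03")), Value = c(3, 6))
)

# Apply the same operation to every frame in one call
totals <- lapply(frames, function(x) sum(x$Value))

# vapply() does the same but returns a plain numeric vector
window_means <- vapply(frames, function(x) mean(x$Value), numeric(1))

# If separate objects are really needed, name the list and copy it
# into an environment; using .GlobalEnv here would create df1, df2, ...
# in the workspace instead
names(frames) <- paste0("df", seq_along(frames))
env <- new.env()
list2env(frames, envir = env)
ls(env)
```

Keeping the list as the primary object and only exporting individual frames on demand avoids having to track a growing number of `dfN` variables as the data grows.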

r2evans