Repeat rows based on time values split across multiple columns - R

Question

I am trying to repeat rows based on month and year values.

Currently, my df looks like this:

Country Date    Year   Month
Angola  1/2008  2008    1
Angola  6/2020  2020    6
Benin   1/2013  2013    1
Benin   6/2020  2020    6
Benin   7/2014  2014    7

For each country, I want to repeat the observations such that the df looks like this:

Country Year   Month
Angola  2008    1
Angola  2008    2
Angola  2008    3
Angola  2008    4
Angola  2008    5
Angola  2008    6

etc... all the way until 06/2020 for Angola

There is a really elegant solution to repeating rows based on values (from this post). If I were to repeat the rows only based on the years, the syntax from the solution would be like this:

df<-df %>%
  mutate(Year = readr::parse_number(Year)) %>% 
  group_by(Country)  %>%
  complete(Year =min(Year):max(Year))

However, I want to repeat the timeframe not just based on the years, but also the months. I haven't found a good way to adapt this syntax to do this. I tried to parse the Date variable as a date and then repeat based on that, but this would assign a date to the variable and repeat the rows far more times than I need.

df<-df %>% 
  mutate(Date = readr::parse_datetime(Date)) %>% 
  group_by(Country)  %>%
  complete(Date =min(Date):max(Date))

Any ideas about how to do this? Would prefer to adapt the syntax I've been trying, but open to new possibilities as well

akrun · Answer 1 · 2020-07-16T22:11:42.827

3

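After dropping the 'Date' column and grouping by 'Country', we use complete with the full sequence of both 'Year' and 'Month', then filter out the months that fall after the last observed month in the final year.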

library(dplyr)
library(tidyr)  # complete() and fill() come from tidyr
out <- df1 %>% 
   select(-Date) %>% 
   mutate(Month2 = Month) %>% 
   group_by(Country) %>% 
   complete(Year = min(Year):max(Year), Month = first(Month):12) %>% 
   fill(Month2) %>%
   filter(Year == max(Year) & Month <= last(Month2)| Year != max(Year)) %>%
   select(-Month2)
out
# A tibble: 240 x 3
# Groups:   Country [2]
#   Country  Year Month
#   <chr>   <int> <int>
# 1 Angola   2008     1
# 2 Angola   2008     2
# 3 Angola   2008     3
# 4 Angola   2008     4
# 5 Angola   2008     5
# 6 Angola   2008     6
# 7 Angola   2008     7
# 8 Angola   2008     8
# 9 Angola   2008     9
#10 Angola   2008    10
# … with 231 more rows

Checking the output for Angola, first 14 rows:

out %>%
   filter(Country == 'Angola') %>% 
   head(14)
# A tibble: 14 x 3
# Groups:   Country [1]
   Country  Year Month
   <chr>   <int> <int>
 1 Angola   2008     1
 2 Angola   2008     2
 3 Angola   2008     3
 4 Angola   2008     4
 5 Angola   2008     5
 6 Angola   2008     6
 7 Angola   2008     7
 8 Angola   2008     8
 9 Angola   2008     9
10 Angola   2008    10
11 Angola   2008    11
12 Angola   2008    12
13 Angola   2009     1
14 Angola   2009     2

Last 10 rows:

out %>%
   filter(Country == 'Angola') %>% 
   tail(10)
# A tibble: 10 x 3
# Groups:   Country [1]
   Country  Year Month
   <chr>   <int> <int>
 1 Angola   2019     9
 2 Angola   2019    10
 3 Angola   2019    11
 4 Angola   2019    12
 5 Angola   2020     1
 6 Angola   2020     2
 7 Angola   2020     3
 8 Angola   2020     4
 9 Angola   2020     5
10 Angola   2020     6

data

df1 <- structure(list(Country = c("Angola", "Angola", "Benin", "Benin", 
"Benin"), Date = c("1/2008", "6/2020", "1/2013", "6/2020", "7/2014"
), Year = c(2008L, 2020L, 2013L, 2020L, 2014L), Month = c(1L, 
6L, 1L, 6L, 7L)), class = "data.frame", row.names = c(NA, -5L
))

edited Jul 16 '20 at 22:11

answered Jul 16 '20 at 21:01

akrun


My original savior back again! The problem I ran into is that this code ends at the month of the last year, instead of continuing until 12 for the year. I'll post a better explanation in the original post – Yu Na Jul 16 '20 at 21:08
@YuNa. Do you need `Month = min(Month):12`. Your expected output showed only 6 rows, so I was not sure – akrun Jul 16 '20 at 21:09
Ah yes, apologies for the confusion. When I substitute `Month = min(Month):12` for `Month = min(Month):max(Month)`, the months then continue past the last month of the last year, i.e., for Angola, the months continue beyond 6/2020. Maybe I am not understanding your suggestion correctly? – Yu Na Jul 16 '20 at 21:24
@YuNa. Can you check my updated output. I am guessing it should work now – akrun Jul 16 '20 at 21:35
This works like a charm! Sorry for the delay, I'm trying to understand the lines using `fill` and `filter` – Yu Na Jul 16 '20 at 21:39
@YuNa You don't need the `fill` though, only use `max(Year, na.rm = TRUE)`. I just used `fill` to replace the `NA` elements with previous non-NA – akrun Jul 16 '20 at 21:40
Sorry for the delay, but after understanding the `filter` line, it seemed to me that there might be an issue with timeframes whose beginning month is greater than the end month. So for example, a country with a `Date` range of `9/2019` to `6/2020` has the issue where `Month` starts with `6` and ends at `12` for every year – Yu Na Jul 16 '20 at 21:55
@YuNa in that case, use `last(Month2)` instead of `max(Month2)` – akrun Jul 16 '20 at 21:56
@YuNa can you try with that `last` – akrun Jul 16 '20 at 21:58
Even with `last`, each year is still starting with the greater month value. I will post the example – Yu Na Jul 16 '20 at 22:02
@YuNa Here the `filter` is based on two conditions `Year == max(Year) & Month <= last(Month2)`. So, assuming that the 'Year' is numeric, then it will return max Year and Month which is less than or equal to last Month2 value – akrun Jul 16 '20 at 22:04
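
The start-month problem raised in these comments can be sidestepped by collapsing 'Year' and 'Month' into a single month counter before calling complete, so each country gets one unbroken sequence. A minimal sketch, assuming integer 'Year' and 'Month' columns; the `idx` column and the `df2` example data (including the Chad rows) are made up for illustration and are not from the original post:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: Chad's range starts in a later month (9) than it ends (6)
df2 <- data.frame(
  Country = c("Angola", "Angola", "Chad", "Chad"),
  Year    = c(2008L, 2020L, 2019L, 2020L),
  Month   = c(1L, 6L, 9L, 6L)
)

out2 <- df2 %>%
  # idx counts months, so consecutive months always differ by exactly 1
  mutate(idx = Year * 12L + Month - 1L) %>%
  group_by(Country) %>%
  # fill in every month between each country's first and last observation
  complete(idx = min(idx):max(idx)) %>%
  ungroup() %>%
  # convert the counter back into Year and Month
  mutate(Year = idx %/% 12L, Month = idx %% 12L + 1L) %>%
  select(Country, Year, Month)

out2 %>% filter(Country == "Chad")
```

Because the sequence is built on one counter rather than on 'Year' and 'Month' separately, no `fill` or `filter` step is needed to trim the first or last year.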

Jakub.Novotny · Accepted Answer · 2020-07-16T21:42:14.757

0

library(tidyverse)

df <- tibble(
  Country = c("Angola", "Angola", "Benin", "Benin", "Benin"),
  Date = c("1/2008", "6/2020", "1/2013", "6/2020", "7/2014"),
  Year = c(2008, 2020, 2013, 2020, 2014),
  Month = c(1,6,1,6,7))


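df %>%
  group_by(Country) %>%
  # prepend day 1 so that "1/2008" parses as a full date (2008-01-01)
  mutate(Date = lubridate::dmy(paste("1", Date))) %>%
  select(-Month, -Year) %>%
  # one row per month between each country's first and last date
  complete(Date = seq(min(Date), max(Date), by = "months"))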
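
The pipeline above returns only 'Country' and 'Date', while the question asks for 'Year' and 'Month' columns. A possible follow-up, sketched here under the assumption that the data match the question's example (the name `out` and the final `transmute` step are additions for illustration, not part of the original answer), derives both columns from the completed dates:

```r
library(dplyr)
library(tidyr)
library(lubridate)

df <- data.frame(
  Country = c("Angola", "Angola", "Benin", "Benin", "Benin"),
  Date    = c("1/2008", "6/2020", "1/2013", "6/2020", "7/2014")
)

out <- df %>%
  # "1/2008" becomes "1/1/2008", which dmy() parses as 2008-01-01
  mutate(Date = dmy(paste0("1/", Date))) %>%
  group_by(Country) %>%
  # monthly sequence from each country's earliest to latest date
  complete(Date = seq(min(Date), max(Date), by = "month")) %>%
  ungroup() %>%
  # rebuild the Year and Month columns from the completed dates
  transmute(Country, Year = year(Date), Month = month(Date))

head(out)
```

Since the sequence is generated on real dates, ranges that start in a later month than they end (for example 9/2019 to 6/2020) need no special handling.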

edited Jul 16 '20 at 21:42

answered Jul 16 '20 at 21:26

Jakub.Novotny

