0

I have the data frame with a column for years. See below:

D <- as.data.frame(cbind(c(1998,1998,1999,1999,2000,2001,2001), c(1,2,2,5,1,3,4), c(1,5,9,2,NA,7,8)))
colnames(D) <- c('year','var1','var2')
D$start <- D$year*100+1
D$end <- D$year*100+12
print(D)
  year var1 var2  start    end
1 1998    1    1 199801 199812
2 1998    2    5 199801 199812
3 1999    2    9 199901 199912
4 1999    5    2 199901 199912
5 2000    1   NA 200001 200012
6 2001    3    7 200101 200112
7 2001    4    8 200101 200112

I want to copy each row 12 times, one for each month between the start and end columns. I made the start and end columns January and December in this example, but in theory they could be different. Obviously I am really dealing with an incredibly large dataset, so I was wondering how I could do it in one or two lines(preferably using dplyr since that is the coding language I am most used to).

camille
  • 16,432
  • 18
  • 38
  • 60
MathStudent
  • 129
  • 11
  • Not very important, but dplyr isn't a language, it's just an R package that is very popular. That said, once you convert your start & end dates to actual date objects, it will be a likely dupe of [this post](https://stackoverflow.com/q/24803361/5325862) and a few others – camille Dec 11 '19 at 16:25
  • Can you show an example of the expected output? – Anisha Singhal Dec 11 '19 at 16:27

3 Answers3

2

If you want all months for each row, I would do this as a join:

months = expand.grid(year = unique(d$year), month = 1:12)
left_join(D, months, by = "year")

If you want most months for most years, you could filter out the ones you don't want in a next step.

If you really want to use the start and end columns you've created, I would do it like this:

D %>% mutate(month = Map(seq, start, end)) %>%
  tidyr::unnest(cols = month)
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
1

We can do expand from tidyr

expand(D, year = unique(year), month = 1:12)  %>%
    left_join(D, by = 'year')
akrun
  • 874,273
  • 37
  • 540
  • 662
0

This also works:

D %>% 
  rowid_to_column() %>%
  gather(key = key, value = date, start, end) %>% 
  select(-key) %>% 
  group_by(rowid) %>% 
  complete(date = full_seq(date, 1)) %>% 
  fill(everything(), -rowid, .direction = "downup") %>% 
  ungroup() %>% 
  arrange(rowid)

If you want to keep the start and end columns add the following before ungroup():

 mutate(start = min(date), end = max(date))
Giovanni Colitti
  • 1,982
  • 11
  • 24