
I am working on incorporating a variable that is recorded once per unit into a yearly dataset. While it is quite straightforward to repeat the observations n times, I have trouble assigning years to the observations.

The structure of my data is as follows:

id startyear endyear dummy
1  1946      2005    1
2  1957      2005    1
3  1982      2005    1
4  1973      2005    1

What I want to do is create a new column, called year, that repeats unit 1 once for each year from 1946 to 2005 (n = 2005 - 1946 + 1 = 60 rows), unit 2 once for each year from 1957 to 2005, and so forth, assigning the year to each row and generating the following output:

id startyear endyear dummy year
1  1946      2005    1     1946
1  1946      2005    1     1947
1  1946      2005    1     1948
1  1946      2005    1     1949
[…]

I have attempted to use slice and mutate in dplyr, in combination with rep and seq but neither gives me the result I want. Any help would be greatly appreciated.

VLarsen

2 Answers


We can use map2 to build the sequence from 'startyear' to 'endyear' for each row as a list column, and then unnest it

library(tidyverse)
df1 %>% 
    mutate(year = map2(startyear, endyear, `:`)) %>%
    unnest(year)
# id startyear endyear dummy year
#1    1      1946    2005     1 1946
#2    1      1946    2005     1 1947
#3    1      1946    2005     1 1948
#4    1      1946    2005     1 1949
#5    1      1946    2005     1 1950
#6    1      1946    2005     1 1951
#7    1      1946    2005     1 1952
#...

Alternatively, group by 'id', build the year sequence as a list column with mutate, and unnest it

df1 %>% 
  group_by(id) %>% 
  mutate(year = list(startyear:endyear)) %>% 
  unnest(year)
akrun
  • Creative, a bit better than the concocted `merge`/`*_join` I was considering. – r2evans Jul 01 '19 at 15:40
  • The latter method works perfectly, thanks! However, I am getting multiple warnings that say `In startyear:endyear : numerical expression has 2 elements: only the first used`. Any way to work around this? – VLarsen Jul 03 '19 at 09:31
  • @VLarsen I am not getting any warnings with the example you posted. Can you show another example? – akrun Jul 03 '19 at 13:42
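
A note on the warning raised in the comments: one plausible cause (an assumption, since the asker's real data is not shown) is that some 'id' values occur in more than one row. Within such a group, startyear and endyear are vectors of length two or more, and `:` only uses their first elements. The sketch below uses a made-up data frame, `df_dup`, in which id 1 appears twice, and switches from group_by(id) to rowwise() so that each row is expanded on its own.

```r
library(dplyr)
library(tidyr)

# Hypothetical data (not from the question): id 1 appears in two rows.
df_dup <- tibble(
  id        = c(1, 1, 2),
  startyear = c(1946, 1950, 1957),
  endyear   = c(1948, 1951, 1958),
  dummy     = c(1, 1, 1)
)

# rowwise() evaluates startyear:endyear one row at a time,
# so `:` always receives length-one operands.
expanded <- df_dup %>%
  rowwise() %>%
  mutate(year = list(startyear:endyear)) %>%
  ungroup() %>%
  unnest(year)

expanded
```

The map2 version should behave the same way on such data, because it also pairs startyear and endyear row by row rather than group by group.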

Less elegant alternative, almost as simple:

library(tidyverse)
df1 %>% 
    uncount(endyear - startyear + 1, .id = "row") %>%
    mutate(year = startyear + row - 1)
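
To check that the `+ 1` gives one row per year with both endpoints included, here is a self-contained sketch. It rebuilds `df1` from the four rows shown in the question (the construction is an assumption about how the asker's data frame looks) and drops the helper column `row` at the end.

```r
library(dplyr)
library(tidyr)

# df1 rebuilt from the table in the question (assumed structure).
df1 <- tibble(
  id        = 1:4,
  startyear = c(1946, 1957, 1982, 1973),
  endyear   = 2005,
  dummy     = 1
)

long <- df1 %>%
  # Repeat each row once per year; .id numbers the copies 1, 2, 3, ...
  uncount(endyear - startyear + 1, .id = "row") %>%
  # Turn the copy number into a calendar year, then drop the helper column.
  mutate(year = startyear + row - 1) %>%
  select(-row)

# Rows per unit: each should equal endyear - startyear + 1.
count(long, id)
```

For unit 1 this should give 60 rows, running from 1946 through 2005.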
Jon Spring