
I have a dataframe with date ranges in it, from which I'd like to create new rows representing each year encompassed by that range (including starting and ending year). It looks like this:

id      start      end  
1      2000         2004  
2      2005         2005  
3      2005         2007  
4      2001         2002 

Here 'id' is a factor, and 'start' and 'end' are dates.

But I need to expand the dataframe to look like this:

id      year        
1       2000 
1       2001
1       2002
1       2003 
1       2004
2       2005
3       2005
3       2006
3       2007
4       2001
4       2002

I've tried the approaches suggested in two other questions: "Expand rows by date range using start and end date" and "Generate rows between two dates in a dataframe". Specifically, I ran:

library(data.table)
setDT(df)[, .(year = seq.Date(start, end, by = '1 year')), by = 'id']

And also tried the dplyr approach:

library(dplyr)
library(purrr)
df_expanded <- df %>%
  transmute(id, year = map2(start, end, seq, by = "year")) %>%
  unnest %>% 
  distinct

Both attempts resulted in a similar error:

Error in seq.int(r1$year, to0$year, by) : wrong sign in 'by' argument

I have looked, but I can't figure out why I am getting this error. I should mention that the error also occurs with full dates in YYYY-MM-DD format. I'm not interested in the monthly or daily differences, so I reformatted the dates to YYYY only, but the code still returns the same error message.

Can anyone please help?

b2497

1 Answer


sample data

library(data.table)
DT <- fread("id      start      end  
1      2000         2004  
2      2005         2005  
3      2005         2007  
4      2001         2002")

code

In this sample, 'start' and 'end' are numeric years (not dates), so you can build an integer vector running from start to end for each id.

DT[, .(year = start:end), by = .(id)][]

output

#     id year
#  1:  1 2000
#  2:  1 2001
#  3:  1 2002
#  4:  1 2003
#  5:  1 2004
#  6:  2 2005
#  7:  3 2005
#  8:  3 2006
#  9:  3 2007
# 10:  4 2001
# 11:  4 2002
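
If the real columns are full dates (YYYY-MM-DD), a similar approach might work after extracting the year part. The sketch below uses made-up sample data (DT2 and its values are hypothetical, not from the question). It also separates out rows where end comes before start. As far as I can tell, seq() with by = "year" raises "wrong sign in 'by' argument" on such rows, so it may be worth checking the full data for them, although that is only a guess at the cause. Note that start:end would not error on a reversed row; it would silently count downwards.

```r
library(data.table)

# Hypothetical sample data with full dates; row 3 is deliberately reversed
DT2 <- data.table(
  id    = 1:3,
  start = as.Date(c("2000-03-15", "2005-07-01", "2007-01-01")),
  end   = as.Date(c("2002-01-10", "2005-12-31", "2005-06-30"))
)

# Rows whose end precedes start; inspect or fix these separately
reversed <- DT2[end < start]

# Expand the remaining rows using only the year part of each date.
# data.table::year() returns the calendar year as an integer.
expanded <- DT2[end >= start, .(year = year(start):year(end)), by = id]
expanded
```

Using year(start):year(end) rather than seq(start, end, by = "year") also keeps the final calendar year when the end date falls earlier in the year than the start date (as in id 1 above, where the range ends in January 2002).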
Wimpel