
I would like to transform a data frame that has both start-year and end-year variables into a complete time series that (1) includes all the years in between start-year and end-year and (2) fills in the values of all the variables for the years in between.

This is what the original data looks like:

data_original <- data.frame(name = c("peter", "peter", "eric", "denisse"), lastname = c("smith", "smith", "jordan", "williams"), age = c(54, 54, 48, 40), start_year = c(1980,1986, 1990, 2000), end_year = c(1984, 1988, 1993, 2001))

data_original
#>      name lastname age start_year end_year
#> 1   peter    smith  54       1980     1984
#> 2   peter    smith  54       1986     1988
#> 3    eric   jordan  48       1990     1993
#> 4 denisse williams  40       2000     2001

This is what I would like the data to look like:

data_final <- data.frame(name = c("peter", "peter", "peter", "peter", "peter", "peter", "peter", "peter", "eric", "eric", "eric", "eric", "denisse", "denisse"), lastname = c("smith", "smith", "smith", "smith", "smith", "smith", "smith", "smith", "jordan", "jordan", "jordan", "jordan", "williams", "williams"), age = c(54, 54, 54, 54, 54, 54, 54, 54, 48, 48, 48, 48, 40, 40), year = c(1980, 1981, 1982, 1983, 1984, 1986, 1987, 1988, 1990, 1991, 1992, 1993, 2000, 2001))

data_final
#>       name lastname age year
#> 1    peter    smith  54 1980
#> 2    peter    smith  54 1981
#> 3    peter    smith  54 1982
#> 4    peter    smith  54 1983
#> 5    peter    smith  54 1984
#> 6    peter    smith  54 1986
#> 7    peter    smith  54 1987
#> 8    peter    smith  54 1988
#> 9     eric   jordan  48 1990
#> 10    eric   jordan  48 1991
#> 11    eric   jordan  48 1992
#> 12    eric   jordan  48 1993
#> 13 denisse williams  40 2000
#> 14 denisse williams  40 2001

Many thanks in advance for this and for your continuous help!

2 Answers

Here is one option with tidyverse. Create a 'year' list-column holding the sequence from 'start_year' to 'end_year' for each row with map2, drop the two original year columns, and unnest 'year' so that each year gets its own row:

library(tidyverse)
data_original %>% 
    mutate(year = map2(start_year, end_year, `:`)) %>% 
    select(-start_year, -end_year) %>% 
    unnest(year)
#      name lastname age year
#1    peter    smith  54 1980
#2    peter    smith  54 1981
#3    peter    smith  54 1982
#4    peter    smith  54 1983
#5    peter    smith  54 1984
#6    peter    smith  54 1986
#7    peter    smith  54 1987
#8    peter    smith  54 1988
#9     eric   jordan  48 1990
#10    eric   jordan  48 1991
#11    eric   jordan  48 1992
#12    eric   jordan  48 1993
#13 denisse williams  40 2000
#14 denisse williams  40 2001

Another option is data.table:

library(data.table)
# setDT() converts data_original to a data.table by reference
setDT(data_original)[, .(name, lastname, age, year = seq(start_year, end_year, by = 1)), 
          .(grp = 1:nrow(data_original))][, grp := NULL][] 

Or we could use base R with Map:

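# One sequence of years per row (columns 4:5 are start_year and end_year)
lst <- do.call(Map, c(f = `:`, data_original[4:5]))
# Repeat each row once per year, then attach the years
out <- data_original[1:3][rep(seq_len(nrow(data_original)), lengths(lst)),]
row.names(out) <- NULL
out$year <- unlist(lst)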
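
The base R idea above can also be wrapped in a small function. The following is only a sketch: `expand_years` and its `start`/`end` arguments are hypothetical names, not part of any package. It assumes the input is a plain data.frame (not one already converted by setDT) and that every start year is less than or equal to its end year, since seq would otherwise count downwards.

```r
# Hypothetical helper: expand each row of `df` into one row per year
# between its start and end columns, using base R only.
expand_years <- function(df, start = "start_year", end = "end_year") {
  # One sequence of years per row, e.g. 1980, 1981, ..., 1984 for row 1
  years <- Map(seq, df[[start]], df[[end]])
  # Repeat each row index once per year in that row's sequence
  idx <- rep(seq_len(nrow(df)), lengths(years))
  # Keep every column except the two year-range columns
  out <- df[idx, setdiff(names(df), c(start, end)), drop = FALSE]
  out$year <- unlist(years)
  row.names(out) <- NULL
  out
}

data_original <- data.frame(
  name = c("peter", "peter", "eric", "denisse"),
  lastname = c("smith", "smith", "jordan", "williams"),
  age = c(54, 54, 48, 40),
  start_year = c(1980, 1986, 1990, 2000),
  end_year = c(1984, 1988, 1993, 2001)
)

result <- expand_years(data_original)
nrow(result)  # 14, one row per person-year
```

Because the columns to drop are looked up by name rather than by position, this version should keep working if further columns are added to the data.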
akrun

Here is another tidyverse approach using seq and unnest:

data_original %>%
    rowwise() %>%
    mutate(year = list(seq(start_year, end_year, 1))) %>%
    ungroup() %>%
    select(-start_year, -end_year) %>%
    unnest(year)
# A tibble: 14 x 4
#   name    lastname   age  year
#   <fct>   <fct>    <dbl> <dbl>
# 1 peter   smith      54. 1980.
# 2 peter   smith      54. 1981.
# 3 peter   smith      54. 1982.
# 4 peter   smith      54. 1983.
# 5 peter   smith      54. 1984.
# 6 peter   smith      54. 1986.
# 7 peter   smith      54. 1987.
# 8 peter   smith      54. 1988.
# 9 eric    jordan     48. 1990.
#10 eric    jordan     48. 1991.
#11 eric    jordan     48. 1992.
#12 eric    jordan     48. 1993.
#13 denisse williams   40. 2000.
#14 denisse williams   40. 2001.

PS. In hindsight, @akrun's approach using purrr::map2 is much cleaner; it avoids the need for explicit (un)grouping by rows.

Maurits Evers