4

I have a data frame containing information on incidents for countries around the world. The structure of the data frame is similar to the following example:

a <- data.frame(country = c("AAA" , "BBB" , "CCC") ,
                incident = rep("disaster" , times = 3) ,
                'start year' = c(1990 , 1995 , 2011) ,
                'end year' = c(1993 , 1995 , 2012))

giving a:

  country incident start.year end.year
1     AAA disaster       1990     1993
2     BBB disaster       1995     1995
3     CCC disaster       2011     2012

I would like to transform this so that each row contains the incident for each individual year instead of only the interval. Ideally, it would look like something like this:

  country incident year
1     AAA disaster 1990
2     AAA disaster 1991
3     AAA disaster 1992
4     AAA disaster 1993
5     BBB disaster 1995
6     CCC disaster 2011
7     CCC disaster 2012

Is there an optimal code that can transform this to include a start and end year?

Adrian
  • 791
  • 1
  • 5
  • 15

2 Answers2

4

We may use map2 to get the sequence between the two columns as a list and then unnest the list column

library(dplyr)
library(purrr)
library(tidyr)
a %>%
   transmute(country, incident, year = map2(start.year, end.year, `:`)) %>%
   unnest(year)

-output

# A tibble: 7 × 3
  country incident  year
  <chr>   <chr>    <int>
1 AAA     disaster  1990
2 AAA     disaster  1991
3 AAA     disaster  1992
4 AAA     disaster  1993
5 BBB     disaster  1995
6 CCC     disaster  2011
7 CCC     disaster  2012

If the 'country' column is unique, either use a group by/summarise or use rowwise to expand as well

a %>% 
   group_by(country) %>%
   summarise(incident, year = start.year:end.year, .groups = 'drop')
# A tibble: 7 × 3
  country incident  year
  <chr>   <chr>    <int>
1 AAA     disaster  1990
2 AAA     disaster  1991
3 AAA     disaster  1992
4 AAA     disaster  1993
5 BBB     disaster  1995
6 CCC     disaster  2011
7 CCC     disaster  2012

Or use uncount to expand the data

a %>% 
 uncount(end.year - start.year + 1) %>% 
 group_by(country) %>%
  mutate(year = start.year + row_number() - 1, .keep = 'unused', 
     end.year = NULL) %>% 
  ungroup
akrun
  • 874,273
  • 37
  • 540
  • 662
2

Here is one more alternative using pivot_longer and fill and complete

library(dplyr)
library(tidyr)

a %>% 
  pivot_longer(cols = ends_with("year"),
               values_to = "year") %>% 
  group_by(country) %>% 
  complete(year = full_seq(min(year):max(year), 1)) %>% 
  fill(c(incident)) %>% 
  select(-name)
  country  year incident
  <chr>   <dbl> <chr>   
1 AAA      1990 disaster
2 AAA      1991 disaster
3 AAA      1992 disaster
4 AAA      1993 disaster
5 BBB      1995 disaster
6 BBB      1995 disaster
7 CCC      2011 disaster
8 CCC      2012 disaster
TarJae
  • 72,363
  • 6
  • 19
  • 66