Split observation (row) into month-wise pieces (rows)

Question

I'm currently researching civil conflict, using the UCDP Armed Conflict Dataset. My focus is on the monthly duration of civil wars. However, I'm having trouble converting the original conflict-year data into conflict-month data.

I'll provide an example of my data below:

conflict_id	start_date	end_date	year	termination
100	1946-05-18	NA	1946	0
100	1946-05-18	1947-03-01	1947	1
101	1950-05-01	1950-07-01	1947	1
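For a reproducible version of these three rows (the kind of object `dput()` helps share), the sample can be built directly as a data frame. The name `df` matches the one used in the answer below; storing the dates as ISO-8601 character strings with `NA` for the missing end date is an assumption about how the data was read in.

```r
# Sample data from the question (assumption: dates read as character,
# the missing end date stored as NA)
df <- data.frame(
  conflict_id = c(100L, 100L, 101L),
  start_date  = c("1946-05-18", "1946-05-18", "1950-05-01"),
  end_date    = c(NA, "1947-03-01", "1950-07-01"),
  year        = c(1946L, 1947L, 1947L),
  termination = c(0L, 1L, 1L)
)
str(df)
```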

I am expecting the following result:

conflict_id	year	month	duration	termination
100	1946	5	1	0
100	1946	6	2	0
100	1946	7	3	0
...	...	...	...	...
100	1947	2	10	0
100	1947	3	11	1
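As a quick check on the `duration` column: counting both the first and the last month, a spell covers 12 × (end year − start year) + (end month − start month) + 1 months. A small helper (the name `months_spanned` is hypothetical) applied to conflict 100 (May 1946 to March 1947) and conflict 101 (May to July 1950):

```r
# Number of calendar months from (y1, m1) to (y2, m2), both endpoints included
months_spanned <- function(y1, m1, y2, m2) {
  12L * (y2 - y1) + (m2 - m1) + 1L
}

months_spanned(1946L, 5L, 1947L, 3L)  # 11
months_spanned(1950L, 5L, 1950L, 7L)  # 3
```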

Any suggestions, examples would be greatly appreciated. Thank you in advance for your time and expertise!

I am assuming that your data also contains the country in which the civil war took place. In that case, I would build a new panel dataset of country-month pairs over the whole observation period using a combination of `seq` and `rep`. To separate the date into columns for year, month and day, use the `lubridate` package together with `mutate`. I would then build two datasets, one with the start dates and one with the end dates, and match each of them to the respective year-month pair in the new country-month panel dataset. — flxflks, May 12 '23 at 06:49
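A minimal sketch of two building blocks from this comment, using dates from the question: `lubridate::year()`, `month()` and `day()` inside `mutate()` to split a date into columns, and `seq(..., by = "month")` to enumerate the months of a spell. The `rep()` step (repeating the month sequence for every country) is left out because the sample data has no country column; the object names here are illustrative.

```r
library(dplyr)
library(lubridate)

# Start dates taken from the question's sample rows
dates <- tibble(start_date = as.Date(c("1946-05-18", "1950-05-01")))

# Split each date into separate year, month and day columns
split_dates <- dates |>
  mutate(year  = year(start_date),
         month = month(start_date),
         day   = day(start_date))
print(split_dates)

# First day of every month from the start month of conflict 100
# through its end date (May 1946 to March 1947)
month_starts <- seq(floor_date(as.Date("1946-05-18"), unit = "month"),
                    as.Date("1947-03-01"), by = "month")
print(month_starts)
```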
Also, welcome to Stack Overflow! Please consider providing a minimal, reproducible example for future questions. The simplest way is to use `dput()`. A guide can be found here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — flxflks, May 12 '23 at 06:50

score 1 · Accepted Answer · answered May 12 '23 at 09:13

One approach (a rather long tidy-style pipeline, so you might want to break it up to inspect what each step does):

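library(dplyr)     # mutate(), across(), group_by(), summarize(), rowwise(), select()
library(tidyr)     # unnest_longer()
library(lubridate) # year(), month()
library(zoo)       # as.yearmon()

df |> ## df is a data frame of the example data you provided
  # convert both date columns to year-month values
  mutate(across(ends_with('_date'),
                ~ as.Date(.x) |> as.yearmon()
                )
         ) |>
  # one row per conflict: earliest start month, latest end month
  group_by(conflict_id) |>
  summarize(start = min(start_date, na.rm = TRUE),
            end = max(end_date, na.rm = TRUE)
            ) |>
  # expand each conflict into a list of months, then one row per month
  rowwise() |>
  mutate(ym = seq(start, end, 1/12) |> list()) |>
  unnest_longer(ym) |>
  select(conflict_id, ym) |>
  # number the months within each conflict and flag the last one
  group_by(conflict_id) |>
  mutate(year = year(ym),
         month = month(ym),
         duration = row_number(),
         termination = ifelse(duration < max(duration), 0, 1)
         )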

# A tibble: 14 x 6
# Groups:   conflict_id [2]
   conflict_id ym         year month duration termination
         <int> <yearmon> <dbl> <dbl>    <int>       <dbl>
 1         100 May 1946   1946     5        1           0
 2         100 Jun 1946   1946     6        2           0
## ... lines removed
11         100 Mar 1947   1947     3       11           1
12         101 May 1950   1950     5        1           0
13         101 Jun 1950   1950     6        2           0
14         101 Jul 1950   1950     7        3           1
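If you prefer not to step through `yearmon` values in fractional `1/12` increments, a variant of the same pipeline can floor both dates to the first of the month with `lubridate::floor_date()` and let `seq(..., by = "month")` generate the months as `Date` values. This is a sketch, not the answerer's code; like the original, it takes the latest recorded `end_date` of each conflict as its final month, and it rebuilds the question's sample rows so that it runs on its own.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Sample rows from the question (only the columns the pipeline needs)
df <- data.frame(
  conflict_id = c(100L, 100L, 101L),
  start_date  = c("1946-05-18", "1946-05-18", "1950-05-01"),
  end_date    = c(NA, "1947-03-01", "1950-07-01")
)

monthly <- df |>
  mutate(across(ends_with("_date"), as.Date)) |>
  # One spell per conflict: first start month to last recorded end month
  group_by(conflict_id) |>
  summarize(start = floor_date(min(start_date, na.rm = TRUE), unit = "month"),
            end   = floor_date(max(end_date, na.rm = TRUE), unit = "month")) |>
  # List of first-of-month dates per conflict, then one row per month
  rowwise() |>
  mutate(month_start = list(seq(start, end, by = "month"))) |>
  ungroup() |>
  unnest_longer(month_start) |>
  # Number the months within each conflict and flag the final month
  group_by(conflict_id) |>
  mutate(year        = year(month_start),
         month       = month(month_start),
         duration    = row_number(),
         termination = as.integer(duration == n())) |>
  ungroup() |>
  select(conflict_id, year, month, duration, termination)

print(monthly, n = Inf)
```

This yields the same 14 conflict-months as the output above, without the `ym` column. Both versions set `termination = 1` on the last observed month of every conflict, so a conflict still ongoing at the end of the data would need that flag taken from the original `termination` column instead.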

You're welcome! If it worked, please remember to close the ticket by marking the answer as accepted. — I_O, May 19 '23 at 10:10
