
I'm currently working on civil conflict, using the UCDP Armed Conflict Dataset. My focus is on the duration of civil wars measured in months. However, I'm having trouble converting the original conflict-year data into conflict-month data.

I'll provide an example of my data below:

conflict_id  start_date  end_date    year  termination
100          1946-05-18  NA          1946  0
100          1946-05-18  1947-03-01  1947  1
101          1950-05-01  1950-07-01  1950  1

I am expecting the following result:

conflict_id  year  month  duration  termination
100          1946  5      1         0
100          1946  6      2         0
100          1946  7      3         0
...          ...   ...    ...       ...
100          1947  2      10        0
100          1947  3      11        1

Any suggestions or examples would be greatly appreciated. Thank you in advance for your time and expertise!

I_O
ryang6476
  • I am assuming that your data also contains the country in which the civil war took place. In that case, I would build a new panel dataset of country-month pairs over the whole observation period using a combination of `seq` and `rep`. To separate the date into columns for year, month and day, use the `lubridate` package together with `mutate`. I would then build two datasets, one with the start dates and one with the end dates, and match each of them to the respective year-month pair in the new country-month panel dataset. – flxflks May 12 '23 at 06:49
  • Also, welcome to Stack Overflow! Please consider providing a minimal, reproducible example for future questions. The simplest way is to use `dput()`. A guide can be found here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – flxflks May 12 '23 at 06:50
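
The panel idea from the first comment could look roughly like the sketch below, written in base R only. It rests on a few assumptions: the sample data has no country column, so `conflict_id` stands in for it; the observation period is taken from the data itself; and the names `first_of_month`, `start_m`, `end_m`, and `panel` are made up for illustration.

```r
# Example data from the question. conflict_id stands in for the country
# column, which the sample does not include (assumption).
df <- data.frame(
  conflict_id = c(100, 100, 101),
  start_date  = as.Date(c("1946-05-18", "1946-05-18", "1950-05-01")),
  end_date    = as.Date(c(NA, "1947-03-01", "1950-07-01"))
)

# Hypothetical helper: snap a date to the first day of its month.
first_of_month <- function(d) as.Date(format(d, "%Y-%m-01"))

# First start month and last end month per conflict.
# Note: a conflict with no end date at all would need separate handling.
ids     <- unique(df$conflict_id)
start_m <- do.call(c, lapply(ids, function(i)
  first_of_month(min(df$start_date[df$conflict_id == i]))))
end_m   <- do.call(c, lapply(ids, function(i)
  first_of_month(max(df$end_date[df$conflict_id == i], na.rm = TRUE))))

# Full id-by-month skeleton over the observation period (seq + rep).
months <- seq(min(start_m), max(end_m), by = "month")
panel  <- data.frame(
  conflict_id = rep(ids, each = length(months)),
  month_start = rep(months, times = length(ids))
)

# Match each conflict's start and end month onto the skeleton,
# then keep only the months in which the conflict was active.
idx         <- match(panel$conflict_id, ids)
panel$start <- start_m[idx]
panel$end   <- end_m[idx]
panel <- panel[panel$month_start >= panel$start &
               panel$month_start <= panel$end, ]

# Derive the requested columns.
panel$year        <- as.integer(format(panel$month_start, "%Y"))
panel$month       <- as.integer(format(panel$month_start, "%m"))
panel$duration    <- ave(seq_len(nrow(panel)), panel$conflict_id,
                         FUN = seq_along)
panel$termination <- as.integer(panel$month_start == panel$end)

panel <- panel[, c("conflict_id", "year", "month", "duration", "termination")]
rownames(panel) <- NULL
panel
```

Building the full skeleton first is wasteful for a handful of conflicts, but it keeps the month grid explicit, which may help if country-level covariates are merged in later.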

1 Answer

One approach (a rather long 'tidy-style' pipeline, so you might want to break it up to inspect what each step does):

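library(dplyr)      # mutate(), across(), group_by(), summarize(), rowwise(), select()
library(tidyr)      # unnest_longer()
library(lubridate)  # year(), month()
library(zoo)        # as.yearmon()

df |> ## df is a data frame of the example data you provided
  ## convert both date columns to year-month values
  mutate(across(ends_with('_date'),
                ~ as.Date(.x) |> as.yearmon()
                )
         ) |>
  ## one row per conflict: first start month and last end month
  group_by(conflict_id) |>
  summarize(start = min(start_date, na.rm = TRUE),
            end = max(end_date, na.rm = TRUE)
            ) |>
  ## build the monthly sequence per conflict, then expand it into rows
  rowwise() |>
  mutate(ym = seq(start, end, 1/12) |> list()) |>
  unnest_longer(ym) |>
  select(conflict_id, ym) |>
  ## derive year, month, running duration, and the termination flag
  group_by(conflict_id) |>
  mutate(year = year(ym),
         month = month(ym),
         duration = row_number(),
         termination = ifelse(duration < max(duration), 0, 1)
         )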
# A tibble: 14 x 6
# Groups:   conflict_id [2]
   conflict_id ym         year month duration termination
         <int> <yearmon> <dbl> <dbl>    <int>       <dbl>
 1         100 Mai 1946   1946     5        1           0
 2         100 Jun 1946   1946     6        2           0
## ... lines removed
11         100 Mär 1947   1947     3       11           1
12         101 Mai 1950   1950     5        1           0
13         101 Jun 1950   1950     6        2           0
14         101 Jul 1950   1950     7        3           1
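
As a cross-check on the `duration` column, the number of calendar months a conflict touches can be computed directly from its start and end dates. This is only a base-R sketch; `months_spanned` is a hypothetical helper, and it assumes the counting convention used in the output above, where both the first and the last month are included.

```r
# Hypothetical helper: count calendar months from start to end,
# including both the first and the last month.
months_spanned <- function(start, end) {
  s <- as.POSIXlt(start)
  e <- as.POSIXlt(end)
  12L * (e$year - s$year) + (e$mon - s$mon) + 1L
}

months_spanned(as.Date("1946-05-18"), as.Date("1947-03-01"))  # 11
months_spanned(as.Date("1950-05-01"), as.Date("1950-07-01"))  # 3
```

These counts agree with the last `duration` value for conflicts 100 and 101 in the output above.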
I_O
  • You're welcome! If it worked, please remember to close the ticket by marking the answer as accepted. – I_O May 19 '23 at 10:10