I have a panel dataset on disasters per year in one country, with data on GDP per year and casualties and damages in $. For each observation I have the Year and the CPI,that are the same between two observations of the same year and casualties and costs that are observation-specific.
I'd like to modify my dataset so that I have one line per year with the total costs and casualties per year. I cannot simply compute the sum of all obs that have the same year because that would also sum variables such as CPI that is constant across observations of the same year. How can I do this ?
Here are the first 8 rows of my dataset
structure(list(Year = c("1980", "1980", "1980", "1980", "1980",
"1980", "1980", "1980"), Country.Code = c("PHL", "PHL", "PHL",
"PHL", "PHL", "PHL", "PHL", "PHL"), duration = structure(c(1,
1, 1, 1, 1, 1, 1, 1), class = "difftime", units = "days"), `Total Deaths` = c(50,
2, 31, 40, 101, 4, NA, 36), `Total Affected` = c(NA, 25980, 6e+05,
190000, 1004000, 665, 3063, 69237), `Total Damages ('000 US$)` = c(NA,
NA, 15400, NA, 102300, 402, 289, NA), CPI = c(32.2338932328122,
32.2338932328122, 32.2338932328122, 32.2338932328122, 32.2338932328122,
32.2338932328122, 32.2338932328122, 32.2338932328122)), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
I have 500+ rows in total with year up to 2019
What I would like is one line per year, with Country.Code and CPI unchanged but duration, death, affected and damages summed per year.
I found this thread and tried to adapt it to my case by doing this
disaster %>%
group_by(Year, Country.Code,CPI) %>%
summarise(
sum_duration = sum(duration),
sum_total_deaths = sum(`Total Deaths`),
sum_total_affected=sum(`Total Affected`),
sum_total_damages=sum(`Total Damages ('000 US$)`)
)
Using the dyplyr package, where disaster is my dataset, Year, Country.Code and CPI are the variables that I do not want to sum, duration is the cumulated length of all disasters per year, measured in days, and all others are self-explanatory.
This does not work.
In case you're wondering what I'll do with this, I will merge this with data on GDP and do some time-series analysis but I already know how to do this.
Any help appreciated ! Thanks