0

I am using the Human Mortality Database. It has different values for each week and country of some variables like the deaths per age groups and the ratio. It also considers different countries England and Wales, Scotland and Northen Ireland. Instead, I would like to consider them as a unique country, the UK. How can I compute the sum of these three values for each week and sex? Can somebody help? thanks

This is what I have

# Country   Year   Week   Rate
# GBR_SCO   2000    1     0.01
# GBR_SCO   2000    2     0.02
# GBR_SCO   2000    3     0.03
   ...      ...    ...    ...
# GBR_SCO   2001    1     0.15
# GBR_SCO   2001    2     0.16
# GBR_WAL   2000    1     0.19
# GBR_WAL   2000    2     0.18
# GBR_WAL   2000    3     0.31
   ...      ...    ...    ...
# GBR_WAL   2001    1     0.53
# GBR_WAL   2001    2     0.62

This is what I want to obtain

# Country   Year   Week   Rate
# GBR       2000    1     0.20
# GBR       2000    2     0.20
# GBR       2000    3     0.34
# ...       ...    ...     ...
# GBR       2001    1     0.68
# GBR       2002    2     0.78

Of course consider that in the dataset I have also other countries, years, and weeks that I want to keep. This is an example of what I want to do.

Bibi
  • 87
  • 9
  • 3
    Please, provide a minimal reproducible example: [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – PaulS Jul 13 '22 at 22:38
  • 1
    Summing rates that are based on populations of drastically different sizes seems like a very bad idea. Instead, if you have the data, you should sum the numerators and denominators and re-compute the rates. Or look up the total populations and use a weighted average. – Gregor Thomas Jul 13 '22 at 22:59

1 Answers1

0

I think you need this kind of code (in contrast to your question) my guess is that you need a group_by and summarise: The main challenge is to get group for Country. We use str_extract to extract all character before the underscore:

library(dplyr)
library(stringr)

df %>% 
  mutate(Country = str_extract(Country, "[^_]+")) %>% 
  group_by(Country, Year, Week) %>% 
  summarise(Rate = sum(Rate))
  Country  Year  Week  Rate
  <chr>   <int> <int> <dbl>
1 GBR      2000     1  0.2 
2 GBR      2000     2  0.2 
3 GBR      2000     3  0.34
4 GBR      2001     1  0.68
5 GBR      2001     2  0.78

data:

structure(list(Country = c("GBR_SCO", "GBR_SCO", "GBR_SCO", "GBR_SCO", 
"GBR_SCO", "GBR_WAL", "GBR_WAL", "GBR_WAL", "GBR_WAL", "GBR_WAL"
), Year = c(2000L, 2000L, 2000L, 2001L, 2001L, 2000L, 2000L, 
2000L, 2001L, 2001L), Week = c(1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 
1L, 2L), Rate = c(0.01, 0.02, 0.03, 0.15, 0.16, 0.19, 0.18, 0.31, 
0.53, 0.62)), class = "data.frame", row.names = c(NA, -10L))
TarJae
  • 72,363
  • 6
  • 19
  • 66