The following is a reprex of my problem written for dplyr:
library(tidyverse)
df <- tibble(State = c("A", "A", "A", "A", "A", "A", "B", "B", "B"),
             District_code = 1:9,
             District = c("North", "West", "North West", "South", "East", "South East",
                          "XYZ", "ZYX", "AGS"),
             Population = c(1000000, 2000000, 3000000, 4000000, 5000000, 6000000,
                            7000000, 8000000, 9000000))
df
#> # A tibble: 9 x 4
#>   State District_code District   Population
#>   <chr>         <int> <chr>           <dbl>
#> 1 A                 1 North         1000000
#> 2 A                 2 West          2000000
#> 3 A                 3 North West    3000000
#> 4 A                 4 South         4000000
#> 5 A                 5 East          5000000
#> 6 A                 6 South East    6000000
#> 7 B                 7 XYZ           7000000
#> 8 B                 8 ZYX           8000000
#> 9 B                 9 AGS           9000000
For some States, I need to merge Districts by name into fewer geographical categories. In particular, State A should only have two: "North - West - North West" and "South - East - South East". Some variables, like Population, must be summed across the merged rows, while others, like District_code, should become NA. I have found this example of operations across rows, but it's not quite the same. Grouping doesn't seem to apply.
The final result should be something like this:
new_df
#> # A tibble: 5 x 4
#>   State District_code District                  Population
#>   <chr>         <int> <chr>                          <dbl>
#> 1 A                NA North - West - North West    6000000
#> 2 A                NA South - East - South East   15000000
#> 3 B                 7 XYZ                          7000000
#> 4 B                 8 ZYX                          8000000
#> 5 B                 9 AGS                          9000000
In the real data frame there are several variables like Population that must be summed, as well as several others like District_code that will have to become NA.
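To make the requirement concrete, here is a minimal sketch of one way it might be approached. It is not a definitive solution. It assumes the merges can be listed by hand in a lookup table; `merge_map`, `Merged`, `sum_vars`, `na_vars` and the helper column `.ord` are hypothetical names introduced only for this illustration. The idea is to relabel the Districts that should be merged, group on the new label, sum one set of columns, and blank out the other set wherever a group holds more than one original row.

```r
library(dplyr)

df <- tibble(State = c("A", "A", "A", "A", "A", "A", "B", "B", "B"),
             District_code = 1:9,
             District = c("North", "West", "North West", "South", "East", "South East",
                          "XYZ", "ZYX", "AGS"),
             Population = c(1000000, 2000000, 3000000, 4000000, 5000000, 6000000,
                            7000000, 8000000, 9000000))

# Hypothetical lookup table: which (State, District) pairs collapse into which label.
merge_map <- tibble(State = "A",
                    District = c("North", "West", "North West",
                                 "South", "East", "South East"),
                    Merged = rep(c("North - West - North West",
                                   "South - East - South East"), each = 3))

# Hypothetical column lists; extend these for the real data frame.
sum_vars <- c("Population")     # columns to sum within a merged group
na_vars  <- c("District_code")  # columns that become NA within a merged group

new_df <- df %>%
  mutate(.ord = row_number()) %>%                       # remember the original row order
  left_join(merge_map, by = c("State", "District")) %>%
  mutate(District = coalesce(Merged, District)) %>%     # unmatched rows keep their name
  group_by(State, District) %>%
  summarise(
    # typed NA when several rows were merged, the original value otherwise
    across(all_of(na_vars), ~ if (n() > 1) .x[NA_integer_] else .x),
    across(all_of(sum_vars), sum),
    .ord = min(.ord),
    .groups = "drop"
  ) %>%
  arrange(.ord) %>%                                     # restore the original ordering
  select(all_of(names(df)))                             # original column order, helper dropped

new_df
```

Districts that do not appear in `merge_map` form groups of one row, so they pass through unchanged, which is why State B is left as it was.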
Thanks heaps for any help!