Here is what my dataframe looks like:

> head(full_malaria_data)
  X index     lonx      laty District population malaria_cases river_length distance_to_river distance_to_coast
1 0     1 6.470243 0.2406650    Lemba         10             0     3098.054          136.9634         210.53670
2 1     2 6.474831 0.2397604    Lemba        395            23    15498.375          240.7952         214.72492
3 2     3 6.460882 0.2677222    Lemba       1862             8    13198.230          583.5622          65.33937
4 3     4 6.471704 0.2500610    Lemba        302             0    13198.230          231.2028         523.73073

I am trying to find the total number of malaria cases per district (for Lemba, Canta Calo, Principe, Agua Grande).

So far my code does one district at a time:

full_malaria_data %>% 
  summarise(out = sum(malaria_cases[District == "Lobata"])) %>%
  pull(out)

How can I augment my code so that it finds the sum of malaria cases for all districts at once?

Natasha H
    Use `group_by(District) %>% summarise(out = sum(malaria_cases, na.rm = TRUE))` – akrun Nov 30 '21 at 20:16
    Several questions on SO exist with excellent and well-filled-out answers, on the theme of *"calculate ... per group"*. Often it's "mean", but it can easily be `sum`, `any`, or some user-defined function. In this case, I think it's simply `sum`. Please read the dupe-links for many ways to do this, including base R, `dplyr`, `data.table`, and perhaps `sqldf` (to name several). – r2evans Nov 30 '21 at 20:23
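
For reference, the grouped approach suggested in the comments can be sketched as below. This is only an illustration: the data frame `toy` and its values are made up to stand in for `full_malaria_data`, and only the column names `District` and `malaria_cases` come from the question. A base R alternative using `aggregate()` is included for comparison.

```r
library(dplyr)

# Hypothetical stand-in for full_malaria_data (values are invented)
toy <- data.frame(
  District      = c("Lemba", "Lemba", "Lobata", "Lobata", "Principe"),
  malaria_cases = c(0, 23, 8, NA, 5)
)

# dplyr: one row per district, holding the sum of malaria_cases
cases_by_district <- toy %>%
  group_by(District) %>%
  summarise(out = sum(malaria_cases, na.rm = TRUE))

# Base R: the formula interface drops rows with NA before summing
base_totals <- aggregate(malaria_cases ~ District, data = toy, FUN = sum)

print(cases_by_district)
print(base_totals)
```

Replacing `toy` with `full_malaria_data` should give one total per district in a single call. Note that `na.rm = TRUE` treats missing counts as zero contributions, which may or may not be what you want for your data.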