1

I am wondering how I can find the sum of a column, (in this case it's the AgeGroup_20_to_24 column) for a month and year. Here's the sample data:

https://i.stack.imgur.com/E23Th.png

I essentially want to find the total amount of cases per month/year. For an example: 01/2020 = total sum cases of the AgeGroup 02/2020 = total sum cases of the AgeGroup

I tried doing this, however I get this:

https://i.stack.imgur.com/1eH0O.png

xAge20To24 <- covid%>%
  mutate(dates=mdy(Date), year = year(dates), month = month(dates))%>%
  mutate(total = sum(AgeGroup_20_to_24))%>%
  select(Date, year, month, AgeGroup_20_to_24)%>%
  group_by(year)

View(xAge20To24)
  

Any help will be appreciated.

structure(list(Date = c("3/9/2020", "3/10/2020", "3/11/2020", "3/12/2020", "3/13/2020", "3/14/2020"), AgeGroup_0_to_19 = c(1, 0, 2, 0, 0, 2), AgeGroup_20_to_24 = c(1, 0, 2, 0, 2, 1), AgeGroup_25_to_29 = c(1, 0, 1, 2, 2, 2), AgeGroup_30_to_34 = c(0, 0, 2, 3, 4, 3), AgeGroup_35_to_39 = c(3, 1, 2, 1, 2, 1), AgeGroup_40_to_44 = c(1, 2, 1, 3, 3, 1), AgeGroup_45_to_49 = c(1, 0, 0, 2, 0, 1), AgeGroup_50_to_54 = c(2, 1, 1, 1, 0, 1), AgeGroup_55_to_59 = c(1, 0, 1, 1, 1, 2), AgeGroup_60_to_64 = c(0, 2, 2, 1, 1, 3), AgeGroup_70_plus = c(2, 0, 2, 0, 0, 0)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

Jase C
  • 17
  • 5
  • 2
    Please do not post (only) an image of code/data/errors: it breaks screen-readers and it cannot be copied or searched (ref: https://meta.stackoverflow.com/a/285557 and https://xkcd.com/2116/). Please include the code, console output, or data (e.g., `data.frame(...)` or the output from `dput(head(x))`) directly. – r2evans Nov 03 '22 at 22:53
  • Hello, sorry about that.. Should I send the csv file here then? (I'm pretty new to stackoverflow) – Jase C Nov 03 '22 at 22:59
  • See https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info for demonstrations/discussion on using `dput(head(covid))` or `read.table(text="...")`. – r2evans Nov 03 '22 at 23:08
  • 1
    entering dput(x) or dput(head(x)) like @r2evans described will produce a block of text you paste directly into your question. It will let others reproduce that bit of your data to help. – Seth Nov 03 '22 at 23:08
  • Sorry! I just edited it. – Jase C Nov 03 '22 at 23:32

1 Answers1

1

I'm not sure if your question and your data match up. You're asking for by-month summaries of data, but your data only includes March entries. I've provided two examples of summarizing your data below, one that uses the entire date and one that uses by-day summaries since we can't use month. If your full data set has more months included, you can just swap the day for month instead. First, a quick summary of just the dates can be done with this code:

#### Load Library ####
library(tidyverse)
library(lubridate)

#### Pivot and Summarise Data ####
covid %>%   
  pivot_longer(cols = c(everything(),
                        -Date),
               names_to = "AgeGroup",
               values_to = "Cases") %>% 
  group_by(Date) %>% 
  summarise(Sum_Cases = sum(Cases))

This pivots your data into long format, groups by the entire date, then summarizes the cases, which gives you this by-date sum of data:

# A tibble: 6 × 2
  Date      Sum_Cases
  <chr>         <dbl>
1 3/10/2020         6
2 3/11/2020        16
3 3/12/2020        14
4 3/13/2020        15
5 3/14/2020        17
6 3/9/2020         13

Using the same pivot_longer principle, you can mutate the data to date format like you already did, pivot to longer format, then group by day, thereafter summarizing the cases:

#### Theoretical Example ####
covid %>%
  mutate(Date=mdy(Date), 
         Year = year(Date),
         Month = month(Date),
         Day = day(Date)) %>% 
  pivot_longer(cols = c(everything(),
                        -Date,-Year,-Month,-Day),
               names_to = "AgeGroup",
               values_to = "Cases") %>% 
  group_by(Day) %>% # use by day instead of month
  summarise(Sum_Cases = sum(Cases))

Which you can see below. Here we can see the 14th had the most cases:

# A tibble: 6 × 2
    Day Sum_Cases
  <int>     <dbl>
1     9        13
2    10         6
3    11        16
4    12        14
5    13        15
6    14        17
Shawn Hemelstrand
  • 2,676
  • 4
  • 17
  • 30
  • 1
    Thank you so much! This worked. Also I can see why everyone in the comments told me to provide the actual data haha... The data was sorted and I only printed the head. Anyways, thank you so much again for the help!! – Jase C Nov 04 '22 at 00:40
  • 1
    No worries. Happy coding Jase. – Shawn Hemelstrand Nov 04 '22 at 00:41