Create new column that takes the sum of another column values and group by condition in R

Question

I would like to add a new column titled "Pop Sum" that holds the sum of Population over all rows, grouped by location (Building). I tried `group_by` with Building and then `sum = sum(pop)`, but that did not work. I am showing just one building here, but there are 30+ unique buildings. Reference data frame below:

`Date Ran`           Year Month      Building   Population MonthN
  <dttm>              <dbl> <chr>        <fct>       <num>    <chr> 
1 2018-09-28 00:00:00  2018 September     ALEX       1196    Sep   
2 2018-08-30 00:00:00  2018 August        ALEX       1172    Aug   
3 2018-07-19 00:00:00  2018 July          ALEX       1171    Jul   
4 2018-06-30 00:00:00  2018 June          ALEX       1167    Jun   
5 2018-05-11 00:00:00  2018 May           ALEX       1154    May   
6 2018-04-09 00:00:00  2018 April         ALEX       1154    Apr

The desired output would look something like this:

`Date Ran`           Year Month      Building   Population   MonthN  Pop Sum
  <dttm>              <dbl> <chr>         <fct>      <dbl>     <chr>      <num>
1 2018-09-28 00:00:00  2018 September     ALEX       1196       Sep        7014
2 2018-08-30 00:00:00  2018 August        ALEX       1172       Aug        7014
3 2018-07-19 00:00:00  2018 July          ALEX       1171       Jul        7014
4 2018-06-30 00:00:00  2018 June          ALEX       1167       Jun        7014
5 2018-05-11 00:00:00  2018 May           ALEX       1154       May        7014
6 2018-04-09 00:00:00  2018 April         ALEX       1154       Apr        7014

A function or for loop would be great, but if there are any R packages I could load to generate the new column, that would be great too.

`group_by(Building) %>% summarize(pop_sum = sum(Population))` should work, can you share your code? — Adam B., Feb 18 '20 at 21:35
What exactly do you mean the group_by/sum "didn't work"? Did you use `dplyr`? Did you use a `summarize()` or `mutate()`? I would have thought the latter should have worked just fine. When asking for help please share your data in a [reproducible format](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) that makes it easy to copy/paste data into R (this is not easy with your current data). Also, try including data with more than one Building to make sure we properly test the grouping. — MrFlick, Feb 18 '20 at 21:36
My apologies - I will change the question format. I used dplyr, but it gave the sum of the whole population, not separated by building. — Dinho, Feb 18 '20 at 21:39
@Dinho Are you sure you used a `group_by`? What you describe doesn't make sense. Please show your code. — MrFlick, Feb 18 '20 at 21:40

jessi · Accepted Answer · 2020-02-18T21:51:13.560

3

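In dplyr, summary functions such as `sum()` are usually used inside `summarize()`, which collapses each group to a single row. If you use `group_by()` with `mutate()` instead, the group total is added as a new column on every row, and `ungroup()` then drops the grouping.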

 library(dplyr)

 # Add the per-building total to every row, then drop the grouping
 newdf <- df %>%
    group_by(Building) %>%
    mutate(PopSum = sum(Population, na.rm = TRUE)) %>%
    ungroup()
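
To make the difference between `mutate()` and `summarize()` concrete, here is a minimal sketch on made-up data. The second building ("BETA") and all the numbers are hypothetical, added only so the grouping is actually exercised; they are not from the question.

```r
library(dplyr)

# Hypothetical sample data: two buildings and one missing value
df <- data.frame(
  Building   = factor(c("ALEX", "ALEX", "ALEX", "BETA", "BETA")),
  Population = c(1196, 1172, NA, 300, 310)
)

# mutate() keeps every row and repeats the per-building total on each one
with_sum <- df %>%
  group_by(Building) %>%
  mutate(PopSum = sum(Population, na.rm = TRUE)) %>%
  ungroup()

# summarize() collapses the data to one row per building instead
totals <- df %>%
  group_by(Building) %>%
  summarize(PopSum = sum(Population, na.rm = TRUE))
```

If you get one grand total instead of per-building totals, one possible cause (not confirmed for this question) is that plyr was loaded after dplyr and masks `mutate()`/`summarize()`; calling `dplyr::mutate()` explicitly rules that out.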

edited Feb 18 '20 at 21:51

answered Feb 18 '20 at 21:35

jessi


Gainz · Answer 2 · 2020-02-18T21:59:31.627

Here is an example with data.table

data:

library(data.table)

data <- data.table(building = c("Alex", "Alex", "Mike"),
               population = c(1312, 3123, 2139),
               location = c("Denver", "Arizona", "Detroit"))

   building population location
1:     Alex       1312   Denver
2:     Alex       3123  Arizona
3:     Mike       2139  Detroit

code:

data[, popsum := sum(population), by = building][]

   building population location popsum
1:     Alex       1312   Denver   4435
2:     Alex       3123  Arizona   4435
3:     Mike       2139  Detroit   2139

Is that what you're looking for?
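
One thing worth knowing about the `:=` call above: it adds the column to the existing table in place rather than returning a modified copy. A minimal sketch, assuming you want to keep the original table untouched and skip missing values (the `NA` and the names `dt`/`dt2` are made up for illustration):

```r
library(data.table)

# Hypothetical data with one missing population value
dt <- data.table(building   = c("Alex", "Alex", "Mike"),
                 population = c(1312, NA, 2139))

# := modifies its table by reference, so work on a copy()
# if the original should stay as it is
dt2 <- copy(dt)
dt2[, popsum := sum(population, na.rm = TRUE), by = building]
```

Here `dt2` gains the `popsum` column while `dt` still has only its two original columns.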

Thank you! This is exactly it and then some! — Dinho, Feb 18 '20 at 23:45

score 0 · Answer 3 · answered Feb 18 '20 at 21:48

0

Here is a base R solution using `ave()`, which you could try:

df <- within(df,PopSum <- ave(Population,Building,FUN = sum))
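
For a self-contained illustration of the same idea, here is a small sketch on made-up data. The building "BETA", the numbers, and the `na.rm` handling are my assumptions, not part of the question.

```r
# Hypothetical sample data with two buildings and one missing value
df <- data.frame(
  Building   = factor(c("ALEX", "ALEX", "BETA", "BETA")),
  Population = c(1196, 1172, NA, 310)
)

# ave() applies FUN within each Building group and returns a vector
# as long as the input, so it can be assigned directly as a new column
df$PopSum <- ave(df$Population, df$Building,
                 FUN = function(x) sum(x, na.rm = TRUE))
```

Because `ave()` returns one value per input row, no join or merge step is needed to get the totals back onto the original rows.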

answered Feb 18 '20 at 21:48

ThomasIsCoding

