
I would like to add a new column titled "pop sum" that holds the sum of Population over all rows that have population data, grouped by location (Building). I tried group_by with Building followed by sum = sum(pop), but that didn't work. I am only showing one building here, but there are 30+ unique buildings. Reference data frame below:

`Date Ran`           Year Month      Building   Population MonthN
  <dttm>              <dbl> <chr>        <fct>       <num>    <chr> 
1 2018-09-28 00:00:00  2018 September     ALEX       1196    Sep   
2 2018-08-30 00:00:00  2018 August        ALEX       1172    Aug   
3 2018-07-19 00:00:00  2018 July          ALEX       1171    Jul   
4 2018-06-30 00:00:00  2018 June          ALEX       1167    Jun   
5 2018-05-11 00:00:00  2018 May           ALEX       1154    May   
6 2018-04-09 00:00:00  2018 April         ALEX       1154    Apr 

The desired output would look something like this:

`Date Ran`           Year Month      Building   Population   MonthN  Pop Sum
  <dttm>              <dbl> <chr>         <fct>      <dbl>     <chr>      <num>
1 2018-09-28 00:00:00  2018 September     ALEX       1196       Sep        7014
2 2018-08-30 00:00:00  2018 August        ALEX       1172       Aug        7014
3 2018-07-19 00:00:00  2018 July          ALEX       1171       Jul        7014
4 2018-06-30 00:00:00  2018 June          ALEX       1167       Jun        7014
5 2018-05-11 00:00:00  2018 May           ALEX       1154       May        7014
6 2018-04-09 00:00:00  2018 April         ALEX       1154       Apr        7014

A function or a for loop would be great, but an R package I could load to generate the new column would work just as well.
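As the comments below suggest, the six sample rows above can be rebuilt in a reproducible form that is easy to paste into R. This is a sketch under stated assumptions: it builds a plain data.frame rather than the tibble printed above, and it assumes the Date Ran times are in UTC. It only contains the ALEX rows shown in the question.

```r
# Reproducible copy of the six ALEX rows shown above.
# Assumption: Date Ran times are UTC; the original is a tibble, this is a data.frame.
df <- data.frame(
  `Date Ran` = as.POSIXct(c("2018-09-28", "2018-08-30", "2018-07-19",
                            "2018-06-30", "2018-05-11", "2018-04-09"),
                          tz = "UTC"),
  Year = rep(2018, 6),
  Month = c("September", "August", "July", "June", "May", "April"),
  Building = factor(rep("ALEX", 6)),
  Population = c(1196, 1172, 1171, 1167, 1154, 1154),
  MonthN = c("Sep", "Aug", "Jul", "Jun", "May", "Apr"),
  check.names = FALSE  # keep the space in "Date Ran"
)
```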

MrFlick
Dinho
  • `group_by(Building) %>% summarize(pop_sum = sum(Population))` should work; can you share your code? – Adam B. Feb 18 '20 at 21:35
  • What exactly do you mean when you say the group_by/sum "didn't work"? Did you use `dplyr`? Did you use a `summarize()` or `mutate()`? I would have thought the latter should have worked just fine. When asking for help please share your data in a [reproducible format](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) that makes it easy to copy/paste data into R (this is not easy with your current data). Also, try including data with more than one Building to make sure we properly test the grouping. – MrFlick Feb 18 '20 at 21:36
  • My apologies, I will change the question format. I used dplyr, but it gave the sum of the total population, not separated by building. – Dinho Feb 18 '20 at 21:39
  • @Dinho Are you sure you used a `group_by`? What you describe doesn't make sense. Please show your code. – MrFlick Feb 18 '20 at 21:40
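The difference MrFlick points to between `summarize()` and `mutate()` shows up on a tiny example. This is a sketch with hypothetical data: the two ALEX values come from the question, but the "BETA" building and its population of 500 are invented for illustration. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical data: "BETA" and its value are made up; the ALEX values are from the question.
df <- data.frame(
  Building = c("ALEX", "ALEX", "BETA"),
  Population = c(1196, 1172, 500)
)

# summarize() collapses each group to a single row: one total per building.
per_building <- df %>%
  group_by(Building) %>%
  summarize(pop_sum = sum(Population))

# mutate() keeps every row and repeats the group total on each row of that group.
with_total <- df %>%
  group_by(Building) %>%
  mutate(pop_sum = sum(Population)) %>%
  ungroup()
```

`per_building` has two rows (one per building), while `with_total` keeps all three rows with the building total added as a column, which is the shape the question asks for.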

3 Answers


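In dplyr, summary functions such as sum() are usually used inside summarize(), which collapses each group to a single row. If you call them inside mutate() after group_by() instead, the group's summary value is repeated on every row of that group, so it is added as a new column. ungroup() then removes the grouping.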

 library(dplyr)

 # Sum Population within each Building and repeat the total on every row
 newdf <- df %>% 
     group_by(Building) %>% 
     mutate(PopSum = sum(Population, na.rm = TRUE)) %>% 
     ungroup()
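Applied to the question's six ALEX rows, this code should reproduce the desired total of 7014 on every row. This is a minimal sketch: it keeps only the two columns the code uses and assumes dplyr is installed.

```r
library(dplyr)

# The six ALEX rows from the question (only the columns used here).
df <- data.frame(
  Building = rep("ALEX", 6),
  Population = c(1196, 1172, 1171, 1167, 1154, 1154)
)

newdf <- df %>%
  group_by(Building) %>%
  mutate(PopSum = sum(Population, na.rm = TRUE)) %>%
  ungroup()

newdf$PopSum
#> [1] 7014 7014 7014 7014 7014 7014
```

Because of `na.rm = TRUE`, rows whose Population is `NA` are skipped when computing a building's total instead of turning the whole total into `NA`, which matches the question's "rows that have population data".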
jessi

Here is an example with data.table:

data:

library(data.table)

data <- data.table(building = c("Alex", "Alex", "Mike"),
                   population = c(1312, 3123, 2139),
                   location = c("Denver", "Arizona", "Detroit"))

   building population location
1:     Alex       1312   Denver
2:     Alex       3123  Arizona
3:     Mike       2139  Detroit

code:

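# := adds popsum to data by reference, computing sum(population) within each building;
# the trailing [] prints the modified table
data[, popsum := sum(population), by = building][]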

   building population location popsum
1:     Alex       1312   Denver   4435
2:     Alex       3123  Arizona   4435
3:     Mike       2139  Detroit   2139

Is that what you're looking for?
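The same idiom applied to the question's own column names might look like the sketch below. It assumes the question's data frame is called `df` and converts it in place with `setDT()`. The `na.rm = TRUE` is an addition that skips missing populations.

```r
library(data.table)

# The six ALEX rows from the question, as a plain data.frame.
df <- data.frame(
  Building = rep("ALEX", 6),
  Population = c(1196, 1172, 1171, 1167, 1154, 1154)
)

# setDT() converts the data.frame to a data.table in place, without copying.
setDT(df)

# Add PopSum by reference, summing Population within each Building.
df[, PopSum := sum(Population, na.rm = TRUE), by = Building]
```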

Gainz

Here is a base R solution using ave(), which you could try:

df <- within(df, PopSum <- ave(Population, Building, FUN = sum))
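Since the question asked for a function, the `ave()` call can be wrapped in one. This is a sketch: `add_group_sum()` is a hypothetical helper name, not part of base R or any package, and the data is illustrative (the "BETA" building and its value are made up).

```r
# add_group_sum() is a hypothetical helper, not from any package.
# It adds a column holding the per-group sum of `value`, grouped by `group`.
add_group_sum <- function(data, group, value, new_col = "PopSum") {
  # ave() applies FUN within each group and returns a vector as long as the input,
  # so every row receives its group's total.
  data[[new_col]] <- ave(data[[value]], data[[group]],
                         FUN = function(v) sum(v, na.rm = TRUE))
  data
}

# Illustrative data: "BETA" is an invented second building.
df <- data.frame(
  Building = c("ALEX", "ALEX", "BETA"),
  Population = c(1196, 1172, 500)
)

result <- add_group_sum(df, "Building", "Population")
```

Passing column names as strings keeps the function usable on any data frame, for example `add_group_sum(df, "Building", "Population", new_col = "Pop Sum")`.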
ThomasIsCoding