How can I use a for loop to sum data by group, then print the accumulated sum of A and of B on the last row of each group? For example:

Type value
A  2
A  NA
A  13  15
B  565
B  245
B  578  1388
library(dplyr)
df %>% 
  group_by(Type) %>% 
  mutate(cs = cumsum(value)) # cumsum() takes no na.rm argument

but it only shows the whole table, and the sum for A, which should be 15, ends up as NA:

Type value cs
A     2    2
A     NA   NA
A     13   NA
B     565  565
B     245  810
B     578  1388
DD3
  • Hi DD3, can you try to be a little more precise? I am not sure what you mean by "but it only shows the whole table and originally should be sum A should be 15 but eventually become NA.". Also, if you provide a reprex, it will improve your chances of quickly getting a more concrete answer. (see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Marcelo Avila Mar 25 '21 at 14:26
  • Also, please include the package you are using for your pipes. Consider adding more tags as well; it will make your question more visible – Chriss Paul Mar 25 '21 at 14:43

2 Answers

Using dplyr you can try

library(dplyr)
df %>%
  group_by(Type) %>%
  mutate(cs = sum(value, na.rm = TRUE)) %>%        # group total, repeated on every row
  mutate(id = row_number()) %>%                    # dummy id column within each group
  mutate(cs = replace(cs, id != max(id), NA)) %>%  # blank out every row except the last one per Type
  select(-id)                                      # remove the id column
#Output
# A tibble: 6 x 3
# Groups:   Type [2]
  Type  value    cs
  <chr> <int> <int>
1 A         2    NA
2 A        NA    NA
3 A        13    15
4 B       565    NA
5 B       245    NA
6 B       578  1388
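
As a side note, the NA values in the question's cs column arise because cumsum() propagates missing values: once it meets an NA, every later element is NA too. A minimal base R sketch of a running sum per group that skips NA, assuming df looks like the table in the question (the data frame construction and the helper name na_safe_cumsum are assumptions for illustration only):

```r
# Assumed reconstruction of the question's data
df <- data.frame(
  Type  = c("A", "A", "A", "B", "B", "B"),
  value = c(2L, NA, 13L, 565L, 245L, 578L)
)

# cumsum() propagates NA: every element after the first NA is NA as well
cumsum(df$value[df$Type == "A"])
# 2 NA NA

# Hypothetical helper: treat NA as 0 before accumulating
na_safe_cumsum <- function(x) cumsum(replace(x, is.na(x), 0L))

# ave() applies the helper separately within each Type
df$cs <- ave(df$value, df$Type, FUN = na_safe_cumsum)
df
```

With this, the last row of each group carries the group total (15 for A, 1388 for B), while the earlier rows show the running sum.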
Chriss Paul

If I understand correctly, the OP expects that all rows of the new column cs are blank except for the last row of each group where the sum of the values belonging to the group should be printed.

A blank cell is only possible if the new column cs is of type character. If cs is expected to be numeric, there is no choice but to print 0, NA, or some other numeric value; it cannot hold "" (the empty string).
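
This follows from R's coercion rules for atomic vectors, which a short base R sketch can illustrate (the variable names num_col and chr_col are made up for this example):

```r
# A numeric vector can hold NA, and it stays numeric
num_col <- c(NA, NA, 15)
class(num_col)
# "numeric"

# Mixing numbers with "" coerces the whole vector to character
chr_col <- c("", "", 15)
class(chr_col)
# "character"
chr_col
# "" "" "15"
```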

So, the suggestions below create a character column by using either

  • ifelse(), or
  • replace() and rep(), or
  • c() and rep(),

in data.table and dplyr syntax, respectively.

Note that no for loop is required at all.
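
For comparison, here is a rough base R sketch of the group totals computed once in vectorised form and once with the explicit loop the question asks about. It assumes df matches the table in the question; the data frame construction and the names totals and loop_totals are assumptions for illustration:

```r
# Assumed reconstruction of the question's data
df <- data.frame(
  Type  = c("A", "A", "A", "B", "B", "B"),
  value = c(2L, NA, 13L, 565L, 245L, 578L)
)

# Vectorised: one total per Type, ignoring NA
totals <- tapply(df$value, df$Type, sum, na.rm = TRUE)

# Equivalent explicit loop, shown only for comparison
loop_totals <- integer(0)
for (type in unique(df$Type)) {
  loop_totals[type] <- sum(df$value[df$Type == type], na.rm = TRUE)
}

totals
loop_totals
```

Both give 15 for A and 1388 for B; the vectorised call is shorter and leaves the grouping to R.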

data.table

library(data.table)
setDT(df)[, cs := fifelse(1:.N == .N, as.character(sum(value, na.rm = TRUE)), ""), by = Type][]

or

setDT(df)[, cs := replace(rep("", .N), .N, sum(value, na.rm = TRUE)), by = Type][]

or

setDT(df)[, cs := c(rep("", .N - 1L), sum(value, na.rm = TRUE)), by = Type][]
   Type value   cs
1:    A     2     
2:    A    NA     
3:    A    13   15
4:    B   565     
5:    B   245     
6:    B   578 1388

dplyr

library(dplyr)
df %>% 
  group_by(Type) %>% 
  mutate(cs = ifelse(row_number() == n(), sum(value, na.rm = TRUE), ""))

or

df %>% 
  group_by(Type) %>% 
  mutate(cs = replace(rep("", n()), n(), sum(value, na.rm = TRUE)))

or

df %>% 
  group_by(Type) %>% 
  mutate(cs = c(rep("", n() - 1L), sum(value, na.rm = TRUE)))
# A tibble: 6 x 3
# Groups:   Type [2]
  Type  value cs    
  <chr> <int> <chr> 
1 A         2 ""    
2 A        NA ""    
3 A        13 "15"  
4 B       565 ""    
5 B       245 ""    
6 B       578 "1388"
Uwe