How to sum a variable by group with NA?

Question

I have a large data set like this :

ID    Number
153    31
       28
       31
       30
104    31
       30
254    31
266    31

and I want to compute sum by ID include the NA. I mean get this :

    ID   Number
    153   120
    104   61
    254   31
    266   31

I tried aggregate but I dont get the expected result. Some help would be appreciated

Well, after filling the NAs, aggregation is straightforward I guess. I reopened nonetheless. — nicola, Jan 30 '19 at 11:02

akrun · Accepted Answer · 2019-01-30T10:57:22.823

One option is to convert the blanks to NA, then fill replace the NA elements with non-NA adjacent elements above in 'ID', grouped by 'ID', get the sum of 'Number'

library(tidyverse)
df1 %>% 
  mutate(ID = na_if(ID, "")) %>% 
  fill(ID) %>% 
  group_by(ID) %>% 
  summarise(Number = sum(Number))
# A tibble: 4 x 2
#  ID    Number
#  <chr>  <int>
#1 104       61
#2 153      120
#3 254       31
#4 266       31

Or without using fill, create a grouping variable with a logical expression and cumsum, and then do the sum

df1 %>% 
  group_by(grp = cumsum(ID != "")) %>% 
  summarise(ID = first(ID), Number = sum(Number)) %>% 
  select(-grp)

data

df1 <- structure(list(ID = c("153", "", "", "", "104", "", "254", "266"
 ), Number = c(31L, 28L, 31L, 30L, 31L, 30L, 31L, 31L)), row.names = c(NA, 
 -8L), class = "data.frame")

score 1 · Answer 2 · answered Jan 30 '19 at 13:18

1

Or do it straightforwardly :) by

cbind(df1[df1$ID != "", "ID", drop = FALSE], 
      Number = rev(diff(c(0, rev((rev(cumsum(rev(df1$Number)))[df1$ID != ""]))))))

answered Jan 30 '19 at 13:18

Øyvind Langsrud

181
6

How to sum a variable by group with NA?

2 Answers2

data