How to sum a df while ignoring value in R?

Question

This is an extension of my previous question. I reviewed the linked duplicate, but I am still having trouble.

I have a data frame like the following:

> example
  name X1.8 X1.8.1 X1.8.2
1    a   -1      1      7
2    b   33      0      2
3    c    3     10     -1
4    a   -1     -1      4
5    d    5      8      5
6    e    7      6     12
7    a   -1      7      7
8    c    5     20      9
9    f   -1     -1     -1

and I want to collapse(sum) the row with the same name (column 1) but ignore the value -1 while collapsing (summing). *-1 is similar to NA. For example, the example above would become:

> example    # the goal
  name X1.8 X1.8.1 X1.8.2
1    a   -1      8      18 # the first col stays as -1 b/c all are -1
2    b   33      0       2
3    c    8     30       9
4    d    5      8       5
5    e    7      6      12
6    f   -1     -1      -1

> dput(example)
structure(list(name = structure(c(1L, 2L, 3L, 1L, 4L, 5L, 1L, 
3L, 6L), .Label = c("a", "b", "c", "d", "e", "f"), class = "factor"), 
    X1.8 = c(-1, 33, 3, -1, 5, 7, -1, 5, -1), X1.8.1 = c(1, 0, 
    10, -1, 8, 6, 7, 20, -1), X1.8.2 = c(7, 2, -1, 4, 5, 12, 
    7, 9, -1)), row.names = c(NA, 9L), class = "data.frame")

akrun · Accepted Answer · 2020-07-07T21:19:03.790

We can use an if/else after doing the group_by i.e. after grouping by 'name', summarise across all the other columns (dplyr 1.0.0), if all values are -1, then return it or else get the sum of values excluding -1

library(dplyr) # 1.0.0
example %>%
   group_by(name) %>%
   summarise(across(everything(), ~ if(all(.==-1)) -1 else
            sum(.[. != -1], na.rm = TRUE)))
# A tibble: 6 x 4
#  name   X1.8 X1.8.1 X1.8.2
#  <fct> <dbl>  <dbl>  <dbl>
#1 a        -1      8     18
#2 b        33      0      2
#3 c         8     30      9
#4 d         5      8      5
#5 e         7      6     12
#6 f        -1     -1     -1

An option is also to use na_if to replace the -1 to NA and then make use of na.rm= TRUE in sum. But, we have avoided that route in case there are actual NAs in the dataset for a particular group. This would help in identifying the -1 as such

or with summarise_at

example %>%
   group_by(name) %>%
   summarise_at(vars(-group_cols()), ~ if(all(.==-1)) -1 else
           sum(.[. != -1], na.rm = TRUE))
# A tibble: 6 x 4
#  name   X1.8 X1.8.1 X1.8.2
#  <fct> <dbl>  <dbl>  <dbl>
#1 a        -1      8     18
#2 b        33      0      2
#3 c         8     30      9
#4 d         5      8      5
#5 e         7      6     12
#6 f        -1     -1     -1

score 0 · Answer 2 · answered Jul 07 '20 at 21:14

0

This solution could help you:

library(dplyr)
#Format
example[example==-1]<-NA
#Aggregate
example %>% group_by(name) %>% summarise_all(sum,na.rm=T)

# A tibble: 6 x 4
  name   X1.8 X1.8.1 X1.8.2
  <fct> <dbl>  <dbl>  <dbl>
1 a         0      8     18
2 b        33      0      2
3 c         8     30      9
4 d         5      8      5
5 e         7      6     12
6 f         0      0      0

answered Jul 07 '20 at 21:14

Duck

39,058
13
42
84

this is great ! but I would like to keep `-1` because it indicates NA values; here, you have changes to 0 – Math Avengers Jul 07 '20 at 21:26

score 0 · Answer 3 · answered Jul 07 '20 at 21:53

0

base R

aggregate(x = example[,2:4],
          by = list(name = example$name),
          FUN = function(x)ifelse(all(x==-1), -1, sum(x[x!=-1])))

answered Jul 07 '20 at 21:53

Jan Marvin

426
1
4
5

How to sum a df while ignoring value in R?

3 Answers3