I am trying to conditionally sum across many columns depending on whether the values are greater than or less than 0. I am surprised I cannot find a dplyr or data.table workaround for this. I want to calculate 4 new columns for a large data.frame (the columns to calculate are at the bottom of the post).

library(dplyr)

# 10 x 10 matrix of random normal values with columns V1..V10
dat2 <- matrix(rnorm(100), nrow = 10)
colnames(dat2) <- paste0("V", 1:10)

dat2 %>% as.data.frame() %>%
  rowwise() %>%
  select_if(function(col){mean(col)>0}) %>%
  mutate(sum_pos=rowSums(.))  ##Obviously doesn't work

These are the simple statistics I want to calculate. (Yes, these apply statements work, but there are other things I want to do in my dplyr chain, so that's why I am looking for a dplyr or data.table way.) The columns that are positive or negative differ from row to row, so I cannot grab a fixed list of columns; it must be done dynamically, by row.

#Calculate these, but in a dplyr chain?
n_pos=apply(dat2,1,function(x) sum((x>0)))
n_neg=apply(dat2,1,function(x) sum((x<0)))
sum_pos=apply(dat2,1,function(x) sum(x[(x>0)]))
sum_neg=apply(dat2,1,function(x) sum(x[(x<0)]))
K.J.J.K

1 Answer

We don't need rowwise with rowSums, as rowSums already computes the sum of each row without any grouping:

library(dplyr)
dat2 %>%
   as.data.frame() %>%  
   select_if(~ is.numeric(.) && mean(.) > 0) %>% 
   mutate(sum_pos = rowSums(.))

Based on the description, it seems the condition is not on the column mean; what is wanted is the row-wise sum of the positive values and of the negative values separately:

dat2 %>%
   as.data.frame %>%
   mutate(sum_pos = rowSums(. * NA^(. < 0), na.rm = TRUE),
           sum_neg = rowSums(.[1:10] * NA^(.[1:10] > 0), na.rm = TRUE) )
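
The question also asks for the per-row counts. A minimal sketch, assuming the data frame holds only the ten numeric columns V1 to V10, computes all four statistics in one `mutate`. It uses `replace` (the alternative mentioned in the comments) instead of the `NA^` trick, and `set.seed(1)` is an assumption added only to make the example reproducible:

```r
library(dplyr)

set.seed(1)  # assumed seed, only so the sketch is reproducible
dat2 <- matrix(rnorm(100), nrow = 10)
colnames(dat2) <- paste0("V", 1:10)

res <- dat2 %>%
  as.data.frame() %>%
  mutate(
    # a logical matrix summed by row gives the count of TRUE values
    n_pos   = rowSums(.[1:10] > 0),
    n_neg   = rowSums(.[1:10] < 0),
    # zero out the values of the unwanted sign, then sum each row
    sum_pos = rowSums(replace(.[1:10], .[1:10] < 0, 0)),
    sum_neg = rowSums(replace(.[1:10], .[1:10] > 0, 0))
  )
```

Inside `mutate`, `.` is the data frame piped in, so `.[1:10]` always refers to the original ten columns and not to the columns being created.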
akrun
  • This isn't matching the sum_pos vector obtained from `apply` – K.J.J.K Dec 17 '19 at 14:59
  • @K.J.J.K In the `apply`, you are not using the `mean` condition and it is a different condition. Which one are you looking for? – akrun Dec 17 '19 at 15:00
  • My strategy revolved around trying to take the mean of a column of 1 to use select_if (hence why I used mean in the select_if statement) – K.J.J.K Dec 17 '19 at 15:03
  • @K.J.J.K Now it should match the `apply` solution – akrun Dec 17 '19 at 15:03
  • In English, is your statement multiplying by NA if `.` is greater than or less than 0, and then passing that vector (full of numbers and NAs) to rowSums? – K.J.J.K Dec 17 '19 at 15:06
  • @K.J.J.K It is just a way to replace the values that are negative or positive with NA when we take the sum of the positive and of the negative values. Another option is `replace` as in the comments – akrun Dec 17 '19 at 15:08
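
The `NA^` trick discussed in these comments can be seen on a small vector. This is only an illustrative sketch; the vector `x` is made up for the example. It relies on R evaluating `NA^0` as 1 and `NA^1` as `NA`, so a logical mask used as the exponent turns `TRUE` positions into `NA` and `FALSE` positions into 1:

```r
x <- c(-2, 0, 3)   # hypothetical values, not from the question

mask <- NA^(x < 0) # TRUE -> NA^1 = NA, FALSE -> NA^0 = 1
masked <- x * mask # negatives become NA, other values are unchanged

# with na.rm = TRUE the NA entries are skipped, leaving the sum of non-negatives
sum_pos <- sum(masked, na.rm = TRUE)
sum_pos
# 3
```

Flipping the comparison to `x > 0` masks the positive values instead, which gives the sum of the negatives.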