R dplyr rowMeans with filter

Question

I have seen several posts on getting a rowMeans-type result in mutate, for example "dplyr - using mutate() like rowmeans()". But I want another variable to act as a filter.

I understand that this data is not tidy, and that the "f#" and "d#" variables could be reshaped long, then cast to "f" and "d", then filtered on "f" and summarized on "d". But is there a way to do this without reshaping? I devised the code below:

library(tidyverse)

x<-data.frame(f1=c(1,1), f2=c(1,0), f3=c(1,1),
              d1=c(3,2), d2=c(4,8), d3=c(8,16))
x

x %>%
  rowwise() %>%
  mutate(agg=sum(f1*d1, f2*d2, f3*d3) / sum(f1, f2, f3) )

#Source: local data frame [2 x 7]
#Groups: <by row>

# A tibble: 2 x 7
#     f1    f2    f3    d1    d2    d3   agg
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1  1.00  1.00  1.00  3.00  4.00  8.00  5.00
#2  1.00  0     1.00  2.00  8.00 16.0   9.00

But, I lose the ability to use ranges when there are many variables, so I cannot say "f1*d1":"f2*d2" - is there some more general way?

score 1 · Accepted Answer · answered Jan 17 '18 at 01:16

Assuming the f columns and the d columns share the same suffixes and there are equally many of each, you can use the select helper functions:

x %>% 
    select(sort(names(x))) %>%   # sort the names so the f columns and d columns correspond
    mutate(agg = {
        fs = select(., starts_with('f')) 
        ds = select(., starts_with('d'))
        rowSums(fs * ds) / rowSums(fs) 
    })

#  d1 d2 d3 f1 f2 f3 agg
#1  3  4  8  1  1  1   5
#2  2  8 16  1  0  1   9
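
The answer above pairs f and d columns by position after sorting the names. As a rough sketch of the same computation that pairs them explicitly by suffix, here is a base R variant. The `f<digits>` / `d<digits>` naming pattern and the helper name `suffixes` are assumptions for illustration, not part of the original answer.

```r
# Same toy data as in the question
x <- data.frame(f1 = c(1, 1), f2 = c(1, 0), f3 = c(1, 1),
                d1 = c(3, 2), d2 = c(4, 8), d3 = c(8, 16))

# Suffixes of the flag columns ("1", "2", "3").
# Assumption: flag columns are named f<digits>, data columns d<digits>.
suffixes <- sub("^f", "", grep("^f[0-9]+$", names(x), value = TRUE))

# Build both blocks in the same suffix order, so column k of fs
# always lines up with column k of ds regardless of column order in x
fs <- as.matrix(x[paste0("f", suffixes)])
ds <- as.matrix(x[paste0("d", suffixes)])

# Row mean of the d values whose matching f flag is 1
x$agg <- rowSums(fs * ds) / rowSums(fs)
x$agg  # 5 and 9 for the two rows
```

If a d column is missing for some suffix, the `x[paste0("d", suffixes)]` lookup stops with an "undefined columns" error instead of silently mispairing columns. A row whose flags are all 0 gives 0/0, i.e. `NaN`.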

Psidom


Thanks, Psidom. It works, but I do not fully understand it. fs and ds are selections of columns. Somehow these mini data frames can exist inside the { }, but I could not make fs and ds separate steps in the pipe; that just adds columns prefixed by fs. or ds. Does the { } get its own local environment where ., fs, and ds can all coexist? – D. Bontempo Jan 17 '18 at 18:50
`{}` simply stands for a block of code, which will execute in order and have one return value assigned to `agg`. `fs` and `ds` are both selected from the original data frame, which is piped in as `.`. The pipe doesn't work because when you do something like `select(., starts_with('f'))`, it loses all the `d` columns, so you can't select `d` columns anymore. So the selection has to happen in parallel, i.e. both having access to the original data frame. Hopefully this makes some sense to you. – Psidom Jan 17 '18 at 18:56
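
To make the two points in that reply concrete, here is a small sketch using the question's data. The names `only_f` and `block_value` are made up for illustration.

```r
library(dplyr)

x <- data.frame(f1 = c(1, 1), f2 = c(1, 0), f3 = c(1, 1),
                d1 = c(3, 2), d2 = c(4, 8), d3 = c(8, 16))

# Point 1: select() as a pipe step narrows the data frame itself,
# so a later step can no longer reach the d columns
only_f <- x %>% select(starts_with("f"))
names(only_f)  # "f1" "f2" "f3"

# Point 2: a braced block runs its statements in order and
# evaluates to the value of the last one
block_value <- {
  a <- c(1, 2, 3)
  b <- c(4, 5, 6)
  sum(a * b)
}
block_value  # 32
```

Inside `mutate(agg = { ... })` the same rule applies: only the value of the last expression in the block becomes `agg`, while both `select(., ...)` calls read from the full piped data frame.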
