
Hi, I am trying to compute a leave-one-out average of a variable for every row using dplyr. Since dplyr provides a convenient function called row_number(), I thought I could use it like this:

library(dplyr)

iris %>% 
  tbl_df %>% 
  select(Sepal.Length) %>%
  mutate(loo_avg=mean(Sepal.Length[-row_number()]))  # leave one out average

But this returns a result like this:

Source: local data frame [150 x 2]

   Sepal.Length loo_avg
          (dbl)   (dbl)
1           5.1     NaN
2           4.9     NaN
3           4.7     NaN
4           4.6     NaN
5           5.0     NaN
6           5.4     NaN
7           4.6     NaN
8           5.0     NaN
9           4.4     NaN
10          4.9     NaN
..          ...     ...

How do you fix this?

Alby
  • Perhaps this is what you're looking for: http://stackoverflow.com/questions/35858876/calculate-group-mean-while-excluding-current-observation-using-dplyr/35859197#35859197 – mtoto Apr 01 '16 at 14:48
  • @mtoto That is pretty neat! But what if I want to use a more complicated function than the average? I was looking for a way that uses subsetting. – Alby Apr 01 '16 at 14:53
  • See akrun's comment in the linked question. – mtoto Apr 01 '16 at 14:55
  • Deleted my answer. akrun's on the linked question is superior, and also sticks to the `dplyr` tag. – MichaelChirico Feb 24 '22 at 06:57
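
For reference, here is a minimal sketch of the subsetting approach asked about in the comments. It is not taken from the linked answer. The NaN values in the question appear because, inside mutate(), row_number() evaluates to the whole vector 1:n(), so the negative index drops every element and mean() is applied to an empty vector. The helper name `loo` is hypothetical (it is not part of dplyr); it loops over positions and removes one element at a time, so it should work with any summary function that returns a single number.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): applies `f` to `x` with the
# i-th element removed, once for every position i.
loo <- function(x, f = mean, ...) {
  vapply(seq_along(x), function(i) f(x[-i], ...), numeric(1))
}

# Why the original attempt gives NaN: the negative index removes all rows.
x <- iris$Sepal.Length
empty_mean <- mean(x[-seq_along(x)])  # mean of an empty vector

result <- iris %>%
  as_tibble() %>%
  select(Sepal.Length) %>%
  mutate(
    # subsetting-based leave-one-out mean
    loo_avg = loo(Sepal.Length),
    # closed form, valid for the mean only
    loo_avg_closed = (sum(Sepal.Length) - Sepal.Length) / (n() - 1),
    # the same helper with a different summary function
    loo_median = loo(Sepal.Length, median)
  )
```

The helper does O(n^2) work, which is fine for 150 rows but may be slow on large data. For the plain mean the closed form is much cheaper. Adding group_by() before mutate() should give per-group leave-one-out values, since the helper only sees the current group's vector.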
