I would like an efficient function or code snippet that subsets a vector and returns NA
if the subset contains no elements. For example, for
v1 = c(1, 1, NA)
the code unique(v1[!is.na(v1)]) returns one entry, which is great, but for
v2 = c(NA, NA, NA)
the code unique(v2[!is.na(v2)]) returns logical(0), which is not great when this subsetting
operation is used as part of a dplyr chain containing summarise_each or summarise.
I would like the second operation to return NA instead of logical(0).
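A minimal base-R sketch of such a helper, assuming the goal is "the unique non-NA values, or a single NA when there are none". The name non_na_or_na is hypothetical. It indexes with NA_integer_ so that the fallback NA has the same type as the input (logical for v2, double for a numeric column), which may matter when summarise combines results across groups:

```r
# Hypothetical helper: drop NAs, deduplicate, and fall back to a single
# NA when nothing is left, so the result never has length 0.
non_na_or_na <- function(x) {
  y <- unique(x[!is.na(x)])
  if (length(y) == 0L) {
    # Indexing with NA_integer_ returns one NA of x's own type
    # (NA for logical, NA_real_ for double, NA_character_ for character).
    x[NA_integer_]
  } else {
    y
  }
}

v1 <- c(1, 1, NA)
v2 <- c(NA, NA, NA)
non_na_or_na(v1)  # returns 1
non_na_or_na(v2)  # returns NA
```

In a dplyr chain, non_na_or_na(.) could then be used inside funs() in place of the anonymous subsetting function. Note that it still returns more than one value if a group has several distinct non-NA values, so summarise would fail in that case as well.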
The context behind this is that I am trying to solve this question using multiple spread
commands. The example data is taken from that question:
library(dplyr)  # %>%, data_frame(), mutate(), group_by(), summarise_each()
library(tidyr)  # spread()

set.seed(10)
tmp_dat <- data_frame(
  Person = rep(c("greg", "sally", "sue"), each = 2),
  Time = rep(c("Pre", "Post"), 3),
  Score1 = round(rnorm(6, mean = 80, sd = 4), 0),
  Score2 = round(jitter(Score1, 15), 0),
  Score3 = 5 + (Score1 + Score2)/2
)
> tmp_dat
Source: local data frame [6 x 5]

  Person  Time Score1 Score2 Score3
   <chr> <chr>  <dbl>  <dbl>  <dbl>
1   greg   Pre     80     78   84.0
2   greg  Post     79     80   84.5
3  sally   Pre     75     74   79.5
4  sally  Post     78     78   83.0
5    sue   Pre     81     78   84.5
6    sue  Post     82     81   86.5
Using multiple spread calls, we can achieve the desired output (albeit with different column names):
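tmp_dat %>%
  # Copy the key so that each Score column can be spread by its own key column
  mutate(Time_2 = Time,
         Time_3 = Time) %>%
  spread(Time, Score1, sep = '.') %>%
  spread(Time_2, Score2, sep = '.') %>%
  spread(Time_3, Score3, sep = '.') %>%
  # Collapse each person's rows, keeping the non-NA value of every column
  group_by(Person) %>%
  summarise_each(funs(((function(x) x[!is.na(x)])(.))))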
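To see why the final summarise_each step works on the complete data, here is a base-R sketch on a hand-built copy of greg's two rows as they would look after the three spread calls. The values come from the table above, and the column names assume the sep = '.' naming; the data frame greg is an illustration, not output copied from tidyr. Each column holds exactly one non-NA value per person, so the anonymous function returns a single value for every column:

```r
# Hand-built copy of greg's two rows after the three spread() calls:
# the "Pre" row fills the *.Pre columns, the "Post" row the *.Post columns.
greg <- data.frame(
  Time.Post   = c(NA, 79),
  Time.Pre    = c(80, NA),
  Time_2.Post = c(NA, 80),
  Time_2.Pre  = c(78, NA),
  Time_3.Post = c(NA, 84.5),
  Time_3.Pre  = c(84.0, NA)
)

# The same collapsing function used inside funs() above
collapse <- function(x) x[!is.na(x)]

# Every column shrinks to exactly one value, which is what summarise needs
collapsed <- sapply(greg, collapse)
collapsed
```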
The problem arises when there are too many NAs, i.e. when a person has no non-missing value left in one of the spread columns:
# Replace the last two entries in the last row (Score2 and Score3) with NAs
tmp_dat$Score2[6] <- NA
tmp_dat$Score3[6] <- NA
Running the code snippet with summarise_each now produces the error:
Error in eval(substitute(expr), envir, enclos) : expecting a single value
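The error comes from sue's group: after the modification, her Time_2.Post and Time_3.Post columns contain only NAs, so x[!is.na(x)] returns a zero-length vector, which summarise rejects because it expects exactly one value per group. A base-R sketch on a hand-built copy of sue's rows (values from the table above, column names assuming the sep = '.' naming) shows the empty subsets and how a hypothetical helper non_na_or_na(), which falls back to a typed NA, avoids them:

```r
# Hypothetical helper: unique non-NA values, or a single NA of x's type
# (via NA_integer_ indexing) when there are none.
non_na_or_na <- function(x) {
  y <- unique(x[!is.na(x)])
  if (length(y) == 0L) x[NA_integer_] else y
}

# Hand-built copy of sue's two rows after the three spread() calls,
# with Score2 and Score3 of her "Post" row set to NA as above.
sue <- data.frame(
  Time.Post   = c(NA, 82),
  Time.Pre    = c(81, NA),
  Time_2.Post = c(NA_real_, NA_real_),
  Time_2.Pre  = c(78, NA),
  Time_3.Post = c(NA_real_, NA_real_),
  Time_3.Pre  = c(84.5, NA)
)

# Original anonymous function: two columns collapse to length 0,
# which summarise reports as "expecting a single value".
original_lengths <- lengths(lapply(sue, function(x) x[!is.na(x)]))
original_lengths

# With the NA fallback, every column collapses to exactly one value.
fixed <- sapply(sue, non_na_or_na)
fixed
```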