I have dozens of variables, many of which have missing values, including at the first and last observation. I want a new dataset that contains, per person, the first and last observation for each variable, ignoring missings.
The below code does it, but I was hoping 1) there was some function that is similar to head()
, but without having to remove NA
s manually, 2) a way to write a function that dplyr
's summarize_each()
could use to automate across all variables in the dataset (other than id
, of course)
set.seed(23331)
df <- data.frame(id=rep(c(1,2,3,4), each = 5),
a = c(NA, rnorm(4), rnorm(3), rep(NA, 2), rnorm(4), rep(NA, 5), rnorm(1)),
b = c(rep(NA, 2), rnorm(14), rep(NA, 3), rnorm(1)))
df %>% group_by(id) %>% summarise(a.head=head(a[!is.na(a)], n=1),
a.tail=tail(a[!is.na(a)], n=1),
b.head=head(b[!is.na(b)], n=1),
b.tail=tail(b[!is.na(b)], n=1)) %>%
gather("type", "value", -id) %>%
separate(type, into = c("variable", "time"), sep = "\\.") %>%
spread(variable, value)
I'm hoping for a dplyr
solution but would take a base
or data.table
solution if one of those is the optimal way of going about it.
Desired Output:
Source: local data frame [8 x 4]
id time a b
(dbl) (chr) (dbl) (dbl)
1 1 head -0.5877282 0.4975612
2 1 tail -0.7904277 -0.3860010
3 2 head 0.5872134 -0.3923887
4 2 tail -0.3222003 0.3114662
5 3 head -0.2553290 0.7521095
6 3 tail 0.3095699 -0.9113326
7 4 head -0.3809334 1.4752274
8 4 tail -0.3809334 3.2767918