So I have this data frame:
dat1 <- data.frame(id=1:n,
group=rep(LETTERS[1:2], n/2),
age=sample(18:30, n, replace=TRUE),
type=NA,
op=factor(paste0("op", 1:n)),
x=NA)
dat1
dat2 <- data.frame(id=1:n,
group=rep(LETTERS[1:2], n/2),
age=NA,
type=factor(paste0("type", 1:n)),
op=NA,
x=rnorm(n))
dat <- full_join(dat1,dat2) %>% arrange(id)
dat
id group age type op x
1 1 A 19 <NA> op1 NA
2 1 A NA type1 <NA> 0.18819303
3 2 B 29 <NA> op2 NA
4 2 B NA type2 <NA> 0.11916096
5 3 A 19 <NA> op3 NA
6 3 A NA type3 <NA> -0.02509255
7 4 B 28 <NA> op4 NA
8 4 B NA type4 <NA> 0.10807273
9 5 A 27 <NA> op5 NA
10 5 A NA type5 <NA> -0.48543524
11 6 B 26 <NA> op6 NA
12 6 B NA type6 <NA> -0.50421713
This dataset has multiple rows per observation, while some variables are stored in one row and some in another. I like to have it in a tidy format with only one row per id. I can filter into two data frames then rejoin, but there must be an easier way. Summarise would work but only with numeric variables.. The group is fixed with id, there is no same id in different groups
group_by(id) %>%
summarise_if(is.numeric, sum, na.rm=T)
data
It seems there must be a very easy solution but I can't figure it out. Thanks for help!