I'm having trouble excluding missing values in the summarise_all function.
I have a dataset (df), shown below, and I'm running into two problems:
- missing values are not excluded, so each output cell holds a vector instead of a single number
- extra rows appear with the same IDs (the rows filled with 'TRUE' values in the df1 output)
The "dream dataset" version of df1 at the bottom is the one I'm trying to get to.
Here's the whole enchilada:
df #the original dataset
ID type of data genes1 genes2 genes3 ...
1 new 2 NA NA
1 old NA 0 NA
1 suggested NA NA 2
2 new 1 NA NA
2 old NA 1 NA
2 suggested NA NA 1
...
df1 <- df %>% group_by(df$ID) %>% summarize_all(list, na.rm= TRUE) #my code
#output
ID type of data genes1 genes2 genes3 ...
1 c("new","old","suggested") c(2,NA,NA) c(0,NA,NA) c(2,NA,NA)
1 TRUE TRUE TRUE TRUE
2 c("new","old","suggested") c(1,NA,NA) c(1,NA,NA) c(1,NA,NA)
2 TRUE TRUE TRUE TRUE
...
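A possible explanation for the TRUE rows, offered as a sketch rather than a confirmed diagnosis: summarize_all forwards extra arguments to the function it is given, and list() has no na.rm parameter, so na.rm = TRUE appears to be stored as a second list element instead of dropping the NAs. The variable names below are made up for illustration.

```r
# One gene column for a single ID, as in the original data
x <- c(2, NA, NA)

# What the call effectively evaluates per column and group:
# list() keeps na.rm = TRUE as a named element; it does not filter anything
res <- list(x, na.rm = TRUE)

length(res)   # two elements, which would become two rows per ID
res[[1]]      # the untouched vector, NAs included
res$na.rm     # the stray TRUE
```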
#my main concern is the "genes" columns and the extra rows with the same IDs and NA values; I wanted something like this:
df1 #dream dataset
ID type of data genes1 genes2 genes3 ...
1 #doesn't matter 2 0 2
2 #doesn't matter 1 1 1
...
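For reference, here is a minimal sketch of one way the dream dataset might be produced. It assumes dplyr 1.0.0 or later (for across), that each genes column has at most one non-missing value per ID, and that the "type of data" column can be dropped. The column name type and the helper first_non_na are hypothetical names introduced only for this example.

```r
library(dplyr)

# Reconstruction of the first six rows of df; `type` stands in for "type of data"
df <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2),
  type   = c("new", "old", "suggested", "new", "old", "suggested"),
  genes1 = c(2, NA, NA, 1, NA, NA),
  genes2 = c(NA, 0, NA, NA, 1, NA),
  genes3 = c(NA, NA, 2, NA, NA, 1)
)

# Hypothetical helper: first non-missing value of a vector, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1]

df1 <- df %>%
  group_by(ID) %>%                       # group by the column itself, not df$ID
  summarise(across(starts_with("genes"), first_non_na), .groups = "drop")

df1
```

If a genes column could hold several non-missing values per ID, a different summary function (for example sum or max with na.rm = TRUE) would be needed in place of the helper.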
I also tried using na.omit in summarise_all, but it didn't really fix anything.
Does anybody have any ideas on how to fix it?