I am trying to collapse the rows of each ID into a single row, keeping the non-NA numeric value in every column. My script works, but it emits a warning. Is there a better way to do this without the warning? The current approach is also slow on a bigger dataset.
dput(abc)
structure(list(ID = c(12345, 12345, 12345, 23456, 23456, 34567,
34567, 34567, 45678), cohort_0 = c(10.1, NA, NA, 12, NA, 15.5,
NA, NA, NA), cohort_2 = c(NA, 10.1, NA, NA, NA, NA, NA, NA, NA
), cohort_7 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), cohort_9 = c(NA, NA,
NA, NA, 12, NA, NA, NA, NA), cohort_11 = c(NA, NA, NA, NA, NA,
NA, NA, 15.5, NA)), row.names = c(NA, -9L), class = c("tbl_df",
"tbl", "data.frame"))
Working code:
library(dplyr)
abc2 <- abc %>%
  group_by(ID) %>%
  summarize_all(~ max(as.character(.), na.rm = TRUE)) %>%
  ungroup()
The warning message I get:
> warnings()
Warning messages:
1: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
2: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
3: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
4: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
5: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
6: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
7: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
8: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
9: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
10: In max(as.character(.), na.rm = TRUE) :
no non-missing arguments, returning NA
11: In max(as.character(.), na.rm = TRUE) :
no non-missing arguments, returning NA
12: In max(as.character(.), na.rm = TRUE) :
no non-missing arguments, returning NA
13: In max(as.character(.), na.rm = TRUE) :
no non-missing arguments, returning NA
14: In max(as.character(.), na.rm = TRUE) :
no non-missing arguments, returning NA
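The message suggests the warning is raised whenever a group has nothing but NA in a column (every group in cohort_7, for example), so max() has no values left after na.rm = TRUE. A minimal sketch of one way to guard against that case, assuming dplyr 1.0.0 or later for across(); max_or_na is a hypothetical helper name, not part of dplyr. It also keeps the columns numeric, which avoids comparing numbers as strings (as text, "9.5" sorts above "10.1").

```r
library(dplyr)

# Hypothetical helper: largest non-missing value, or NA if all values are missing.
max_or_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else max(x)
}

# Same data as the dput() output above, rebuilt as a plain data frame.
abc <- data.frame(
  ID = c(12345, 12345, 12345, 23456, 23456, 34567, 34567, 34567, 45678),
  cohort_0 = c(10.1, NA, NA, 12, NA, 15.5, NA, NA, NA),
  cohort_2 = c(NA, 10.1, NA, NA, NA, NA, NA, NA, NA),
  cohort_7 = rep(NA_real_, 9),
  cohort_9 = c(NA, NA, NA, NA, 12, NA, NA, NA, NA),
  cohort_11 = c(NA, NA, NA, NA, NA, NA, NA, 15.5, NA)
)

# One row per ID; all-NA groups return NA without calling max() on an empty vector.
abc2 <- abc %>%
  group_by(ID) %>%
  summarise(across(everything(), max_or_na), .groups = "drop")
```

If a group can hold more than one non-NA value in a column, this keeps the largest one; whether that is the right choice depends on the data.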
Update:
I tried the data.table solution from the other post. It worked on my small data but not on my bigger data, which is strange. Any ideas?
var2 <- setDT(var)[, lapply(.SD, na.omit), by = ID]
Error in `[.data.table`(setDT(var), , lapply(.SD, na.omit), by = ID) :
Supplied 2 items for column 2 of group 6039 which has 3 rows. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
In addition: Warning message:
In `[.data.table`(setDT(var), , lapply(.SD, na.omit), by = ID) :
Item 1 of j's result for group 18 is zero length. This will be filled with 2 NAs to match the longest column in this result. Later groups may have a similar problem but only the first is reported to save filling the warning buffer.
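My guess is that na.omit() returns a different number of values for each column within a group (zero, one, or several), and data.table cannot line those up into rows. A minimal sketch that returns exactly one value per column per group instead; the toy table below is made up for illustration (it is not my real data), and first_non_na is a hypothetical helper name.

```r
library(data.table)

# Hypothetical helper: first non-missing value, or NA when there is none.
# Indexing position 1 of an empty vector yields NA of the same type.
first_non_na <- function(x) x[!is.na(x)][1L]

# Made-up data: group 1 has two non-NA values in column a, group 2 has none.
var <- data.table(
  ID = c(1, 1, 1, 2, 2),
  a = c(1.5, 2.5, NA, NA, NA),
  b = c(NA, NA, 3.0, NA, 4.0)
)

# Every column contributes a single value per group, so lengths always match.
var2 <- var[, lapply(.SD, first_non_na), by = ID]
```

This assumes that keeping the first non-NA value is acceptable when a group has several; swapping in max() or another rule inside the helper would change which value survives.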