A better way to collapse rows with numerical value and NA

Question

I tried to collapse rows that contain numeric values and NAs. The script worked, but with warnings. Is there a better way to do this without the warnings? The current approach is also slow on a bigger dataset.

dput(abc)
structure(list(ID = c(12345, 12345, 12345, 23456, 23456, 34567, 
34567, 34567, 45678), cohort_0 = c(10.1, NA, NA, 12, NA, 15.5, 
NA, NA, NA), cohort_2 = c(NA, 10.1, NA, NA, NA, NA, NA, NA, NA
), cohort_7 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), cohort_9 = c(NA, NA, 
NA, NA, 12, NA, NA, NA, NA), cohort_11 = c(NA, NA, NA, NA, NA, 
NA, NA, 15.5, NA)), row.names = c(NA, -9L), class = c("tbl_df", 
"tbl", "data.frame"))

Working code:

library(dplyr)

abc2 <- abc %>% group_by(ID) %>%
  summarize_all(~ max(as.character(.), na.rm = TRUE)) %>%
  ungroup()

The warning message I get:

> warnings()
Warning messages:
1: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
2: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
3: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
4: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
5: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
6: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
7: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
8: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
9: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
10: In max(as.character(.), na.rm = TRUE) :
  no non-missing arguments, returning NA
11: In max(as.character(.), na.rm = TRUE) :
  no non-missing arguments, returning NA
12: In max(as.character(.), na.rm = TRUE) :
  no non-missing arguments, returning NA
13: In max(as.character(.), na.rm = TRUE) :
  no non-missing arguments, returning NA
14: In max(as.character(.), na.rm = TRUE) :
  no non-missing arguments, returning NA

Update:

So I tried the data.table solution from the other post. It worked on my small data but not on my bigger data. Strange. Any ideas?

var2 <- setDT(var)[, lapply(.SD, na.omit), by = ID]
Error in `[.data.table`(setDT(var), , lapply(.SD, na.omit), by = ID) : 
  Supplied 2 items for column 2 of group 6039 which has 3 rows. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
In addition: Warning message:
In `[.data.table`(setDT(var), , lapply(.SD, na.omit), by = ID) :
  Item 1 of j's result for group 18 is zero length. This will be filled with 2 NAs to match the longest column in this result. Later groups may have a similar problem but only the first is reported to save filling the warning buffer.
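
The error message reports a group in which a column yields 2 items after na.omit(), which suggests that some ID has more than one non-missing value in the same column. A minimal sketch for locating such IDs, assuming data.table is installed; the toy table and the names counts, value_cols and bad_ids are hypothetical and only illustrate the check:

```r
library(data.table)

# Hypothetical toy data: ID 1 has two non-missing values in column a.
var <- data.table(
  ID = c(1, 1, 1, 2, 2),
  a  = c(5, 6, NA, 7, NA),
  b  = c(NA, NA, 3, NA, NA)
)

# Count the non-missing values per ID and column.
counts <- var[, lapply(.SD, function(x) sum(!is.na(x))), by = ID]

# IDs where at least one column has more than one non-missing value.
value_cols <- setdiff(names(counts), "ID")
too_many <- as.matrix(counts[, value_cols, with = FALSE]) > 1
bad_ids <- counts$ID[rowSums(too_many) > 0]
```

Any ID listed in bad_ids cannot be collapsed to a single row by na.omit() alone, so an aggregate such as max() is needed to pick one value.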

See linked post, let us know if it doesn't work. There is a [data.table solution](https://stackoverflow.com/a/28036595/680068) if the speed is the issue. — zx8754, Nov 19 '20 at 10:50
I tried the method. It's strange because it works on my small dput data (abc) but when I applied it on a larger data, it gave me the error. — codedancer, Nov 19 '20 at 11:10
Then please edit your post, link it to the duplicate post saying I tried xyz solutions... show your code and error. Ideally, provide data that would produce the error, and vote to reopen. — zx8754, Nov 19 '20 at 11:12

score 0 · Answer 1 · answered Nov 19 '20 at 11:44

Maybe the following also works for your larger dataset. I'm using %in% to test whether each distinct value is present in a column.

x <- sort(unique(unlist(abc[-1])))
sapply(abc[-1], function(y) ifelse(x %in% y, x, NA))
#     cohort_0 cohort_2 cohort_7 cohort_9 cohort_11
#[1,]     10.1     10.1       NA       NA        NA
#[2,]     12.0       NA       NA       12        NA
#[3,]     15.5       NA       NA       NA      15.5
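
The matrix above has one row per distinct value rather than per ID, so the ID column is lost and an ID with no values at all (45678) does not appear. A base R sketch that keeps one row per ID, assuming the abc data from the question; max_or_na is a hypothetical helper name:

```r
# Same values as the dput() output in the question, as a plain data frame.
abc <- data.frame(
  ID        = c(12345, 12345, 12345, 23456, 23456, 34567, 34567, 34567, 45678),
  cohort_0  = c(10.1, NA, NA, 12, NA, 15.5, NA, NA, NA),
  cohort_2  = c(NA, 10.1, NA, NA, NA, NA, NA, NA, NA),
  cohort_7  = rep(NA_real_, 9),
  cohort_9  = c(NA, NA, NA, NA, 12, NA, NA, NA, NA),
  cohort_11 = c(NA, NA, NA, NA, NA, NA, NA, 15.5, NA)
)

# Hypothetical helper: max of the non-NA values, or NA when there are none.
max_or_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else max(x)
}

# One row per ID; every cohort column is collapsed with max_or_na.
res <- aggregate(abc[-1], by = list(ID = abc$ID), FUN = max_or_na)
```

Because the helper never calls max() on an empty vector, no warning is raised for all-NA groups.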

score 0 · Accepted Answer · answered Nov 19 '20 at 13:21

The warning is raised whenever a group has only NAs in a column, so nothing is left once na.rm = TRUE drops them ("no non-missing arguments"). Here are workarounds using dplyr and data.table:

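library(dplyr)

# dplyr: call max() only when the group has at least one non-NA value
abc %>% 
  group_by(ID) %>%
  summarize_all(~ if (length(na.omit(.))) max(., na.rm = TRUE) else NA_real_) %>%
  ungroup()

library(data.table)

# data.table: the same check, applied to every column in .SD
setDT(abc)
abc[, 
    lapply(.SD, function(.) if (length(na.omit(.))) max(., na.rm = TRUE) else NA_real_), 
    by = ID]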
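
In dplyr 1.0.0 and later, summarize_all() is superseded by across(). A minimal sketch of the same idea in that style, assuming the abc data from the question and dplyr >= 1.0.0; max_or_na is a hypothetical helper name, not part of either package:

```r
library(dplyr)

# Same values as the dput() output in the question.
abc <- data.frame(
  ID        = c(12345, 12345, 12345, 23456, 23456, 34567, 34567, 34567, 45678),
  cohort_0  = c(10.1, NA, NA, 12, NA, 15.5, NA, NA, NA),
  cohort_2  = c(NA, 10.1, NA, NA, NA, NA, NA, NA, NA),
  cohort_7  = rep(NA_real_, 9),
  cohort_9  = c(NA, NA, NA, NA, 12, NA, NA, NA, NA),
  cohort_11 = c(NA, NA, NA, NA, NA, NA, NA, 15.5, NA)
)

# Hypothetical helper: max of the non-NA values, or NA when there are none.
max_or_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else max(x)
}

# everything() skips the grouping column, so only the cohort columns are summarised.
res <- abc %>%
  group_by(ID) %>%
  summarise(across(everything(), max_or_na), .groups = "drop")
```

The columns stay numeric here, since nothing is converted with as.character().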

s_baldur


This is pretty good, especially the data.table solution. It beats the dplyr solution by miles! Thanks @sindri_baldur! – codedancer Nov 19 '20 at 15:02
