
In a recent question I tried to give an answer using dplyr::coalesce to replace NA with a grouped median, but I got this error:

Error: Argument 2 must be an integer vector, not a double vector

While trying to figure out the cause, I got to the point where it looks like the error appears only if nrow(df) is odd. I am somewhat doubtful that this is really the explanation, so I decided to ask here: what is the reason for this? The only related issue I found was here, but I'm not sure whether it is the same problem.

Edit:

The error is not raised if I replace median with min or max!

MRE:

library(dplyr)
df <- data.frame(ID = 1:7,
                 Group = c(1, 1, 1, 2, 2, 2, 1),
                 val1 = c(1, NA, 3, 2, 2, 3, 2),
                 val2 = c(2, 2, 2, NA, 1, 3, 2))

df %>%
  group_by(Group) %>% 
  mutate_at(vars(-group_cols()), ~coalesce(., median(.,na.rm=TRUE))) %>% 
  ungroup()

Raises:

Error: Argument 2 must be an integer vector, not a double vector

But if I remove the last row (or the three last rows):

df[1:6, ] %>%
  group_by(Group) %>% 
  mutate_at(vars(-group_cols()), ~coalesce(., median(.,na.rm=TRUE))) %>% 
  ungroup()

It works....!!?

P.S.
Using ifelse(is.na(.)... instead of coalesce also works, regardless of the number of rows:

df %>%
  group_by(Group) %>% 
  mutate_at(vars(-group_cols()), ~ifelse(is.na(.), median(., na.rm = TRUE), .)) %>% 
  ungroup()

P.P.S. The error is also raised when using mean instead of median.

dario
Maybe this is relevant from the `median` documentation: "The default method returns a length-one object of the same type as x, except when x is logical or integer of even length, when the result will be double." – xilliam Mar 07 '20 at 10:10

1 Answer


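The median documentation says:

"The default method returns a length-one object of the same type as x, except when x is logical or integer of even length, when the result will be double."

The error you see is not thrown if df$ID is converted with as.numeric, which suggests that coalesce is tripped up by the integer type of df$ID. With 7 rows, Group 1 contains four IDs (an even count), so median returns a double for that group, and coalesce refuses to combine it with the integer column. With 6 rows, both groups contain three IDs, so median stays an integer.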

library(dplyr)
df <- data.frame(ID = 1:7,
  Group = c(1, 1, 1, 2, 2, 2, 1),
  val1 = c(1, NA, 3, 2, 2, 3, 2),
  val2 = c(2, 2, 2, NA, 1, 3, 2))

# convert ID to numeric
df$ID <- as.numeric(df$ID)

df %>%
  group_by(Group) %>% 
  mutate_at(vars(-group_cols()), ~coalesce(., median(.,na.rm=TRUE))) %>% 
  ungroup()
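
If ID should stay an integer, another option is to restrict the imputation to the value columns, since ID has no NAs to fill anyway. The following is only a sketch under that assumption: val1 and val2 are doubles, so the median has the same type as the column regardless of group size. The name `filled` is just for illustration.

```r
library(dplyr)

df <- data.frame(ID = 1:7,
                 Group = c(1, 1, 1, 2, 2, 2, 1),
                 val1 = c(1, NA, 3, 2, 2, 3, 2),
                 val2 = c(2, 2, 2, NA, 1, 3, 2))

# Impute only the double columns; ID (integer) is left untouched,
# so coalesce never pairs an integer column with a double median.
filled <- df %>%
  group_by(Group) %>%
  mutate_at(vars(val1, val2), ~coalesce(., median(., na.rm = TRUE))) %>%
  ungroup()
```

This keeps ID as an integer and avoids computing a median of an identifier column, which is rarely meaningful.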

Notice also how the class of ID can vary depending on how it is input:

IDa = 1:7
class(IDa)  # "integer"

IDb = c(1,2,3,4,5,6,7)
class(IDb)  # "numeric"

IDc = c(1L,2L,3L,4L,5L,6L,7L)
class(IDc)  # "integer"
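
To tie this back to the error, here is a small base R check of the type that median returns for the integer IDs in each group of the 7-row data frame. It also covers the observations about mean, min and max from the question. The variable names are only for illustration, and the remarks about coalesce refer to the dplyr version that produced the error above; other releases may handle the type mismatch differently.

```r
# IDs per group in the 7-row data frame
ids_group1 <- c(1L, 2L, 3L, 7L)  # four values: even length
ids_group2 <- 4:6                # three values: odd length

typeof(median(ids_group1))  # "double": does not match the integer column
typeof(median(ids_group2))  # "integer": same type as the column

# mean always returns a double; min and max keep the integer type
typeof(mean(ids_group2))    # "double"
typeof(min(ids_group1))     # "integer"
typeof(max(ids_group1))     # "integer"
```

Dropping the last row leaves three IDs in each group, so every median is an integer and the mismatch disappears.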
xilliam
  • Thank you, xilliam. This explains the weird dependency on number of elements! I really should have read `?median` more carefully! Thank you for helping me out ;) – dario Mar 07 '20 at 10:43
  • I commend your sharp eyes to spot the dependency ;) – xilliam Mar 07 '20 at 10:59