I have a dataset dt and I want to replace the NA values in each attribute with the mode of that attribute within each id, as follows:

Before:

 id  att  
  1  v
  1  v
  1  NA
  1  c
  2  c
  2  v
  2  NA
  2  c

The outcome I am looking for is:

 id  att
  1  v
  1  v
  1  v
  1  c
  2  c
  2  v
  2  c
  2  c

I have made some attempts. For example, I found a similar question that replaces NA with the mean (which has a built-in function), so I tried to adapt its code as follows:

for (i in 1:dim(dt)[1]) {
    if (is.na(dt$att[i])) {
      att_mode <-                  # I am stuck here to return the mode of an attribute based on ID
      dt$att[i] <- att_mode 
    }
  }

I found the following function to calculate the mode

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

from the following link: Is there a built-in function for finding the mode?

But I have no idea how to apply it inside the for loop. I tried the apply and ave functions, but they do not seem to be the right choice because of the differing dimensions.

Could anyone help on how to return the mode in my for loop?

Thank you

Community

1 Answer

We can use na.aggregate from library(zoo) and specify the FUN as Mode. If this is a group-by operation, we can do it with data.table: convert the 'data.frame' to a 'data.table' (setDT(df1)) and, grouped by 'id', apply na.aggregate to 'att'.

library(data.table)
library(zoo)
setDT(df1)[, att:= na.aggregate(att, FUN=Mode), by = id]
df1
#    id att
#1:  1   v
#2:  1   v
#3:  1   v
#4:  1   c
#5:  2   c
#6:  2   v
#7:  2   c
#8:  2   c

A similar option with dplyr

library(dplyr)
df1 %>%
     group_by(id) %>%
     mutate(att = na.aggregate(att, FUN=Mode))

NOTE: Mode is the function from the OP's post. This also assumes that 'att' is of character class.
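If loading zoo is not an option, the same grouped fill can probably be done in base R with ave. The following is only a sketch under some assumptions: the helper name fill_na_with_mode is made up for illustration, df1 is rebuilt from the example in the question, and 'att' is assumed to be character. Ties are resolved in favour of the value that appears first in the group, because which.max returns the first maximum.

```r
# Mode function from the OP's post
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical helper: replace the NAs in one group with that group's mode.
# The mode is computed on the non-missing values only.
fill_na_with_mode <- function(x) {
  observed <- x[!is.na(x)]
  if (length(observed) == 0L) return(x)  # all-NA group: nothing to fill with
  replace(x, is.na(x), Mode(observed))
}

# Example data rebuilt from the question
df1 <- data.frame(
  id  = c(1, 1, 1, 1, 2, 2, 2, 2),
  att = c("v", "v", NA, "c", "c", "v", NA, "c"),
  stringsAsFactors = FALSE
)

# ave() splits att by id, applies the helper to each group,
# and returns the values in the original row order
df1$att <- ave(df1$att, df1$id, FUN = fill_na_with_mode)
df1
```

For several columns, the same helper could be applied column by column, for example `df1[cols] <- lapply(df1[cols], function(col) ave(col, df1$id, FUN = fill_na_with_mode))`, where `cols` is a hypothetical character vector of column names.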

akrun
  • How can I do this for more than one column using data.table? – Chris Feb 29 '20 at 16:41
  • @Chris For that, use `setDT(df1)[, (nm1) := na.aggregate(.SD, FUN = Mode), by = id, .SDcols = nm1]`, where `nm1` is either a vector of column indices or a vector of column names – akrun Feb 29 '20 at 17:25