Replace NA values in full dataset using R

Question

I am working on a dataset with a few missing values marked as "?". I have to replace them with the most common value (mode) of their column, but I want to write code that does this for the whole dataset at once.

This is what I have so far:

df <- read.csv("mushroom.txt", na.strings = "?",header=FALSE)

Now I am trying to replace all the NA values in the file with the mode of their column. Please help.

I think there are a lot of similar questions. Start by taking a look [here](http://stackoverflow.com/questions/8161836/how-do-i-replace-na-values-with-zeros-in-r), or at least provide a minimal code example. — SabDeM, Jun 24 '15 at 18:34

score 1 · Answer 1 · answered Jun 24 '15 at 18:52

replaceQuestions <- function(vector) {
  mostCommon <- names(sort(table(vector), decreasing = TRUE))[1]
  vector[vector == '?'] <- mostCommon
  vector
}

df <- apply(df, 2, replaceQuestions)

Your example isn't reproducible, so I'm not sure this is what you were looking for, but it solves the problem as I've interpreted it.
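One thing worth checking before using the function above: it compares against the literal string '?', while the question reads the file with na.strings = "?". The sketch below uses a made-up three-row stand-in for mushroom.txt (the contents are an assumption, not the real file) to show what each way of reading leaves in the column.

```r
# Toy stand-in for mushroom.txt (hypothetical contents)
txt <- "a,?\nb,x\nb,x"

# Read as in the question: "?" is converted to NA on the way in
df_na <- read.csv(text = txt, header = FALSE, na.strings = "?",
                  stringsAsFactors = FALSE)

# Read without na.strings: "?" stays a literal string
df_raw <- read.csv(text = txt, header = FALSE, stringsAsFactors = FALSE)

df_na$V2    # first entry is NA
df_raw$V2   # first entry is "?"

# With na.strings = "?", test with is.na() rather than comparing to "?"
sum(is.na(df_na$V2))
sum(df_raw$V2 == "?")
```

If the data was read with na.strings = "?", a test based on is.na() is probably the safer way to find the entries to replace.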

score 1 · Accepted Answer · answered Jun 24 '15 at 18:53

Since you want to replace by the mode of each column, operate column-wise via apply and use is.na to identify the entries that you want to replace.

apply(df, 2, function(x){ 
    x[is.na(x)] <- names(which.max(table(x)))
    return(x) })

Note that apply returns a matrix, so if you want a data.frame you need to convert the result with as.data.frame.
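If keeping a data.frame matters, one possible alternative is to loop over the columns with lapply and assign back into df[]. This is a minimal sketch on made-up data; the column names, the values, and the helper name fill_mode are all hypothetical.

```r
# Hypothetical toy data; column names and values are made up for illustration
df <- data.frame(
  cap  = c("x", "b", NA, "x"),
  odor = c("n", NA, "n", "a"),
  stringsAsFactors = FALSE
)

# Replace NAs in one column with that column's most frequent non-NA value
fill_mode <- function(x) {
  x[is.na(x)] <- names(which.max(table(x)))  # table() drops NA by default
  x
}

# Assigning into df[] keeps df a data.frame instead of producing a matrix
df[] <- lapply(df, fill_mode)
str(df)
```

Because names() returns a character string, this sketch suits categorical columns; a numeric column would be coerced to character by the assignment.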

PavoDive · Answer 3 · 2015-06-24T19:21:29.720

The na.strings = "?" argument in your question turns each "?" into NA while the csv is read. If you read the file without it, so that the "?" markers are still in the data, I think this could help:

apply(df,2,function(x) gsub("\\?",names(sort(-table(x,exclude="?")))[1],x))

The exclude part avoids selecting "?" itself, should it be the most frequent value. The \\ escapes ?, which is a special character in the regular expression passed to gsub.
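Both points can be seen on a single made-up column in which "?" happens to be the most frequent entry. The vector x and the name top are hypothetical; fixed = TRUE is shown only as an alternative to escaping.

```r
# Hypothetical column in which "?" is the most frequent entry
x <- c("?", "?", "?", "a", "a", "b")

# Without exclude, "?" itself is picked as the most frequent value
names(sort(-table(x)))[1]

# With exclude = "?", the most frequent real value is picked
top <- names(sort(-table(x, exclude = "?")))[1]
top

# "\\?" matches a literal question mark in a regular expression
gsub("\\?", top, x)

# fixed = TRUE treats the pattern as plain text, so no escaping is needed
gsub("?", top, x, fixed = TRUE)
```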

====== EDIT TO ADD ======

gsub converts everything to text, so you'll need to convert it back to numeric:

a<-apply(df,2,function(x) gsub("\\?",names(sort(-table(x,exclude="?")))[1],x))
new_df<-as.data.frame(apply(a,2,as.numeric))

The last line produces a new data frame.
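If some columns are categorical, as.numeric would turn their entries into NA. A hedged alternative is to let type.convert decide column by column. The matrix a below is a made-up stand-in for the apply output, not the real data.

```r
# Hypothetical character matrix, shaped like the output of apply(..., gsub(...))
a <- cbind(num = c("1", "2", "2"), cat = c("x", "y", "x"))

new_df <- as.data.frame(a, stringsAsFactors = FALSE)

# type.convert() makes a column numeric only when every entry parses as a number;
# as.is = TRUE keeps the remaining columns as character instead of factor
new_df[] <- lapply(new_df, type.convert, as.is = TRUE)
str(new_df)
```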

score 0 · Answer 4 · edited May 23 '17 at 11:51

Or:

apply(df, 2, function(x) {
  x[is.na(x)] <- Mode(x[complete.cases(x)])
  x})

This uses the well-known Mode function from SO; see the question "Is there a built-in function for finding the mode?".

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
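A short usage sketch shows why the NAs are dropped before Mode is called: match() pairs NA with NA, so NA is counted like any other value. The vector x is made up for illustration.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical column where NA is the most frequent entry
x <- c("b", NA, "a", "b", NA, NA)

# NA is counted like any other value, so it wins here
Mode(x)

# Dropping the NAs first gives the most frequent real value
Mode(x[!is.na(x)])
```

For a plain vector, x[complete.cases(x)] and x[!is.na(x)] select the same entries.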

answered Jun 24 '15 at 19:17

Pierre L


score 0 · Answer 5 · edited Jul 03 '15 at 06:24

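Use:

# base R's mode() returns the storage mode, not the most frequent value,
# so define a statistical-mode helper that ignores NAs
Mode <- function(x) { ux <- unique(x[!is.na(x)]); ux[which.max(tabulate(match(x, ux)))] }
for (i in seq_len(ncol(dataframename))) {
  col <- dataframename[[i]]
  dataframename[[i]][is.na(col)] <- Mode(col)
}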

answered Jul 03 '15 at 04:08

Ajay Ohri
