I have a dataset that I want to use to build a decision tree in RStudio. Many of its factor columns contain empty (NA) values, and I want to change every one of those to "No Data". There are over 100 such columns, so I'd rather change them all at once than one by one.

Example data (please note that in my real data these are all factors. I know they come out as numeric when entered into R like this, but I read the data in from a CSV and don't know how to reproduce factors in an example):

Outcome=c(1,1,1,0,0,0)
VarA=c(1,1,NA,0,0,NA)
VarB=c(0,NA,1,1,NA,0)
VarC=c(0,NA,1,1,NA,0)
VarD=c(0,1,NA,0,0,0)
VarE=c(0,NA,1,1,NA,NA)
VarF=c(NA,NA,0,1,0,0)
VarG=c(0,NA,1,1,NA,0)
df=as.data.frame(cbind(Outcome, VarA, VarB,VarC,VarD,VarE,VarF,VarG)) 
MLPNPC
  • Based on your example, variables are all numeric. `replace(df, is.na(df), "No Data")` – akrun Feb 06 '18 at 15:01
  • Wouldn't this mess up your decision tree calculation? – pogibas Feb 06 '18 at 15:02
  • Possible duplicate of https://stackoverflow.com/questions/8161836/how-do-i-replace-na-values-with-zeros-in-an-r-dataframe – Cris Feb 06 '18 at 15:03
  • @PoGibas has a point here. You should of course only do this with factor data, not with numeric data. – Georgery Feb 06 '18 at 15:04
  • @akrun I don't know how to show an example where the data are factors, as I read the data in from a CSV. – MLPNPC Feb 06 '18 at 15:06
  • If you have factor columns, then you may need to add 'No Data' as one of the levels before changing the NA to 'No Data' – akrun Feb 06 '18 at 15:07
  • @akrun would I have to do this for each column separately or can I do it for every factor column at once? – MLPNPC Feb 06 '18 at 15:09
  • If these are factor columns, it is better not to mix the columns, so, try `df[-1] <- lapply(df[-1], function(x) {levels(x) <- c(levels(x), "No Data"); replace(x, is.na(x), "No Data")})` – akrun Feb 06 '18 at 15:11
  • @akrun Thanks! That's worked perfectly, if you put it as an answer I'll mark it off! Thank you! – MLPNPC Feb 06 '18 at 15:14

2 Answers

When a column is a factor and we want to replace one of its values with a new value, we either have to call `factor` again or add the new value to the factor's levels before making the change. Assuming every variable except the first column needs recoding, loop over those columns with `lapply`, add "No Data" as a level, replace the NA elements with "No Data", and assign the resulting list back to the columns of interest:

df[-1] <- lapply(df[-1], function(x) {
  levels(x) <- c(levels(x), "No Data")
  replace(x, is.na(x), "No Data")
})
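
To check this on data that really are factors, part of the question's example can be rebuilt with `factor()`. This is only a minimal sketch: the three-column data frame is a cut-down stand-in for the real CSV, which is an assumption on my part.

```r
# Cut-down version of the question's data, built as factors
# (an assumption standing in for the CSV the asker reads in).
df <- data.frame(
  Outcome = factor(c(1, 1, 1, 0, 0, 0)),
  VarA    = factor(c(1, 1, NA, 0, 0, NA)),
  VarB    = factor(c(0, NA, 1, 1, NA, 0))
)

# Add "No Data" as a level to every column except the first,
# then overwrite the NA entries with that level.
df[-1] <- lapply(df[-1], function(x) {
  levels(x) <- c(levels(x), "No Data")
  replace(x, is.na(x), "No Data")
})

str(df)           # VarA and VarB are still factors, with an extra level
sum(is.na(df))    # number of missing values left in the data frame
```

Because the level is added before the replacement, the assignment should not trigger an "invalid factor level" warning, and `Outcome` is left untouched.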
akrun

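You might try this, which works when the columns are numeric or character. For factor columns the new value must first be added as a level; otherwise R warns about an invalid factor level and the NAs stay in place: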

df[is.na(df)] <- "NoData"
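
If the columns are factors, one way to keep this one-line style is to go through character first. The sketch below is an illustration only: `d` is a small hypothetical data frame, and the string "No Data" follows the wording in the question.

```r
# Hypothetical data frame with factor columns that contain NA.
d <- data.frame(
  VarA = factor(c("1", NA, "0")),
  VarB = factor(c(NA, "1", "0"))
)

# Convert every column to character so any string can be assigned.
d[] <- lapply(d, as.character)
d[is.na(d)] <- "No Data"

# Convert back to factors; "No Data" is now an ordinary level.
d[] <- lapply(d, factor)
```

This converts every column, so on a real dataset it would make sense to restrict the `lapply` calls to the columns that are actually factors.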
Georgery