So I have data that has None interspersed in numeric vectors, like this -

Lot.Frontage    Lot.Area
34                  3901
70                  8400
60                  7200
64                  7018
111                 16259
50                  4280
155                 20064
60                  7200
70                  9100
None            6449
55                  7642
None            28698

I want to replace the None with 0.

I've tried this

ames.data[ames.data == "None"] <- 0

But this gives me <NA> wherever there was a None.

How do I replace the None with 0?

praks5432
  • `ames.data$Lot.Frontage[ames.data$Lot.Frontage == "None"] <- 0` ? If `ames.data` were a matrix, you would be fine, but I guess it's a data.frame. – Frank Mar 14 '14 at 20:50
  • What I posted above is a subset of the data; I have a lot more columns like that. Doing `ames.data[ames.data == "None"] <- 0` has the same issue as described in the question. – praks5432 Mar 14 '14 at 20:54
  • Please provide `dput(ames.data)` or otherwise create reproducible data. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Frank Mar 14 '14 at 20:58
  • The `<NA>` is because `class(ames.data$Lot.Frontage)` is a `factor` – Jake Burkhead Mar 14 '14 at 21:01

4 Answers

Here's how to do it over all columns. I generated some "None"s in Lot.Area to show that it works on more than one column.

sapply(ames.data, class)
## Lot.Frontage     Lot.Area
##     "factor"    "integer"

ames.data$Lot.Area <- ifelse(runif(nrow(ames.data)) < 0.25, "None", ames.data$Lot.Area)
##    Lot.Frontage Lot.Area
## 1            34     3901
## 2            70     None
## 3            60     None
## 4            64     7018
## 5           111    16259
## 6            50     4280
## 7           155     None
## 8            60     None
## 9            70     9100
## 10         None     None
## 11           55     7642
## 12         None    28698

ames.data <- as.data.frame(lapply(ames.data, function(x) {
  x <- as.character(x)
  x[x == "None"] <- 0
  as.numeric(x)
}))
##    Lot.Frontage Lot.Area
## 1            34     3901
## 2            70        0
## 3            60        0
## 4            64     7018
## 5           111    16259
## 6            50     4280
## 7           155        0
## 8            60        0
## 9            70     9100
## 10            0        0
## 11           55     7642
## 12            0    28698

sapply(ames.data, class)
## Lot.Frontage     Lot.Area
##    "numeric"    "numeric"
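
One caveat with converting every column: a genuinely textual column would be turned into NAs by as.numeric(). Here is a minimal sketch of a more defensive variant, not part of the answer above. The helper name none_to_zero and the Street column with its values are hypothetical, made up for illustration.

```r
# Hypothetical helper: convert a column only when it is safe to do so.
none_to_zero <- function(x) {
  # Leave columns that are already numeric (or otherwise non-text) alone.
  if (!is.factor(x) && !is.character(x)) return(x)
  chr <- as.character(x)
  chr[!is.na(chr) & chr == "None"] <- "0"
  num <- suppressWarnings(as.numeric(chr))
  # If conversion would create new NAs, the column is real text: keep it unchanged.
  if (any(is.na(num) & !is.na(chr))) return(x)
  num
}

# Small made-up data frame; Street is an assumed text column.
df <- data.frame(
  Lot.Frontage = factor(c("34", "None", "55")),
  Lot.Area = c(3901L, 6449L, 7642L),
  Street = c("Pave", "Grvl", "Pave"),
  stringsAsFactors = FALSE
)

# df[] keeps the data.frame structure while replacing every column.
df[] <- lapply(df, none_to_zero)
sapply(df, class)
```

With this variant only Lot.Frontage changes type; Lot.Area stays integer and Street stays character.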
Jake Burkhead

Check class(ames.data$Lot.Frontage). I bet it is a factor. That means you can only replace values by other values in levels(ames.data$Lot.Frontage).
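
To see the behaviour described above in isolation, here is a small sketch on a made-up factor (the vector x is illustrative, not the asker's data). It also shows that renaming the level itself is allowed, since that changes levels() rather than assigning a value outside them.

```r
x <- factor(c("34", "70", "None"))
levels(x)                      # "34" "70" "None"

# Assigning a value that is not a level produces NA (with a warning).
y <- x
y[y == "None"] <- 0            # warning: invalid factor level, NA generated
y                              # the third element is now <NA>

# Renaming the level works, because "0" becomes a valid level.
z <- x
levels(z)[levels(z) == "None"] <- "0"
as.numeric(as.character(z))    # 34 70 0
```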

You can do this a couple of ways, but they all boil down to converting the columns to a type you can change. In this case, convert to character first, then change "None" to "0", then convert to numeric.

ames.data$Lot.Frontage <- as.character(ames.data$Lot.Frontage)
ames.data$Lot.Frontage[ames.data$Lot.Frontage == "None"] <- "0"
ames.data$Lot.Frontage <- as.numeric(ames.data$Lot.Frontage)

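If you convert the character values to numeric without replacing "None" first, the "None"s will become NAs. Since you may have other missing data, the "None"s and the other missing data will get confused. Calling as.numeric() directly on the factor is worse: it returns the internal level codes rather than the original numbers.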
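
A short sketch of the two direct conversions, using a made-up factor f rather than the asker's data:

```r
f <- factor(c("34", "70", "None", "55"))

# Directly on the factor: returns the integer level codes, not the numbers.
codes <- as.numeric(f)

# Via character without replacing first: "None" cannot be parsed and becomes NA.
parsed <- suppressWarnings(as.numeric(as.character(f)))

codes
parsed
```

Neither result distinguishes "None" from data that was missing to begin with, which is why the replacement is done on the character values before the final as.numeric().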

Jake Burkhead
Christopher Louden

If you read the data specifying na.strings="None" and colClasses=c("numeric","numeric"), the "None" values are read in as NA, and you can then replace them with 0 as usual:

# Assign the result so it can be modified afterwards
df <- read.table("file", header=TRUE, quote="\"", colClasses=c("numeric","numeric"), na.strings="None")
df[is.na(df)] <- 0
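
A self-contained sketch of the same idea, reading from an inline string through the text argument of read.table() instead of a file. The four sample rows are made up for illustration. Note that this approach also zeroes any values that were genuinely missing, since they are indistinguishable from "None" once both are NA.

```r
# Inline sample data standing in for the file (values are illustrative).
raw <- "Lot.Frontage Lot.Area
34 3901
None 6449
55 7642
None 28698"

df <- read.table(text = raw, header = TRUE,
                 colClasses = c("numeric", "numeric"),
                 na.strings = "None")

# "None" was read as NA; replace every NA in the data frame with 0.
df[is.na(df)] <- 0
df
```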
DatamineR

Using dplyr, you can generalize this across all columns that are of character type. This is particularly useful when working with a time series, where you have a date column that should be left untouched.

library(dplyr)

ames.data <- ames.data %>% 
        mutate(across(where(is.character), ~na_if(., "None"))) %>% 
        mutate(across(where(is.character), as.double)) 

For example, this data:

date col1 col2
{date} {chr} {chr}
2000-01-01 1 2
2000-01-02 4 None
2000-01-03 None 6
2000-01-04 3 0

will become:

date col1 col2
{date} {dbl} {dbl}
2000-01-01 1 2
2000-01-02 4 NA
2000-01-03 NA 6
2000-01-04 3 0
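
The result above contains NA rather than the 0 the question asks for. Here is a minimal sketch of one way to get 0 directly, assuming dplyr 1.0.0 or later and character (not factor) columns. The sample tibble is made up to mirror the table above.

```r
library(dplyr)

# Made-up sample data mirroring the table above.
df <- tibble(
  date = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04")),
  col1 = c("1", "4", "None", "3"),
  col2 = c("2", "None", "6", "0")
)

# Swap "None" for "0" in character columns only, then convert to double.
# The date column is not character, so across() leaves it alone.
df <- df %>%
  mutate(across(where(is.character),
                ~ as.double(if_else(.x == "None", "0", .x))))
df
```

Because if_else() returns NA where the condition is NA, values that were already missing stay NA instead of being turned into 0.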
hdn911