0

I have a data frame in which one column has class factor. About 80% of its values are numeric (integer or float) and the other 20% are the string "No contact". How can I replace "No contact" with the value 0?

I understand that if I change the class of the column to numeric, it will give me an error (How to convert a data frame column to numeric type?). So I need to replace the value "No contact" first, and only then change the column class.

Thanks!

Sotos
Chris
  • The answer you link to is sloppy: you don't actually get an Error (which halts execution of the code); you get a Warning, and the non-convertible values are replaced with `NA`. This is nice: you can ignore the warning and then replace the `NA` values with 0 in the result. – Gregor Thomas Apr 18 '18 at 13:02
  • Remember to use `as.numeric(as.character())` when converting a `factor` variable with numeric levels. – LAP Apr 18 '18 at 13:04
  • @Gregor In this particular case, your suggestion would work, as the OP knows that there is only one string level in the column. But it wouldn't work with two or more distinct string values. – Tim Biegeleisen Apr 18 '18 at 13:08
  • It also wouldn't work if there were additional `NA` to begin with which should not be changed to `0`. – LAP Apr 18 '18 at 13:10
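
The approach discussed in these comments can be sketched as follows. This is only an illustration under stated assumptions: the data frame `df`, its column `col`, and the sample values are made up, and the helper vector `was_na` is introduced here to handle the caveat about pre-existing `NA` values.

```r
# Hypothetical data: a factor column holding numbers, "No contact", and a genuine NA
df <- data.frame(col = factor(c("3", "1.5", "No contact", NA, "10")))

# Remember which entries were already missing before the conversion
was_na <- is.na(df$col)

# Convert via character; suppressWarnings() hides the coercion warning
# that is raised for the non-numeric string
num <- suppressWarnings(as.numeric(as.character(df$col)))

# Only entries that became NA during conversion are set to 0;
# entries that were NA from the start stay NA
num[is.na(num) & !was_na] <- 0

df$col <- num
df$col
```

Note that this maps every non-numeric string to 0, so it is only appropriate when "No contact" is the sole string value in the column.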

2 Answers

1

Try replacing that particular level:

levels(df$col)[levels(df$col) == "No contact"] <- "0"

Then, if the remainder of the data in this column is numeric, as you expect, you can convert it to numeric:

df$col <- as.numeric(levels(df$col))[df$col]
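
For illustration, here is a minimal end-to-end sketch of the two steps above on made-up data. The sample values are an assumption; `df` and `col` are the same placeholder names used in the snippets.

```r
# Hypothetical data frame with a factor column
df <- data.frame(col = factor(c("2", "7.5", "No contact", "2", "No contact")))

# Step 1: rename the level "No contact" to "0"
levels(df$col)[levels(df$col) == "No contact"] <- "0"

# Step 2: convert the level labels to numbers, then index them
# by the factor's internal integer codes
df$col <- as.numeric(levels(df$col))[df$col]

df$col
```

Indexing the converted levels by the factor avoids calling `as.numeric()` directly on the factor, which would return the internal integer codes rather than the labels.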
Tim Biegeleisen
-1

This would work

df$col[df$col=="No contact"] <- 0

You can then convert the column to numeric with `as.numeric()`.

Rana Usman
  • The `which()` is unnecessary. – LAP Apr 18 '18 at 13:02
  • @LAP why do you think that? – Rana Usman Apr 18 '18 at 13:05
  • Because `df$col == "No contact"` returns a logical vector of the same length as `df$col` that works just fine for subsetting the data. The `which()` just returns the row positions that evaluate as `TRUE` from the logical vector. Try it out yourself. `df$col[df$col == "No contact"]` has the same result as `df$col[which(df$col=="No contact")]`. – LAP Apr 18 '18 at 13:07
  • @LAP just to be sure, you mean this? `df$group[df$group=="A"] <- 0` ? – Rana Usman Apr 18 '18 at 13:11
  • Yep, exactly. The `which()` is an unnecessary additional operation when it comes to indexing/subsetting from a logical expression. – LAP Apr 18 '18 at 13:12
  • Despite all this, your solution will probably not work, as it will generate `NA` whenever `0` is not already a valid factor level of the variable. – LAP Apr 18 '18 at 13:20
  • @LAP I think you are mistaken here. – Rana Usman Apr 18 '18 at 13:26
  • See for yourself... `test <- factor(1:5, 1:5, labels = c("1", "2", "3", "4", "No contact"))`, `test[test == "No contact"] <- 0`, ``Warning message: In `[<-.factor`(`*tmp*`, test == "No contact", value = 0) : invalid factor level, NA generated``. – LAP Apr 18 '18 at 13:29
  • @LAP is right: it gives me an `NA`. It doesn't matter whether I try with "0" or 0. Why doesn't it accept "0"? – Chris Apr 19 '18 at 00:59
  • @Chris It does not change to `"0"` because the factor levels do not contain `"0"` beforehand. You need to change the factor levels, not the value. – LAP Apr 19 '18 at 07:14
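
The difference described in these comments can be reproduced with a short sketch. The object names `test`, `direct`, and `fixed` are chosen here for illustration only.

```r
# A small factor whose levels do not include "0"
test <- factor(c("1", "2", "3", "4", "No contact"))

# Direct assignment: 0 is not an existing level, so R emits a warning
# ("invalid factor level, NA generated") and stores NA instead
direct <- test
direct[direct == "No contact"] <- 0
is.na(direct[5])

# Renaming the level instead keeps the element and gives it the label "0"
fixed <- test
levels(fixed)[levels(fixed) == "No contact"] <- "0"
as.numeric(as.character(fixed))
```

A factor only accepts values that are already among its levels, which is why the level itself has to be renamed before the conversion.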