
I'm trying to replace the NAs in a single column of a data.table in R with "-999" and I can't quite get it.

There is a related question here on Stack Overflow, but I think this can be done without iterating through the table.

I have a column, column_to_check, in a data.table. The column is a factor variable with 80K observations consisting of NA, 0, and 1. I'm trying to change the NAs to -999 so I can do further work.

The code I'm working with is this:

is.na(DT[,column_to_check,with=FALSE]) = "-999"

and

DT[is.na(column_to_check), column_to_check:="-999"]

The first line sets the entire column to NA. The second doesn't work either; I know it is off, but I think I'm close.

Can anyone help?

Thanks.

Windstorm1981
  • "I'm trying to change the NA to -999 so I can do further work." -- Unless you are exporting the data to use with some other software, I think you'll regret this. All of R's functionality is designed to play nice with NA and not with ersatz missing value codes. – Frank Sep 24 '16 at 14:52
  • I agree with Frank about the general principle. Though what's the error message with the 2nd line? It looks fine to me. Do you have a reproducible example? – dracodoc Sep 24 '16 at 15:29
  • `data.table` may complain about 2nd line because you are trying to assign a character value to a numeric column. Why would you use `"-999"` instead of `-999`? – dracodoc Sep 24 '16 at 15:31
  • Thanks guys - in fact the column is a factor with "NA", "1", and "0". The second line of code DOES work. Don't know why I had a problem. – Windstorm1981 Sep 24 '16 at 18:18
  • @Frank - I'm doing this for pure munging reasons. The reason I need to change the NAs to -999 (or something other than NA) is that afterward I am using if/then/else statements on the columns to find values. I use 'if column_to_check == "1" then...', 'if column_to_check == "0" then...', 'else...'. With NAs in the column I get an error, because if/then/else apparently doesn't like NAs. So I need to change them to a character or number. – Windstorm1981 Sep 24 '16 at 18:30
  • I am skeptical. I run into NAs and branching conditions all the time and do not find myself wishing my NAs were not NAs. `DT[a == 1, do_stuff]` is the syntax. The `a==1` filters out the rows where a is missing. For what it's worth, I've had some confusion about how to do this before, too: http://stackoverflow.com/questions/16221742/subsetting-a-data-table-using-some-non-na-excludes-na-too and http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently – Frank Sep 24 '16 at 18:40
  • With what I'm doing, it literally appears to be an R limitation when using if/then/else with data.tables. I'm not actually dropping the rows, but R appears not to like me doing if/then/else on the existing values if an NA appears in the iteration. If the NA is a character or number (representing an NA but not actually being an NA), R will do the comparison. – Windstorm1981 Sep 24 '16 at 18:46
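
The error described in the comments above comes from base R: `if (NA == "1")` stops with "missing value where TRUE/FALSE needed". A minimal sketch of the row-wise alternative Frank mentions, assuming a toy table; the `result` column and its labels are hypothetical and only there for illustration:

```r
library(data.table)

# Hypothetical stand-in for the asker's table: a factor column with NA, "0", "1"
DT <- data.table(column_to_check = factor(c("0", NA, "1")))

# In the i argument of a data.table, a comparison that evaluates to NA is
# treated as FALSE, so rows with a missing value are skipped, not an error
DT[column_to_check == "1", result := "one"]
DT[column_to_check == "0", result := "zero"]

# The "else" branch: handle the missing values explicitly
DT[is.na(column_to_check), result := "missing"]

print(DT)
```

Each branch is a vectorised sub-assignment, so the NAs can stay NAs and no loop over rows is needed.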

1 Answer


Your second line isn't off, unless the data in the column is not of type character, in which case you would have to supply -999 as an integer/numeric value, without the quotes:

data <- read.table(header=TRUE, text='
 id weight   size
 1     20  small
 2     27  large
 3     24 medium
 ')

library(data.table)
data <- data.table(data)

> data[size == 'small', weight := NA]
> data
     size id weight
1:  small  1     NA
2:  large  2     27
3: medium  3     24
> is.na(data)
      size    id weight
[1,] FALSE FALSE   TRUE
[2,] FALSE FALSE  FALSE
[3,] FALSE FALSE  FALSE
> data[is.na(weight), weight := -999]
> data
     size id weight
1:  small  1   -999
2:  large  2     27
3: medium  3     24
> data[size == 'small', weight := NA]
> data[is.na(weight), weight := "-999"]
Warning message:
In `[.data.table`(data, is.na(weight), `:=`(weight, "-999")) :
  Coerced 'character' RHS to 'integer' to match the column's type. 

EDIT: This is, I just saw, what @dracodoc suggested in a comment.
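
Since the column in the question turned out to be a factor rather than a number, here is a minimal sketch for that case. The toy data is an assumption standing in for the asker's table. The asker reports that assigning "-999" directly worked; going through character first is just a more explicit route that does not rely on how `:=` treats a value that is not yet one of the factor's levels.

```r
library(data.table)

# Hypothetical stand-in for the asker's table: a factor column with NA, "0", "1"
DT <- data.table(column_to_check = factor(c("0", NA, "1", NA)))

# Replace the whole column with its character version; with no i argument,
# := is allowed to change the column's type
DT[, column_to_check := as.character(column_to_check)]

# Now the quoted "-999" matches the column type, so there is no coercion warning
DT[is.na(column_to_check), column_to_check := "-999"]

# Convert back to a factor if later code expects one
DT[, column_to_check := factor(column_to_check)]

print(DT)
```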

Daniel Winkler