
I have a table in which some variables have missing data (recorded as NULL). I'd like to convert some of these missing cells to 0, but I can't seem to get the syntax right. My initial approach was:

b<- eval(parse(text=paste('table_full$','column_name1',sep='')))
b[which(is.na(b))]<-0
b[which(b=='NULL')]<-0

and then save the data to a file. However, this still results in missing data in the output files and warning messages like:

In `[<-.factor`(`*tmp*`, which(is.na(b)), value = 0) :
  invalid factor level, NA generated

Alternatively, I've tried things of the form:

b[which(is.na(as.numeric(as.character(b))))]<-0

but this didn't resolve the situation.

I'm relatively new to R and can't understand exactly what I'm doing wrong here. Thanks in advance!

anthr
  • The `which` is redundant here; it's sufficient to write `b[is.na(b)] <- 0`. And I'd recommend running the script with `options(stringsAsFactors = F)`. – m0nhawk Jun 16 '15 at 18:28
  • Trying both `b[is.na(b)]<-0` and `b[is.null(b)]<-0` still results in the same error, unfortunately (even with the `stringsAsFactors` set to False). – anthr Jun 16 '15 at 18:33
  • And how do you save to file? – m0nhawk Jun 16 '15 at 18:34
  • Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). The use of `eval()` here seems very unnecessary. And by assigning to `b`, you will not be actually updating the data.frame itself. If you have literal "NULL" values in your text file, use `na.strings=` on the `read.table` to turn those into NA values (you can't have "true" NULL values in a vector so the word "NULL" is coerced to a character/string value). – MrFlick Jun 16 '15 at 18:43
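The points in the comments above (no `eval()`, and writing the result back into the data frame) can be sketched as follows. This is only an illustration: the contents of `table_full` are made up here, and the column is assumed to have been read as character rather than factor.

```r
# Hypothetical data standing in for the asker's table_full
table_full <- data.frame(
  column_name1 = c("1", "NULL", "3", NA),
  stringsAsFactors = FALSE
)

col <- "column_name1"        # column name held in a string
b <- table_full[[col]]       # same column as table_full$column_name1, no eval() needed

# Replace both real NA values and the literal text "NULL";
# the column is character, so the replacement is the string "0"
b[is.na(b) | b == "NULL"] <- "0"

table_full[[col]] <- b       # write the change back into the data frame
```

Without the last line only the copy `b` changes, so a file written from `table_full` would still contain the missing values.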

2 Answers


Since R tends not to store its values as "NULL", I'm going to go out on a limb and assume you imported the column as text, more specifically as a factor. Try reimporting with stringsAsFactors = FALSE and then use your code:

b[b=='NULL'] <- 0

A more elegant way would be to pass na.strings=c("NULL") when you read the data in.
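A minimal sketch of the `na.strings` approach, assuming the data comes from a CSV file. The inline `csv_text` is made-up sample data used in place of a real file so the example is self-contained; with a real file you would pass its path instead of `text =`.

```r
# Made-up CSV content standing in for the real input file
csv_text <- "id,column_name1\n1,5\n2,NULL\n3,7\n"

# Treat the literal text "NULL" (and the default "NA") as missing on import,
# so the column is read as numeric rather than as text or a factor
table_full <- read.csv(text = csv_text, na.strings = c("NULL", "NA"))

# The column is numeric now, so a numeric 0 can be assigned directly
table_full$column_name1[is.na(table_full$column_name1)] <- 0
```

Because the replacement is made on `table_full` itself, the zeros are present when the data frame is later written to a file.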

Serban Tanasa
  • Of course you won't actually be replacing with the number zero here. If the column is character, the number 0 will be converted to a string containing "0". You will still not be able to perform arithmetic operations on the column. It would be the same as `b[b=='NULL'] <- "0"` – MrFlick Jun 16 '15 at 18:58
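If the column has already been imported as a factor (or as character) and numeric values are needed, one option is to convert it first and then fill in the zeros. The vector `b` below is a made-up stand-in for the asker's column, and `b_num` is a name introduced only for this sketch.

```r
# Hypothetical factor column containing the literal text "NULL"
b <- factor(c("1", "NULL", "3"))

# Convert factor -> character -> numeric; "NULL" cannot be parsed and becomes NA
# (as.numeric() warns about the coercion, which is expected here)
b_num <- suppressWarnings(as.numeric(as.character(b)))

# Now the vector is numeric, so assigning 0 works without a factor-level warning
b_num[is.na(b_num)] <- 0

sum(b_num)  # arithmetic works on the converted vector
```

The conversion result has to be stored in a new vector (or assigned back to the column). Using it only inside the index while assigning into the factor itself, as in the question's second attempt, still triggers the invalid factor level warning.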

is.na() returns TRUE or FALSE. Try b[which(is.na(b) == T)]<-0 instead

JakeC
  • `which()` will return the indices of the TRUE values, so that's really not necessary. You can also subset with boolean values, so `b[is.na(b)] <- 0` would work just as well. – MrFlick Jun 16 '15 at 18:56