I have a character vector that looks like:

"Internet" "Internet" "-1"       "-5"       "Internet" "Internet" 

I want to replace all values that would be negative numeric values (-1, -5, etc) with NA.

I did that with this code:

hintsData$WhereSeekHealthInfo[hintsData$WhereSeekHealthInfo < 0] <- NA

That seemed to work:

head(hintsData$WhereSeekHealthInfo)
# [1] "Internet" "Internet" NA         NA         "Internet" "Internet"

But then when I did

> sum(hintsData$WhereSeekHealthInfo == "Internet")
# [1] NA

Basically I couldn't sum the values anymore because I changed the vector in some way?

Prior to running the NA code I was able to run the code and get this:

> sum(hintsData$WhereSeekHealthInfo == "Internet")
# [1] 1691

So, how can I replace the "-1", "-5" etc values with NA, but still get:

> sum(hintsData$WhereSeekHealthInfo == "Internet")
# [1] 1691

Please let me know if you have an idea. I did find other questions about replacing values with NA, but since I don't know why I can't count values anymore once I replace them with NA, I'm not sure what to search for or rule out.

Arun
soporific
  • use `na.rm = TRUE` inside `sum`. – Arun Mar 25 '13 at 22:42
  • possible duplicate of: http://stackoverflow.com/questions/7706876/r-script-removing-na-values-from-a-vector (or) this: http://stackoverflow.com/questions/15617876/why-some-functions-do-not-ignore-null-values-in-r – Arun Mar 25 '13 at 22:43
  • Just another Question? Are the quotation marks important? Then you should replace the numbers with "NA" or is that just a typo? – Darokthar Mar 25 '13 at 22:45
  • Thanks to you both, I just wasn't thinking of it the right way. It works now. :) – soporific Mar 25 '13 at 23:02

2 Answers

sum has an na.rm argument; set it to TRUE and the NA values are dropped before summing. (In general, 1 + NA is NA, so you need to remove the NA values to get a number back.)
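As a minimal sketch of that effect, the snippet below uses a small made-up vector `x` as a stand-in for `hintsData$WhereSeekHealthInfo` (the values and the variable names are invented for illustration):

```r
# Toy stand-in for the real column; values are invented for illustration
x <- c("Internet", "Internet", NA, NA, "Internet", "Internet")

# The comparison yields NA wherever x is NA, so a plain sum() is NA as well
plain_total <- sum(x == "Internet")

# na.rm = TRUE drops the NA entries before summing
clean_total <- sum(x == "Internet", na.rm = TRUE)

print(plain_total)
print(clean_total)
```

Only the call to `sum` changes; the vector itself still holds its NA entries.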

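That said, your < 0 condition is a little sneaky given that the vector is character: R coerces the 0 to "0" and compares the values as strings, so the result depends on how the strings sort rather than on their numeric value. It does work in this case, but I wouldn't presume it is robust.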

The idiomatic way to set NA values in R is the replacement function is.na<-, e.g.

is.na(hintsData$WhereSeekHealthInfo) <- hintsData$WhereSeekHealthInfo < 0
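If the string comparison feels too fragile, one possible alternative (not part of the original answer's code) is to convert to numeric first and flag only the values that actually parse as negative numbers. The vector `x` and the helper names `num` and `negative` are made up for this sketch:

```r
# Toy stand-in for the real column; values are invented for illustration
x <- c("Internet", "Internet", "-1", "-5", "Internet", "Internet")

# Non-numeric strings become NA; suppressWarnings() hides the coercion warning
num <- suppressWarnings(as.numeric(x))

# TRUE only where the value parsed as a number and that number is negative
negative <- !is.na(num) & num < 0

# Mark those elements as missing, using the same is.na<- idiom
is.na(x) <- negative

print(x)
print(sum(x == "Internet", na.rm = TRUE))
```

Because the comparison is done on numbers, the result does not depend on how "-1" and "0" happen to sort as strings.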

Depending on how you read in your data, you could do this processing at import time.

For example, if you know the valid responses before reading in a text file, you can define your own class and a coercion method for it:

 setAs("character", "Q1", function(from) factor(from, levels = c('Internet', 'Newspaper')))

 read.csv('mytextfile.csv', colClasses = c(WhereSeekHealthInfo = 'Q1'))

or perhaps, being more explicit about the NA values and less explicit about what the valid values are:

  setAs("character", "Q1b", function(from) { is.na(from) <- suppressWarnings(as.numeric(from)) < 0; from })
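The following is a hedged sketch of how the second variant might be used end to end. It assumes a placeholder class is registered with `setClass("Q1b")` before `setAs` is called, and it reads from an inline string via `read.csv(text = ...)` instead of 'mytextfile.csv'; the CSV contents and the `id` column are invented for illustration:

```r
library(methods)

# Register a placeholder class so setAs() has a target class to coerce to
setClass("Q1b")
setAs("character", "Q1b", function(from) {
  # Strings that parse as negative numbers become NA; others are left alone
  is.na(from) <- suppressWarnings(as.numeric(from)) < 0
  from
})

# Inline CSV text standing in for a real file (contents are invented)
csv_text <- "id,WhereSeekHealthInfo\n1,Internet\n2,-1\n3,Internet\n4,-5"

# A named colClasses entry applies the custom coercion to that column only
dat <- read.csv(text = csv_text,
                colClasses = c(WhereSeekHealthInfo = "Q1b"))

print(dat$WhereSeekHealthInfo)
print(sum(dat$WhereSeekHealthInfo == "Internet", na.rm = TRUE))
```

With this setup the negative codes never reach the data frame as strings, so no separate clean-up step is needed after import.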
mnel

The reason for this is that x == NA returns NA for any value of x (even if x is itself NA), and sum returns NA as soon as one of its inputs is NA.

So you should use Arun's suggestion, sum(..., na.rm=TRUE).
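A short sketch of that comparison behaviour, using invented values and variable names:

```r
# Comparing anything with NA gives NA rather than TRUE or FALSE
cmp_value <- "Internet" == NA
cmp_na    <- NA == NA

# Toy vector with one missing entry (invented for illustration)
x <- c("Internet", NA, "Internet")

# The logical result carries an NA in the position where x is missing
matches <- x == "Internet"

# is.na() is the reliable test for missingness; == NA is not
n_missing  <- sum(is.na(x))
n_internet <- sum(matches, na.rm = TRUE)

print(matches)
print(c(missing = n_missing, internet = n_internet))
```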

Matthew Lundberg