I have a column in my dataset that is a factor. Here is an example dataset:

a <- c(1,4,6,3,8)
b <- c("No","Yes","NA", "Maybe", "Yes")
df <- data.frame(a,b)

I'd like to change the NA in column 2 ("b") to "Sometimes". I have tried two different approaches that, in my mind, should work but don't seem to give me the correct output:

df[is.na(df$b)] <- "Sometimes"
df[df$b == "NA"] <- "Sometimes"

Is there a way to do this?

user3585829
  • 5
    `"NA"` isn't a real `NA`, it should be just `NA` – David Arenburg Feb 01 '16 at 19:12
  • 1
    `df$b[df$b == "NA"] <- "Sometimes"` – jogo Feb 01 '16 at 19:17
  • 1
    In addition to what David said, see the answers on this question: [how to change name of factor levels](http://stackoverflow.com/questions/29711067/r-how-to-change-name-of-factor-levels) – Jota Feb 01 '16 at 19:20
  • 2
    Because of the default setting `stringsAsFactors=TRUE`, the suggestion by @jogo works only after setting `df$b <- as.character(df$b)`. And yes, it is not a "real" NA, those are just the two letters; hence it won't be recognized by `is.na()`. – RHertel Feb 01 '16 at 19:23
  • Sorry, the NA in my dataset comes out as `<NA>`. Does that change things? It doesn't seem like a regular NA. – user3585829 Feb 01 '16 at 19:35
  • How do I account for the `<NA>` then? I converted the column to a factor but jogo's approach still didn't work. – user3585829 Feb 01 '16 at 19:40
  • 1
    The form `<NA>` is typical for missing levels. That one is "real", as can be checked with `which(is.na(df$b))`. – RHertel Feb 01 '16 at 19:40
  • But using `df$b[df$b == "<NA>"] <- "Sometimes"` won't work in my dataset? – user3585829 Feb 01 '16 at 19:43
  • 1
    You need to introduce a new level if you don't want to convert the entries into characters. You could try `levels(df$b) <- c(levels(df$b),"Sometimes"); df$b[is.na(df$b)]<-"Sometimes"`... As I have just noticed, this has been mentioned in the second part of the answer by @fishtank. – RHertel Feb 01 '16 at 19:44
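
The points raised in the comments can be checked directly. The following is a minimal sketch, not code posted by any commenter: it assumes the example data from the question, passes `stringsAsFactors = TRUE` explicitly (an assumption made so that `b` is a factor on any R version), and uses the illustrative names `has_missing`, `text_match`, and `broken`.

```r
a <- c(1, 4, 6, 3, 8)
b <- c("No", "Yes", "NA", "Maybe", "Yes")
# Assumption: stringsAsFactors = TRUE is spelled out because R >= 4.0.0
# no longer converts strings to factors by default.
df <- data.frame(a, b, stringsAsFactors = TRUE)

has_missing <- anyNA(df$b)          # FALSE: "NA" here is just two letters
text_match  <- which(df$b == "NA")  # row 3 matches as text

# Assigning a label that is not an existing level produces a real NA
# (R warns "invalid factor level, NA generated").
broken <- df$b
broken[broken == "NA"] <- "Sometimes"

# Character route from the comments: convert, replace, rebuild the factor.
df$b <- as.character(df$b)
df$b[df$b == "NA"] <- "Sometimes"
df$b <- factor(df$b)
```

Rebuilding with `factor()` re-sorts the levels alphabetically, so this route may change the level order compared with renaming a level in place.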

1 Answer


If you are using "NA" (i.e. a character string), the following renames that factor level to "Sometimes":

> df <- data.frame(a,b)
> levels(df$b)
[1] "Maybe" "NA"    "No"    "Yes"
> levels(df$b)[levels(df$b)=="NA"]
[1] "NA"
> levels(df$b)[levels(df$b)=="NA"]<-"Sometimes"
> df$b
[1] No        Yes       Sometimes Maybe     Yes  
Levels: Maybe Sometimes No Yes
> df
  a         b
1 1        No
2 4       Yes
3 6 Sometimes
4 3     Maybe
5 8       Yes

Otherwise, if you had used a real NA, this adds "Sometimes" as a new factor level and uses is.na to replace the missing entries:

b <- c("No", "Yes", NA, "Maybe", "Yes")
df <- data.frame(a, b, stringsAsFactors = TRUE) # explicit, since R >= 4.0.0 defaults to FALSE
levels(df$b) <- c(levels(df$b), "Sometimes") # introduce a new factor level
df[is.na(df$b), "b"] <- "Sometimes"

> df
  a         b
1 1        No
2 4       Yes
3 6 Sometimes
4 3     Maybe
5 8       Yes
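
The two cases above could be folded into one function. This is only a sketch under stated assumptions: `recode_missing` and its `new_label` argument are hypothetical names introduced here for illustration, not part of base R or of the original answer.

```r
# Hypothetical helper, not from the answer: replaces either the literal
# string "NA" or a real missing value in a factor with `new_label`.
recode_missing <- function(f, new_label = "Sometimes") {
  stopifnot(is.factor(f))
  lv <- levels(f)
  if ("NA" %in% lv) {
    # Case 1: "NA" is an ordinary level, so rename it in place
    lv[lv == "NA"] <- new_label
    levels(f) <- lv
  }
  if (anyNA(f)) {
    # Case 2: real NA, so register the new level before assigning it
    if (!new_label %in% levels(f)) {
      levels(f) <- c(levels(f), new_label)
    }
    f[is.na(f)] <- new_label
  }
  f
}

text_na <- factor(c("No", "Yes", "NA", "Maybe", "Yes"))
real_na <- factor(c("No", "Yes", NA, "Maybe", "Yes"))

fixed_text <- recode_missing(text_na)
fixed_real <- recode_missing(real_na)
```

In the first case the renamed level keeps its original position among the levels; in the second the new level is appended at the end. Applied to the question's data frame, this would be `df$b <- recode_missing(df$b)`.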
fishtank