In my dataset I have responses to a yes/no question, with a lot of missing values.

The column for the question looks something like this:

Question
[1] yes
[2] no
[3] 
[4] yes
[5] no
[6] 

In other words:

summary(Question)
    yes  no
173 160 155

where we have 173 missing values, 160 yes answers, and 155 no's.

When I look at levels in the factor, I get the following:

levels(Question)
[1] " "
[2] yes
[3] no

I would like to drop the missing values (that is, the level " "); I have legitimate reasons to exclude them in this case.

However, is.na(Question) reports (implausibly) that there are no missing values, so I cannot easily exclude them.
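A minimal sketch of this behaviour, using made-up values rather than the real data file (the vector below is only an assumed stand-in for the column): a blank string stored as a factor level is an ordinary value, so `is.na` does not flag it, while a direct comparison against the level does.

```r
# Hypothetical stand-in for the real column (values are made up)
Question <- factor(c("yes", "no", " ", "yes", "no", " "),
                   levels = c(" ", "yes", "no"))

# A blank string is an ordinary level, not NA, so is.na() finds nothing
n_na <- sum(is.na(Question))

# Comparing against the blank level does find the "missing" entries
n_blank <- sum(Question == " ")
```

If the comparison also finds nothing in the real data, the blank level may not be exactly one space (for example an empty string or several spaces), which `levels(Question)` printed with quotes would reveal.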

I have tried dropping the level with missing values:

droplevels.factor(Question, exclude={" "})

but it results in a "NAs introduced by coercion" warning message.

What can I do to exclude the level with missing values? Please help. Thank you.

Edited with link to data file.

KaC
  • `is.na` only looks for the special `NA` value, which is different from a level value. Your "missing" values seem to be a string with a single space in it. So maybe you want `Question[Question != " "]`, possibly followed by `droplevels()`, but it's not totally clear what you are trying to do or whether you have any actual `NA` values. When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Mar 05 '18 at 21:59
  • Try `Question[Question==" "] = NA` to convert them to `NA` – R. Schifini Mar 05 '18 at 22:01
  • Thank you @MrFlick and @R. Schifini. You'd think that either option would work, but they don't seem to do anything. For example, `Question[Question==" "] = NA` doesn't actually convert the blank (" ") values to NA. Baffling. I've edited the question with a link to an example (I hope that's how it's done). – KaC Mar 05 '18 at 22:23
  • In case anyone stumbles on this in the future, the solution turned out to be the following. Converting " " to NA didn't work, so I downloaded a copy of the dataset with missing values marked as -99. Then `Question[Question=="-99"] = NA` worked. That correctly marked the missing values as NA, but the level remained in the factor, so I dropped it using `Question <- factor(Question)`. – KaC Mar 06 '18 at 16:44
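The approaches suggested in the comments can be sketched as follows. This is only an illustration on made-up values, and it assumes the blank level might be an empty string or contain a different amount of whitespace than a single space, which would explain why an exact comparison with " " matched nothing. The vector and the names `blank` and `answered` are hypothetical.

```r
# Hypothetical stand-in for the real column (values are made up)
Question <- factor(c("yes", "no", " ", "yes", "no", ""),
                   levels = c(" ", "", "yes", "no"))

# Find every level that is empty once surrounding whitespace is trimmed
blank <- trimws(levels(Question)) == ""

# Assigning NA to a level turns its elements into NA and removes the level
levels(Question)[blank] <- NA

levels(Question)
sum(is.na(Question))

# Optionally keep only the observed answers
answered <- Question[!is.na(Question)]
```

Working on the levels rather than on the elements avoids the extra `factor(Question)` step, because the blank level disappears as soon as it is set to `NA`.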

1 Answer


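You can use scan, which skips blank entries when reading character data. The factor has to be converted to character first:

  scan(text=as.character(Question),what="character",quiet=TRUE)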
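A small sketch of how this might behave, on made-up values standing in for the real column (the names `answers` and `Question_clean` are hypothetical):

```r
# Hypothetical stand-in for the real column (values are made up)
Question <- factor(c("yes", "no", " ", "yes", "no", " "))

# scan() needs character input, so convert the factor first
answers <- scan(text = as.character(Question), what = "character", quiet = TRUE)

# Rebuild a factor; only values that were actually read become levels
Question_clean <- factor(answers)
```

Note that `scan` splits on whitespace and returns a shorter vector, so the result no longer lines up row by row with other columns of the dataset, and any answer containing a space would be split into several entries.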
Onyambu