I have a vector data of factors whose levels are (0) No, (1) Yes and (8) Residue.

Here is the value of the second element of that vector, data[2]: (1) Yes

What I don't understand is that the value of data[2] == "Yes" is FALSE. Also surprising is that the value of as.integer(data[2]) is 2. Shouldn't it be 1? And shouldn't the value of data[2] == "Yes" be TRUE?

I just started to use R, so I still don't know much about it, but I really don't understand this. Can someone please explain to me what's going on?

Philippe
  • Please provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) (such as a `dput()` of your data) so we can see what's really there. Not sure how you have both numbers 0/1/8 and No/Yes/Residue. Factors in R don't really work like that. Each level gets a sequential integer value starting at 1. – MrFlick Jun 30 '17 at 20:31
  • Although this is an R language question, you might want to ask it on Cross-Validated (or even migrate it there), after rewriting with a reproducible example, and giving additional context. This is additionally justifiable given the comments on the one answer, where OP Philippe mentioned an unconventional aspect of the dataset, (i.e. atypical labeling of the factors in the vector) as potentially causing the anomalous behavior. If the unconventional aspect of the dataset can be tied into something non-standard in a statistical sense, then cross-posting or question migration seems reasonable. – Ellie Kesselman Oct 25 '17 at 18:59
  • I solved this on my own, it's just a bad question because I was confused about factors. – Philippe Oct 25 '17 at 21:00

1 Answer

I'm not really sure where 0, 1, and 8 are coming from but consider this reproducible example:

Load data

 dt <- factor(c("No", "Yes", "Residue"), levels = c("No", "Yes", "Residue"))

Check second value

This prints the label of the second element of dt; we know it's a factor because the factor levels are printed along with it.

 dt[2]

[1] Yes

Levels: No Yes Residue

Evaluate second value

dt[2] == "Yes"

[1] TRUE

This returns 2 because Yes is the second factor level.

as.integer(dt[2])

[1] 2

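Behind the scenes, a factor is stored as a vector of integer codes, each one an index into the factor's levels, not as character strings. as.integer(dt[2]) returns that code, which is why it is 2 (the position of the level) rather than a number taken from the label. A comparison such as dt[2] == "Yes" is made against the level label, so it is TRUE only when the string matches the label exactly; if the label in your data is "(1) Yes", then comparing with "Yes" gives FALSE.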
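To make this concrete, here is a minimal sketch of what the asker's vector probably looked like. The real data was never posted, so `data` below is a hypothetical reconstruction based only on the printed levels quoted in the comments.

```r
# Hypothetical stand-in for the asker's vector (the original data was not posted).
data <- factor(c("(0) No", "(1) Yes", "(8) Residue"),
               levels = c("(0) No", "(1) Yes", "(8) Residue"))

levels(data)            # the labels, in level order
as.integer(data[2])     # 2: the position of "(1) Yes" among the levels
as.character(data[2])   # "(1) Yes": the full label, prefix included
data[2] == "Yes"        # FALSE: "Yes" is not the label
data[2] == "(1) Yes"    # TRUE: exact match with the label
```

The number inside the label plays no role in the internal integer code; only the order of the levels does.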

jsta
  • `dt[2] == "Yes"` is also TRUE. You shouldn't need `as.character()`. That's not necessary. – MrFlick Jun 30 '17 at 20:43
  • The output when I just type data[2] is this: – Philippe Jun 30 '17 at 20:46
  • [1] (1) Yes Levels: (0) No (1) Yes (8) Residue – Philippe Jun 30 '17 at 20:48
  • I now understand why as.integer(data[2]) has the value 2. But I still don't understand why data[2] == "Yes" doesn't have the value TRUE. I thought that, as MrFlick says, it wasn't necessary to use as.character to convert it to a string first. In fact, when I did that with another dataset, it worked as expected. – Philippe Jun 30 '17 at 20:49
  • Note that as.character(data[2]) == "Yes" also has the value FALSE. This is because as.character(data[2]) has the value "(1) Yes". – Philippe Jun 30 '17 at 20:51
  • But violent.crimes$V4246C[2] == "(1) Yes" has the value TRUE, as expected. Is it just that my dataset coded the labels of the factors in that vector in a weird way, by setting the label to "(1) Yes" for 1 instead of just "Yes"? – Philippe Jun 30 '17 at 20:52
  • It is unfortunate we didn't get to read an answer from @MrFlick here too! – Ellie Kesselman Oct 25 '17 at 18:46
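
For anyone who lands here with similarly labelled data, below is a rough sketch of how one might recover the numeric codes or strip the prefix from the labels. It assumes every label follows the "(code) text" pattern seen in the comments above; `data` is a hypothetical stand-in for the asker's vector, and `codes` and `clean` are names introduced only for this illustration.

```r
# Hypothetical stand-in for the asker's vector (the original data was not posted).
data <- factor(c("(0) No", "(1) Yes", "(8) Residue"),
               levels = c("(0) No", "(1) Yes", "(8) Residue"))

# Pull the number out of each label, e.g. "(8) Residue" becomes 8.
codes <- as.integer(sub("^\\(([0-9]+)\\).*$", "\\1", as.character(data)))

# Copy the factor and rename its levels without the "(n) " prefix.
clean <- data
levels(clean) <- sub("^\\([0-9]+\\) *", "", levels(clean))

codes               # the survey codes 0, 1 and 8
levels(clean)       # "No" "Yes" "Residue"
clean[2] == "Yes"   # TRUE once the prefix is gone
```

Renaming the levels leaves the underlying integer codes untouched, so as.integer(clean[2]) is still 2; use `codes` when the original 0/1/8 values are needed.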