12

I have a variable called gender with binary categorical values "female"/"male". I want to change its type to integer 0/1 so that I can use it in a regression analysis, i.e., I want the values "female" and "male" mapped to 1 and 0, respectively.

> str(gender)
gender : Factor w/ 2 levels "female","male": 1 1 1 2 2 2 2 1 1 2 ...
> gender[1]
[1] female

I would like to convert the gender variable so that I get the integer value 1 when I query an element, i.e.:

> gender[1]
[1] 1
Zahra
user2093989

3 Answers

15

As an addition to @Dason's answer, note that...

test <- c("male","female")

as.factor(test)
#[1] male   female
#Levels: female male

...will return female as the reference group (1) and male as the comparison group (2).

To spin it the other way, you would need to do...

factor(test,levels=c("male","female"))
#[1] male   female
#Levels: male female

As @Marius notes, using contrasts will show you how the levels will be coded in the regression model:

contrasts(as.factor(test))
#       male
#female    0
#male      1

contrasts(factor(test,levels=c("male","female")))
#       female
#male        0
#female      1
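
As a side note, base R's relevel() is another way to switch the reference group without retyping all the levels. A minimal sketch, using a made-up vector (test and test2 are illustrative names, not from the question):

```r
# Hypothetical example data, not taken from the question
test <- factor(c("male", "female", "female", "male"))

# Default ordering is alphabetical, so "female" is the reference level
levels(test)

# relevel() moves the chosen level to the front, making it the reference
test2 <- relevel(test, ref = "male")
levels(test2)

# The contrast matrix now has a single "female" indicator column
contrasts(test2)
```

This should give the same coding as factor(test, levels=c("male","female")) above.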
thelatemail
  • 2
    Or, to see even more explicitly how the levels will be treated in a regression model, `contrasts(factor(test))` – Marius Feb 21 '13 at 05:13
14

Convert to a factor and let R take care of the rest. You should never have to take care of explicitly creating dummy variables when using R.
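
To illustrate what "let R take care of the rest" might look like, here is a rough sketch with simulated data. The outcome y, the sample size, and the effect size are all made up for the example:

```r
set.seed(1)  # simulated data; every name and number here is an assumption
n <- 100
gender <- factor(sample(c("female", "male"), n, replace = TRUE))

# Made-up outcome: baseline of 5, shifted by 2 for males, plus noise
y <- 5 + 2 * (gender == "male") + rnorm(n)

# lm() builds the 0/1 indicator for the non-reference level internally
fit <- lm(y ~ gender)
coef(fit)  # coefficient names: "(Intercept)" and "gendermale"
```

The gendermale coefficient is the estimated difference between males and the reference level (female), so no hand-made dummy variable is needed.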

Dason
  • 6
    +1 far better to address the real issue, not the exact problem! – mnel Feb 21 '13 at 05:05
  • 1
    @Dason, what about if you wanted to include gender in a correlation matrix? This will not work if gender is a factor. – Kevin T May 15 '21 at 17:12
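
For cases like the correlation matrix mentioned in the comment above, where a numeric 0/1 column really is needed, one possible approach is to compare against the level of interest and coerce the result. A minimal sketch, assuming "female" should map to 1 as in the question (the height values are invented for illustration):

```r
gender <- factor(c("female", "female", "female", "male", "male"))

# The comparison yields TRUE/FALSE; as.integer() turns that into 1/0
gender_num <- as.integer(gender == "female")
gender_num
# [1] 1 1 1 0 0

# Hypothetical second variable, only here to show cor() accepting the result
height <- c(160, 165, 158, 180, 175)
cor(cbind(gender_num, height))
```

Note that as.integer(gender) on its own would return the internal codes 1/2 rather than 0/1, which is why the comparison is done first.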
7

If you're doing this for real, you should absolutely follow @Dason's advice. I'm going to assume that you're teaching a class and want to demonstrate indicator variables (with thanks to this question):

dat <- data.frame(gender=sample(c("male", "female"), 10, replace=TRUE))

model.matrix(~gender, data=dat)

   (Intercept) gendermale
1            1          1
2            1          0
3            1          1
4            1          0
5            1          1
6            1          1
7            1          1
8            1          0
9            1          0
10           1          1
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$gender
[1] "contr.treatment"

If you don't want the intercept, use model.matrix(~gender - 1, data=dat) instead, which returns one indicator column per level.
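
A small sketch of the no-intercept variant, using a fixed (made-up) data frame so the result is reproducible; mm and female are illustrative names:

```r
# Fixed example data instead of sample(), so the output is deterministic
dat <- data.frame(gender = c("male", "female", "male", "female", "male"))

# Dropping the intercept gives one 0/1 column for each level
mm <- model.matrix(~ gender - 1, data = dat)
colnames(mm)  # "genderfemale" "gendermale"

# A single integer indicator can be pulled out of the matrix if needed
female <- as.integer(mm[, "genderfemale"])
female
# [1] 0 1 0 1 0
```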

sebastian-c