I have 3 categorical variables

agegroup{<20,20-30,>30}
disease.level{0,1,2},  
performance{<60, >=60}

and I would like to combine them into one dummy variable with 3x3x2 levels. Is there a fast way to do this? My original dataset has about 10 variables with multiple levels in each.

Basically, I am asking for the exact opposite of this question: "Create new dummy variable columns from categorical variable".

Thanks a lot EC

ECII

1 Answer

I'm not sure whether by "dummy variable" you mean 0/1 indicator variables (in which case you would have 18 of them) or a single factor with 18 levels. It sounds like the latter, in which case interaction does the job. (paste would work as well, although interaction is a bit more self-describing.)

> ff <- expand.grid(agegroup=factor(c("<20","20-30",">30")),
       disease.level=factor(0:2),performance=factor(c("<60",">=60")))
> combfac <- with(ff,interaction(agegroup,disease.level,performance))
> combfac
 [1] <20.0.<60    20-30.0.<60  >30.0.<60    <20.1.<60    20-30.1.<60 
 [6] >30.1.<60    <20.2.<60    20-30.2.<60  >30.2.<60    <20.0.>=60  
[11] 20-30.0.>=60 >30.0.>=60   <20.1.>=60   20-30.1.>=60 >30.1.>=60  
[16] <20.2.>=60   20-30.2.>=60 >30.2.>=60  
18 Levels: <20.0.<60 20-30.0.<60 >30.0.<60 <20.1.<60 20-30.1.<60 ... >30.2.>=60

If you want to use all the variables in the data frame to create the interaction you can use do.call(interaction,ff).
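As a minimal sketch of the do.call approach, assuming the same ff data frame built with expand.grid above (the names combfac_all and combfac_two are only illustrative): do.call passes each column of the data frame to interaction as a separate argument, and indexing the data frame first restricts the interaction to a subset of columns.

```r
# Same example data as above: one row per combination of the three factors.
ff <- expand.grid(agegroup = factor(c("<20", "20-30", ">30")),
                  disease.level = factor(0:2),
                  performance = factor(c("<60", ">=60")))

# A data frame is a list of columns, so do.call() hands each column
# to interaction() as its own argument.
combfac_all <- do.call(interaction, ff)
nlevels(combfac_all)   # 18 (3 x 3 x 2)

# To combine only some of the columns, subset the data frame first.
combfac_two <- do.call(interaction, ff[c("agegroup", "performance")])
nlevels(combfac_two)   # 6 (3 x 2)
```

One caveat with this pattern: a column named drop, sep, or lex.order would be matched to the corresponding argument of interaction rather than treated as a factor, so such columns would need renaming first.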

If you did want the dummy variables you would do model.matrix(~combfac-1) to get them.
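For illustration, here is a small sketch of the indicator-variable route, again assuming the ff example data from above (the name dummies is just a placeholder). The -1 in the formula drops the intercept, so every level of the combined factor gets its own 0/1 column.

```r
ff <- expand.grid(agegroup = factor(c("<20", "20-30", ">30")),
                  disease.level = factor(0:2),
                  performance = factor(c("<60", ">=60")))
combfac <- with(ff, interaction(agegroup, disease.level, performance))

# One indicator column per level of combfac; no intercept column.
dummies <- model.matrix(~ combfac - 1)
dim(dummies)                 # 18 rows, 18 columns for this example data

# Each observation belongs to exactly one level, so each row sums to 1.
all(rowSums(dummies) == 1)   # TRUE
```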

Ben Bolker
Relative to `paste`, `interaction` can also be nice in that it produces levels for all possible combinations of the two factors, even those that don't appear in the present data. – Josh O'Brien Dec 07 '11 at 19:38
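The difference described in this comment can be seen on a toy example. The factors a and b below are made up for illustration; the combination y/q never occurs in them.

```r
a <- factor(c("x", "y", "x"))
b <- factor(c("p", "p", "q"))   # y together with q is never observed

# interaction() keeps a level for every possible combination.
n_inter <- nlevels(interaction(a, b))                  # 4

# paste() only ever sees the combinations present in the data.
n_paste <- nlevels(factor(paste(a, b, sep = ".")))     # 3

# drop = TRUE makes interaction() discard the unused levels as well.
n_drop <- nlevels(interaction(a, b, drop = TRUE))      # 3
```

Which behaviour is preferable probably depends on the use: keeping empty levels is handy for complete cross-tabulations, while dropping them avoids all-zero columns in a model matrix.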