Gathering multiple dummy variables as one categorical variable in R

Question

I'm aware of this solution, but I am having difficulty applying it with data that isn't only dummy variables.

Some sample data to load (essentially a series of expenses):

df <- data.frame(Charge         = c(12, 4, 6, 10, 5, 9),
                 Groceries      = c(1, 0, 0, 0, 0, 0),
                 Utilities      = c(0, 1, 0, 0, 0, 0),
                 Consumables    = c(0, 0, 1, 0, 0, 0),
                 Transportation = c(0, 0, 0, 1, 0, 0),
                 Entertainment  = c(0, 0, 0, 0, 1, 0),
                 Misc           = c(0, 0, 0, 0, 0, 1))

I would like to create a new variable "Category" that holds, for each row, the name of the dummy column coded 1. I can do this with ifelse, but I am looking for a more general solution, e.g. one from the reshape package.

Currently, I can only solve this with:

df$Category <- ifelse(df$Groceries==1, "Groceries",      
                      ifelse(df$Utilities==1,"Utilities",
                             ifelse(df$Consumables==1,"Consumables",
                                    ifelse(df$Transportation==1,"Transportation",
                                           ifelse(df$Entertainment==1,"Entertainment","Misc")))))

score 0 · Accepted Answer · answered Nov 15 '18 at 16:49


If every row contains exactly one 1, use max.col to get the column index of each row's maximum, then use that index to subset the names of the dummy columns (everything except Charge):

df$Category <- names(df)[-1][max.col(df[-1])]
df$Category
#[1] "Groceries"      "Utilities"      "Consumables"    "Transportation" "Entertainment"  "Misc"
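max.col relies on each row having exactly one 1, and by default it breaks ties at random. A minimal base-R sketch that checks the assumption first and passes ties.method = "first" so the result is deterministic; the helper name dummy_to_category is hypothetical, not part of any package:

```r
# Sample data from the question
df <- data.frame(
  Charge         = c(12, 4, 6, 10, 5, 9),
  Groceries      = c(1, 0, 0, 0, 0, 0),
  Utilities      = c(0, 1, 0, 0, 0, 0),
  Consumables    = c(0, 0, 1, 0, 0, 0),
  Transportation = c(0, 0, 0, 1, 0, 0),
  Entertainment  = c(0, 0, 0, 0, 1, 0),
  Misc           = c(0, 0, 0, 0, 0, 1)
)

# Hypothetical helper: collapse one-hot columns into one label per row
dummy_to_category <- function(dummies) {
  m <- as.matrix(dummies)
  # Stop if any row has zero or several 1s, since the label would be ambiguous
  stopifnot(all(rowSums(m == 1) == 1))
  # ties.method = "first" avoids max.col's default random tie-breaking
  colnames(m)[max.col(m, ties.method = "first")]
}

df$Category <- dummy_to_category(df[-1])
df$Category
```

Wrapping the result in factor(..., levels = names(df)[-1]) would keep every category as a level even if some never occur.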

akrun

Thanks, this works. One follow-up: say the df has additional columns (e.g. dates) after the dummy variables. How would I index the columns I want to use, rather than counting a specific range such as [2:6]? – Aaron Nov 15 '18 at 17:02
@Aaron Yes, you can subset by numeric index or by column names. In this example the dummy columns (Groceries through Misc) are 2:7 – akrun Nov 15 '18 at 17:07
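To avoid counting positions, the dummy columns can be picked by name or by content. A sketch in base R; the Date column is a hypothetical addition that mimics the follow-up, and the "all values are 0 or 1" rule in option 2 is an assumption that would misfire on a genuine numeric column that happens to contain only 0s and 1s:

```r
df <- data.frame(
  Charge         = c(12, 4, 6, 10, 5, 9),
  Groceries      = c(1, 0, 0, 0, 0, 0),
  Utilities      = c(0, 1, 0, 0, 0, 0),
  Consumables    = c(0, 0, 1, 0, 0, 0),
  Transportation = c(0, 0, 0, 1, 0, 0),
  Entertainment  = c(0, 0, 0, 0, 1, 0),
  Misc           = c(0, 0, 0, 0, 0, 1),
  # Hypothetical extra column after the dummies
  Date           = as.Date("2018-11-01") + 0:5
)

# Option 1: name the first and last dummy columns instead of counting them
first <- match("Groceries", names(df))
last  <- match("Misc", names(df))
dummy_cols <- names(df)[first:last]

# Option 2: treat every numeric column containing only 0s and 1s as a dummy
# (Date columns are not numeric, and Charge has other values, so both are skipped)
is_dummy <- vapply(df, function(x) is.numeric(x) && all(x %in% c(0, 1)), logical(1))

# Both options select the same six columns for this data
stopifnot(identical(dummy_cols, names(df)[is_dummy]))

df$Category <- dummy_cols[max.col(df[dummy_cols], ties.method = "first")]
df$Category
```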
