
I'm aware of this solution, but I am having difficulty applying it with data that isn't only dummy variables.

Here is some sample code that builds the data, essentially a series of expenses:

df <- data.frame(Charge         = c(12, 4, 6, 10, 5, 9),
                 Groceries      = c(1, 0, 0, 0, 0, 0),
                 Utilities      = c(0, 1, 0, 0, 0, 0),
                 Consumables    = c(0, 0, 1, 0, 0, 0),
                 Transportation = c(0, 0, 0, 1, 0, 0),
                 Entertainment  = c(0, 0, 0, 0, 1, 0),
                 Misc           = c(0, 0, 0, 0, 0, 1))

I would like to create a new variable "Category" whose value is the name of whichever binary-coded column equals 1 in that row. I am able to do this with ifelse, but I am looking for a more general solution, e.g. from the reshape package.

Currently, I can only solve this with:

df$Category <- ifelse(df$Groceries==1, "Groceries",      
                      ifelse(df$Utilities==1,"Utilities",
                             ifelse(df$Consumables==1,"Consumables",
                                    ifelse(df$Transportation==1,"Transportation",
                                           ifelse(df$Entertainment==1,"Entertainment","Misc")))))
mkrieger1
Aaron

1 Answer


If each row always contains exactly one 1, use max.col to get the column index of the maximum value in each row, then use that index to subset the column names of the dataset:

df$Category <- names(df)[-1][max.col(df[-1])]
df$Category
#[1] "Groceries"      "Utilities"      "Consumables"    "Transportation" "Entertainment"  "Misc"  
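
The one-liner above relies on every row having exactly one 1, so it may be worth checking that before trusting the result. The following is a minimal sketch, assuming the sample df from the question; the helper name dummies is introduced here only for illustration. Note that max.col breaks ties at random by default, so ties.method = "first" is used to make the result deterministic if the assumption is ever violated.

```r
# Sample data from the question
df <- data.frame(Charge         = c(12, 4, 6, 10, 5, 9),
                 Groceries      = c(1, 0, 0, 0, 0, 0),
                 Utilities      = c(0, 1, 0, 0, 0, 0),
                 Consumables    = c(0, 0, 1, 0, 0, 0),
                 Transportation = c(0, 0, 0, 1, 0, 0),
                 Entertainment  = c(0, 0, 0, 0, 1, 0),
                 Misc           = c(0, 0, 0, 0, 0, 1))

# 'dummies' is an illustrative name: every column except Charge
dummies <- df[-1]

# Stop early if any row does not contain exactly one 1
stopifnot(all(rowSums(dummies == 1) == 1))

# ties.method = "first" avoids the default random tie-breaking
df$Category <- names(dummies)[max.col(dummies, ties.method = "first")]
df$Category
```

If the stopifnot check fails, the rows in question can be inspected with which(rowSums(dummies == 1) != 1) before deciding how to label them.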
akrun
  • Thanks, this works. One follow-up: say the df has additional columns (e.g. dates) after the dummy variables, how would I index the columns I want to use, rather than counting a specific range such as [2:6]? – Aaron Nov 15 '18 at 17:02
  • @Aaron Yes, you can subset based on the numeric or with column names. In this case, it would be 2:6 – akrun Nov 15 '18 at 17:07
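
To illustrate the name-based subsetting mentioned in the comments, here is a minimal sketch. It assumes a hypothetical Date column appended after the dummy variables (it is not part of the original data), and dummy_cols is an illustrative name. Selecting by name means the result does not depend on column positions; for reference, in the question's sample df the six dummy columns sit at positions 2:7.

```r
# Sample data from the question, plus a hypothetical trailing Date column
df <- data.frame(Charge         = c(12, 4, 6, 10, 5, 9),
                 Groceries      = c(1, 0, 0, 0, 0, 0),
                 Utilities      = c(0, 1, 0, 0, 0, 0),
                 Consumables    = c(0, 0, 1, 0, 0, 0),
                 Transportation = c(0, 0, 0, 1, 0, 0),
                 Entertainment  = c(0, 0, 0, 0, 1, 0),
                 Misc           = c(0, 0, 0, 0, 0, 1),
                 Date           = as.Date("2018-11-15") + 0:5)  # assumed, for illustration

# Name the dummy columns explicitly instead of counting positions
dummy_cols <- c("Groceries", "Utilities", "Consumables",
                "Transportation", "Entertainment", "Misc")

# Index into the same vector of names that was used to subset the columns
df$Category <- dummy_cols[max.col(df[dummy_cols], ties.method = "first")]
df$Category
```

An equivalent way to build the list is to exclude the non-dummy columns, e.g. setdiff(names(df), c("Charge", "Date")), which can be handier when there are many categories.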