I borrowed a small example from here:

df <- data.frame(letter = rep(c('a', 'b', 'c'), each = 2), y = 1:6)
library(caret)
dummy <- dummyVars(~ ., data = df, fullRank = TRUE, sep = "_")
head(predict(dummy, df))

##    letter_b letter_c y
##  1        0        0 1
##  2        0        0 2
##  3        1        0 3
##  4        1        0 4
##  5        0        1 5
##  6        0        1 6

However, this gives a data frame in which the first dummy of the factor variable, letter_a, is removed.

I have also tried `fastDummies::dummy_cols` as follows:

head(fastDummies::dummy_cols(df, remove_selected_columns=TRUE, remove_first_dummy=TRUE))

##     y letter_b letter_c
##  1  1        0        0
##  2  2        0        0
##  3  3        1        0
##  4  4        1        0
##  5  5        0        1
##  6  6        0        1

but it only offers a `remove_first_dummy = TRUE` argument, which likewise removes letter_a. How can one instead remove the last dummy of the factor variable, letter_c, in R in a concise and convenient way?

StupidWolf
John Stone
  • Why would you? Which dummy is dropped has only a cosmetic effect. The estimated parameters will be in the same subspace. – Oliver Jun 06 '21 at 07:47
  • @Oliver It is set up this way to match a real data example in a journal paper, which makes the explanation easier. Of course, I can deal with the variables one by one, since there are not too many of them. Thanks anyway! – John Stone Jun 06 '21 at 08:03
  • Fair enough. Commonly you would re-level your factors, e.g. `factor(factor_var, levels = c(...))`, with the vector in `levels` specifying the order. The first entry in `levels` will be your baseline in most implementations. – Oliver Jun 06 '21 at 09:48

1 Answer

You can use `relevel` to make the last level (in this case c) the reference level, so that its dummy is the one that gets dropped:

library(caret)
df <- data.frame(letter = rep(c('a', 'b', 'c'), each = 2), y = 1:6)
df$letter <- relevel(factor(df$letter),ref = "c")
dummy <- dummyVars(~ ., data = df, fullRank = TRUE, sep = "_")
head(predict(dummy,df))

  letter_a letter_b y
1        1        0 1
2        1        0 2
3        0        1 3
4        0        1 4
5        0        0 5
6        0        0 6
StupidWolf
  • Thank you for your answer! I am just wondering how I can achieve my goal your way if I have a data.frame with several factor columns (think of a 20-column data.frame with 7 factor columns). Do I need to set something like `df$letter <- relevel(factor(df$letter),ref = "c")` for each factor column? – John Stone Jun 06 '21 at 14:59
  • You can do `for (i in c("letter", ...)) { df[[i]] <- relevel(factor(df[[i]]), ref = ...) }` – StupidWolf Jun 06 '21 at 16:43
  • I don't know if it is always `c`, but most likely you can find some rule to get the level you want. – StupidWolf Jun 06 '21 at 16:43
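
For the several-columns case raised in the comments, here is a minimal sketch of that loop in base R. It rests on two assumptions that are not from the thread: every character or factor column should be treated as a factor, and the rule for choosing the reference is "the last level in sorted order". The `group` column and the name `factor_cols` are made up for illustration.

```r
# Hypothetical example data: two factor-like columns and one numeric column.
df <- data.frame(
  letter = rep(c("a", "b", "c"), each = 2),
  group  = rep(c("x", "y", "z"), times = 2),
  y      = 1:6
)

# Assumption: every character or factor column is to be dummy-encoded.
is_cat <- sapply(df, function(col) is.character(col) || is.factor(col))
factor_cols <- names(df)[is_cat]

for (col in factor_cols) {
  f <- factor(df[[col]])
  # Assumed rule: the last level becomes the reference, i.e. the dropped dummy.
  df[[col]] <- relevel(f, ref = tail(levels(f), 1))
}

# Show the level order of each factor; the reference level now comes first.
lapply(df[factor_cols], levels)
```

After this, `df` can be passed to `dummyVars(~ ., data = df, fullRank = TRUE, sep = "_")` exactly as in the answer. If a different rule picks the reference level for some column, only the `ref =` expression inside the loop needs to change.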