0

Hello, I am trying to create a new variable in my data set that combines the "education" dummies into a single variable holding their respective names as character strings, so that I can use education as a factor in a regression model.

I am not sure how to create a new variable "edu" that contains "edu4" in the first and second rows, and so on. Any help is much appreciated!

  • 1
    `max.col(df[startsWith(names(df),"edu")])` - essentially duplicating https://stackoverflow.com/questions/17735859/for-each-row-return-the-column-name-of-the-largest-value/17735894 though I think. – thelatemail Oct 28 '20 at 08:17

2 Answers

3

As you did not provide the data set via the dput function, I built a small example myself.

dput(df)
structure(list(id = 1:10, edu1 = c(1, 0, 0, 0, 0, 0, 0, 0, 1,
0), edu2 = c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0), edu3 = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0), edu4 = c(0, 1, 1, 0, 1, 0, 0, 0, 0, 0),
    edu5 = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 1)), class = "data.frame", row.names = c(NA,
-10L))

Solution

df$edu = factor(apply(df[,paste0("edu", 1:5)], 1, which.max))

Result

> df
   id edu1 edu2 edu3 edu4 edu5 edu
1   1    1    0    0    0    0   1
2   2    0    0    0    1    0   4
3   3    0    0    0    1    0   4
4   4    0    0    0    0    1   5
5   5    0    0    0    1    0   4
6   6    0    1    0    0    0   2
7   7    0    0    0    0    1   5
8   8    0    1    0    0    0   2
9   9    1    0    0    0    0   1
10 10    0    0    0    0    1   5
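
The result above holds the column index (1 to 5), while the question asks for labels such as "edu4". A minimal sketch of one way to get those labels, assuming the example data above and at most one dummy set per row; the helper names `edu_cols` and `idx` are illustrative and not part of the original answer. It uses `max.col`, as suggested in the comment on the question, and treats rows with no dummy set as missing, which is an assumption (`which.max` would return 1 for such rows).

```r
# Example data, same values as the dput() output above
df <- data.frame(
  id   = 1:10,
  edu1 = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 0),
  edu2 = c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0),
  edu3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  edu4 = c(0, 1, 1, 0, 1, 0, 0, 0, 0, 0),
  edu5 = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 1)
)

# Names of the dummy columns (illustrative helper)
edu_cols <- paste0("edu", 1:5)

# Position of the largest value in each row; "first" makes ties deterministic
idx <- max.col(df[edu_cols], ties.method = "first")

# Assumption: a row with no dummy set has unknown education, so mark it NA
idx[rowSums(df[edu_cols]) == 0] <- NA

# Map positions to column names and keep all five levels, even unused ones
df$edu <- factor(edu_cols[idx], levels = edu_cols)
```

Keeping `levels = edu_cols` means unused categories such as edu3 remain as factor levels; wrap the result in `droplevels()` if they should be dropped before fitting the regression.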
polkas
0

Try this: df is your data frame, and I assume your edu variables are in columns 7 to 12. We start from column 8, so a row in which all of the remaining edu variables are 0 is assigned to edu1.

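edu_mat <- as.matrix(df[, 8:12])  # %*% needs a numeric matrix, not a data frame
factor_variable <- factor(drop(edu_mat %*% seq_len(ncol(edu_mat))) + 1,
             levels = seq_len(ncol(edu_mat) + 1),  # keeps labels aligned if a category is absent
             labels = c("edu1", colnames(edu_mat)))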
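
To see why the matrix product works: multiplying the dummy matrix by the vector 1, 2, ..., k gives the position of the column that is set to 1, or 0 when none is set, so adding 1 shifts the all-zero rows onto the reference category. A small sketch on a made-up data frame `toy` (hypothetical data, with edu1 as the omitted reference category):

```r
# Hypothetical data: edu1 is the omitted reference, so row 1 is all zeros
toy <- data.frame(
  edu2 = c(0, 1, 0, 0),
  edu3 = c(0, 0, 1, 0),
  edu4 = c(0, 0, 0, 1)
)

m <- as.matrix(toy)

# 0 for the all-zero row, otherwise the position of the dummy that is 1;
# adding 1 moves the all-zero rows to code 1 (the reference category)
code <- drop(m %*% seq_len(ncol(m))) + 1

# Explicit levels keep the labels aligned even if a category never occurs
edu <- factor(code,
              levels = seq_len(ncol(m) + 1),
              labels = c("edu1", colnames(m)))
```

This relies on each row having at most one dummy equal to 1; a row with several 1s would produce a sum that matches the wrong category or no level at all.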

Let me know if this worked.

Elias