
I have been tasked with tidying up some data and am having trouble transforming it from this format:

id occupation_busdriver   occupation_cashier   occupation_nurse
1   0                       0                    1
2   0                       1                    0
3   1                       0                    0
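To try the answers below, the example table can be rebuilt as a data frame. The object name `df1` is an assumption, chosen to match the name used in the answer.

```r
# Reproducible version of the example table: one 0/1 indicator column
# per occupation. The name df1 is assumed, to match the answer.
df1 <- data.frame(
  id                   = c(1, 2, 3),
  occupation_busdriver = c(0, 0, 1),
  occupation_cashier   = c(0, 1, 0),
  occupation_nurse     = c(1, 0, 0)
)
df1
```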

My actual dataset is significantly larger, but this is the part I am struggling with, so an example for this set would be much appreciated.

I have already tried using the gather (tidyr) and select (dplyr) functions.

I am looking to have the data in this format:

id  occupation
1   nurse
2   cashier
3   busdriver
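Since gather was mentioned: the same "reshape wide to long, then keep the rows flagged 1" idea can be sketched in base R with `stack()`, so no packages are needed. This is an alternative route, not the approach from the answer below; it assumes the data frame is called `df1` and that every column except `id` is a 0/1 occupation indicator. With tidyr, the analogous first step would be `gather(df1, occupation, flag, -id)` followed by filtering on `flag == 1`.

```r
# Example data (same layout as the table in the question).
df1 <- data.frame(
  id                   = c(1, 2, 3),
  occupation_busdriver = c(0, 0, 1),
  occupation_cashier   = c(0, 1, 0),
  occupation_nurse     = c(1, 0, 0)
)

# Step 1: wide to long. stack() concatenates the indicator columns one
# after another, so each id repeats once per indicator column.
long <- stack(df1[-1])
long$id <- rep(df1$id, times = ncol(df1) - 1)

# Step 2: keep only the rows where the indicator is 1.
res <- long[long$values == 1, c("id", "ind")]

# Step 3: turn the column name into the bare occupation label and sort by id.
res$occupation <- sub("^occupation_", "", as.character(res$ind))
res <- res[order(res$id), c("id", "occupation")]
rownames(res) <- NULL
res
```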
MrFlick
oliverg99
  • Related: https://stackoverflow.com/questions/29455255/collapse-mulitple-columns-into-one-column-and-generate-an-index-variable – MrFlick Apr 09 '19 at 16:22
  • SO is not a code-writing service. Please try something yourself first and let us know how it goes, and include any code you've written for this task in your post. – cet51 Apr 09 '19 at 16:33

1 Answer


We can use `max.col` to get, for each row, the column index of the maximum value, and then use that index to look up the corresponding column name:

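# Strip everything up to the "_" in each column name, drop "id" with [-1],
# then pick the name of the column holding the maximum (the 1) in each row
data.frame(df1[1], occupation = sub(".*_", "", names(df1))[-1][max.col(df1[-1])])
#  id occupation
#1  1      nurse
#2  2    cashier
#3  3  busdriver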
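To make the steps of the one-liner above visible, here is the same `max.col` approach split into parts. The fourth, all-zero row is a hypothetical addition to show an edge case: `max.col` still returns a column for it, so this sketch sets such rows to `NA`. Passing `ties.method = "first"` (the default is `"random"`) is also an addition here, not part of the original answer; it only matters for rows with tied maxima, such as an all-zero row.

```r
# Example data; row 4 (all zeros) is hypothetical, added to show an edge case.
df1 <- data.frame(
  id                   = c(1, 2, 3, 4),
  occupation_busdriver = c(0, 0, 1, 0),
  occupation_cashier   = c(0, 1, 0, 0),
  occupation_nurse     = c(1, 0, 0, 0)
)
ind <- df1[-1]

# Column position of the largest value in each row. "first" makes ties
# deterministic; the default "random" breaks ties at random.
idx <- max.col(ind, ties.method = "first")

# Greedy ".*_" removes everything up to the last underscore, leaving the label.
labels <- sub(".*_", "", names(ind))
occupation <- labels[idx]

# A row with no 1 at all would otherwise get the first column's label.
occupation[rowSums(ind) == 0] <- NA

out <- data.frame(id = df1$id, occupation = occupation, stringsAsFactors = FALSE)
out
```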
akrun
  • Sorry, I am very new to R and still don't quite understand how to apply the line above to my own data. My column headers are currently id, year, gender, occupation_busdriver, occupation_cashier, occupation_nurse, income (in that order). Thank you for your help; any further clarification would be much appreciated. – oliverg99 Apr 09 '19 at 16:35
  • @oliverg99 If you are only interested in the 'occupation' columns, create an index of those columns with `nm1 <- grep('occupation", names(df1))` and use `[nm1]` instead of `[-1]`. – akrun Apr 09 '19 at 16:37
  • After trying to implement this, when I run the code the console now shows a '+' and nothing will run. – oliverg99 Apr 09 '19 at 16:49
  • @oliverg99 Did you mean `nm1 <- grep('occupation", names(df1))`? You probably missed a closing bracket or something similar. – akrun Apr 09 '19 at 16:50
  • @oliverg99 Sorry, there was a typo: I mixed `'` and `"`. It should be `nm1 <- grep('occupation', names(df1))`, and then `data.frame(df1[1], occupation = sub(".*_", "", names(df1))[nm1][max.col(df1[nm1])])`. – akrun Apr 09 '19 at 16:53
  • @oliverg99 It is working for me now – akrun Apr 09 '19 at 16:55
  • Working for me too, thank you so much! – oliverg99 Apr 09 '19 at 16:56
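Putting the comment thread together, here is a sketch of the corrected commands on a data frame with the column layout oliverg99 described. The `year`, `gender` and `income` values are invented placeholders. The second variant, which keeps the non-occupation columns instead of only `id`, goes beyond what was asked in the thread.

```r
# Hypothetical data with the column order from the comments; the year,
# gender and income values are made up for illustration.
df1 <- data.frame(
  id                   = c(1, 2, 3),
  year                 = c(2018, 2018, 2019),
  gender               = c("F", "M", "F"),
  occupation_busdriver = c(0, 0, 1),
  occupation_cashier   = c(0, 1, 0),
  occupation_nurse     = c(1, 0, 0),
  income               = c(30000, 21000, 27000),
  stringsAsFactors     = FALSE
)

# Positions of the indicator columns (4, 5, 6 here); "^occupation_" would
# be a stricter pattern if other names could contain "occupation".
nm1 <- grep('occupation', names(df1))

# Corrected command from the comments: keeps only id and occupation.
res <- data.frame(df1[1], occupation = sub(".*_", "", names(df1))[nm1][max.col(df1[nm1])])

# Variant: drop the indicator columns but keep year, gender, income, etc.
out <- df1[-nm1]
out$occupation <- sub(".*_", "", names(df1)[nm1])[max.col(df1[nm1])]
out
```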