
Let's say I have the following DataFrames:

```python
import pandas as pd

table_a = pd.DataFrame({'employee': ['a', 'b', 'c', 'd', 'e', 'f'],
                        'department': ['developer', 'test engineer', 'network engineer', 'manager', 'hr', 'intern']})

dept_mapping = pd.DataFrame({'department': ['developer', 'test engineer', 'network engineer', 'manager', 'hr', 'intern'],
                             'engineer': [1, 1, 1, 0, 0, 0], 'management': [0, 0, 0, 1, 1, 0], 'intern': [0, 0, 0, 0, 0, 1]})
```

How can I create a new column in `table_a` that contains the corresponding general department for each employee? That is:

```python
table_a = pd.DataFrame({'employee': ['a', 'b', 'c', 'd', 'e', 'f'],
                        'department': ['developer', 'test engineer', 'network engineer', 'manager', 'hr', 'intern'],
                        'general department': ['engineer', 'engineer', 'engineer', 'management', 'management', 'intern']})
```
Vishal Upadhyay
akshayKhanapuri
  • Looks like you'd like to reverse one-hot encoding. This might help: https://stackoverflow.com/a/38334689/9003184 . Let me know if this is what you needed. – a_jelly_fish May 09 '20 at 18:21

1 Answer


You can use `idxmax` along `axis=1` together with `Series.map()`:

```python
table_a['general department'] = table_a['department'].map(
    dept_mapping.set_index('department').idxmax(axis=1))
print(table_a)
```

```
  employee        department general department
0        a         developer           engineer
1        b     test engineer           engineer
2        c  network engineer           engineer
3        d           manager         management
4        e                hr         management
5        f            intern             intern
```
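The one-liner above builds its lookup inline. Here is a minimal sketch that pulls that intermediate Series out so it can be inspected, rebuilding the question's DataFrames; `flags` and `lookup` are illustrative names. One caveat: `idxmax` returns the *first* column holding the row maximum, so a department whose row is all zeros would silently be labelled `engineer`. The assertion guards against that, assuming every department should belong to exactly one group.

```python
import pandas as pd

table_a = pd.DataFrame({'employee': ['a', 'b', 'c', 'd', 'e', 'f'],
                        'department': ['developer', 'test engineer', 'network engineer', 'manager', 'hr', 'intern']})
dept_mapping = pd.DataFrame({'department': ['developer', 'test engineer', 'network engineer', 'manager', 'hr', 'intern'],
                             'engineer': [1, 1, 1, 0, 0, 0], 'management': [0, 0, 0, 1, 1, 0], 'intern': [0, 0, 0, 0, 0, 1]})

# One row per department, one 0/1 column per general department.
flags = dept_mapping.set_index('department')

# Assumption: each department belongs to exactly one group. Without this check,
# an all-zero row would make idxmax fall back to the first column ('engineer').
assert (flags.sum(axis=1) == 1).all(), "each department needs exactly one 1"

# For every row, idxmax(axis=1) returns the label of the column holding the
# maximum, i.e. the general department flagged with 1. Index = department.
lookup = flags.idxmax(axis=1)
print(lookup)

# Series.map replaces each department with the value found at that index label.
table_a['general department'] = table_a['department'].map(lookup)
```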
anky
  • Thank you for the answer; it solves my problem. But I don't understand one thing. We are using the map function here, and my understanding of map is that it takes a function and a list as its arguments and passes the list elements through the function one by one. In this solution, though, no function is passed to map. So how does it work? Apologies for the newbie question. – akshayKhanapuri May 10 '20 at 07:43
  • @akshay No problem, glad it helped. We are using [series.map](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html) here, which looks up each value of `table_a['department']` in another Series, the one produced by `dept_mapping.set_index('department').idxmax(1)`. You can check the examples in the link, and let me know if you still have doubts; I will try my best to explain. – anky May 10 '20 at 07:46
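To make the comment thread concrete: unlike Python's built-in `map`, `pandas.Series.map` accepts a function, a dict, or another Series as its argument. Given a Series (or dict), it does a lookup rather than a function call: each value is matched against the other Series' index (or the dict's keys) and replaced by the matching value, and values with no match become `NaN`. A minimal sketch with a hypothetical three-entry lookup, showing that the three forms agree:

```python
import pandas as pd

departments = pd.Series(['developer', 'hr', 'intern', 'developer'])

# Hypothetical lookup Series: index = department, values = general department
# (the same shape as dept_mapping.set_index('department').idxmax(1)).
lookup = pd.Series({'developer': 'engineer', 'hr': 'management', 'intern': 'intern'})

by_series = departments.map(lookup)             # match values against lookup's index
by_dict = departments.map(lookup.to_dict())     # same lookup, keyed by dict keys
by_func = departments.map(lambda d: lookup[d])  # the "apply a function" form

print(by_series.tolist())  # ['engineer', 'management', 'intern', 'engineer']

# A value absent from the lookup becomes NaN instead of raising an error.
missing = pd.Series(['sales']).map(lookup)
print(missing.isna().all())  # True
```

The Series/dict forms are usually preferred for this kind of lookup because they avoid a Python-level function call per element and handle missing keys gracefully.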