2

I have a csv with two columns: employee id 'eid' and manager's employee id 'mid'. Trying to get python code that will, for each employee, add columns showing employee id's of the manager all the way to CEO. CEO has a an employee id of 1. Ultimately I want to write the result back to a csv.

So the data looks like:

eid,    mid
111,    112
113,    112
112,    114
114,    115
115,    1

I am expecting output that looks like this. Note that while no employee will have more than 4 levels of managers, but i would like to also learn python that names columns dynamically.

eid,    mid,    l2mid   l3mid   l4mid
111,    112,    114,    115,    1
113,    112,    114,    115,    1
112,    114,    115,    1   
114,    115,    1       
115,    1           

I am very new to coding, and trying to teach myself but keep getting stuck. My questions: 1) I was trying to use a for statement that took mid in a given row, then found that that manager's manager, and so on, until i reached the CEO. I have been trying along these lines:

df = pd.read_csv('employee.csv') 
if mid =! 1 
for i in df:
    df.['l2mid'] = df.loc[df.eid == [i], [mid]]

Maybe I'm approaching this backwards and I should try grouping all employees by manager? How would that code be different?

I have seen solutions in C# and sql, and i've seen solutions that build trees and json. I really appreciate any help and encouragement.

Update: next step was to add a country column - see: entry here

QuillPy
  • 25
  • 1
  • 5

1 Answers1

4

I believe there is a better solution, but this works. I filled empty with zeros.

a = []
for index, row in df.iterrows():
    res = df[df['eid']==row['mid']]['mid'].values
    a.append(0 if not res else res[0])
df['l2mid'] = a

a = []
for index, row in df.iterrows():
    res = df[df['eid']==row['l2mid']]['mid'].values
    a.append(0 if not res else res[0])
df['l3mid'] = a

a = []
for index, row in df.iterrows():
    res = df[df['eid']==row['l3mid']]['mid'].values
    a.append(0 if not res else res[0])
df['l4mid'] = a

df
# output :
# eid   mid l2mid   l3mid   l4mid
# 0 111 112 114 115 1
# 1 113 112 114 115 1
# 2 112 114 115 1   0
# 3 114 115 1   0   0
# 4 115 1   0   0   0

You can define a function for routines.

def search_manager(target_column, new_column):
    a = []
    for index, row in df.iterrows():
        res = df[df['eid']==row[target_column]]['mid'].values
        a.append(0 if not res else res[0])
    df[new_column] = a

search_manager('mid', 'l2mid')
search_manager('l2mid', 'l3mid')
search_manager('l3mid', 'l4mid')
Wonjin
  • 432
  • 3
  • 11
  • Both of these approaches work - thank you! I need to study exactly why they work, but this is really helpful for me to make progress - both in my task and in my learning. – QuillPy Aug 29 '17 at 04:51
  • Can you explain this answer line by line. Thank you much!! – AhmFM Mar 30 '22 at 16:56