0

So I'm trying to work on a few dataframes that are similar to this:

employee ID salary regioncode writeups jobcode fired
1 10000 11 1 23F 0
2 20000 20 2 12D 0
3 20000 10 0 45T 1
... ... ... ... ... ...
1000 30000 5 0 603 1

This original dataframe is further divided into 2 other dfs: one containing only employees with writeups, and the other containing employees only with either a specific jobcode, or who got fired.

And i want to form tables on them based on aggregrate counts of demographics, (in this case regioncode), for example:

regioncode total employees employees fired employees written up
10 200 35 20
20 150 27 15

Below is some code resembling what I'm doing to achieve this, with some explanation to the code

  • regioncode_table is the table I want to return like the above one
  • employees_all is the master df that resembles the first table I posted
  • employees_fired is the divided df that only contains employees with a certain jobcode or who got fired
  • employees_writeups is the divided df containing only employees with writeups
regioncode_table = []
# appending regioncodes rownames to the table
regioncode_table.append(employees_all.regioncode.value_counts().sort_index().index.tolist())


#adding total employees from a region
regioncode_table.append(employees_all.regioncode.value_counts(dropna=False).sort_index().tolist())

#adding total employees who got fired or have a certain jobcode
regioncode_table.append(employees_fired.regioncode.value_counts(dropna=False).sort_index().tolist())

#adding total employees who were written up at least once
regioncode_table.append(employees_writeups.regioncode.value_counts(dropna=False).sort_index().tolist())

This on the surface gives me a table consistent to the one I'm looking for, but as soon as it encounters a region with no employees getting writeups or fired, the values get shifted up by 1. I would like to figure out a way for the the table to display 0 values from a specific region as well, such as if regioncode 42 has 3 employees and none of them have been written up or fired, the regioncode table still accounts for them and puts 0s in the table, such as:

regioncode total employees employees fired employees written up
42 3 0 0
23 15 0 1

So far in addition to the method I have already tried to implement, I have tried using the groupby method for dataframes but I cannot figure a way to account for the zeroes as I have mentioned. If anyone has any other solutions please do let me know. Sorry for any formatting errors, or coding errors, I am still pretty much a beginner and this is my first post on stackoverflow. Thanks in advance!

  • `df.groupby('regioncode').agg({'employee ID': 'nunique', 'fired': 'sum', 'writeups': 'sum'})` – mozway Oct 27 '22 at 05:04

0 Answers0