So I'm trying to work on a few dataframes that are similar to this:
employee ID | salary | regioncode | writeups | jobcode | fired |
---|---|---|---|---|---|
1 | 10000 | 11 | 1 | 23F | 0 |
2 | 20000 | 20 | 2 | 12D | 0 |
3 | 20000 | 10 | 0 | 45T | 1 |
... | ... | ... | ... | ... | ... |
1000 | 30000 | 5 | 0 | 603 | 1 |
This original dataframe is further divided into 2 other dfs: one containing only employees with writeups, and the other containing employees only with either a specific jobcode, or who got fired.
And i want to form tables on them based on aggregrate counts of demographics, (in this case regioncode), for example:
regioncode | total employees | employees fired | employees written up |
---|---|---|---|
10 | 200 | 35 | 20 |
20 | 150 | 27 | 15 |
Below is some code resembling what I'm doing to achieve this, with some explanation to the code
- regioncode_table is the table I want to return like the above one
- employees_all is the master df that resembles the first table I posted
- employees_fired is the divided df that only contains employees with a certain jobcode or who got fired
- employees_writeups is the divided df containing only employees with writeups
regioncode_table = []
# appending regioncodes rownames to the table
regioncode_table.append(employees_all.regioncode.value_counts().sort_index().index.tolist())
#adding total employees from a region
regioncode_table.append(employees_all.regioncode.value_counts(dropna=False).sort_index().tolist())
#adding total employees who got fired or have a certain jobcode
regioncode_table.append(employees_fired.regioncode.value_counts(dropna=False).sort_index().tolist())
#adding total employees who were written up at least once
regioncode_table.append(employees_writeups.regioncode.value_counts(dropna=False).sort_index().tolist())
This on the surface gives me a table consistent to the one I'm looking for, but as soon as it encounters a region with no employees getting writeups or fired, the values get shifted up by 1. I would like to figure out a way for the the table to display 0 values from a specific region as well, such as if regioncode 42 has 3 employees and none of them have been written up or fired, the regioncode table still accounts for them and puts 0s in the table, such as:
regioncode | total employees | employees fired | employees written up |
---|---|---|---|
42 | 3 | 0 | 0 |
23 | 15 | 0 | 1 |
So far in addition to the method I have already tried to implement, I have tried using the groupby method for dataframes but I cannot figure a way to account for the zeroes as I have mentioned. If anyone has any other solutions please do let me know. Sorry for any formatting errors, or coding errors, I am still pretty much a beginner and this is my first post on stackoverflow. Thanks in advance!