I have the following (toy) dataset, which looks similar to the production data:
import pandas as pd
import numpy as np
df = pd.DataFrame({'System_Key':['MER-002', 'MER-003', 'MER-004', 'MER-005', 'BAV-378', 'BAV-379', 'BAV-380', 'BAV-381', 'AUD-220', 'AUD-221', 'AUD-222', 'AUD-223'],
'Manufacturer':['Mercedes', 'Mercedes', 'Mercedes', 'Mercedes', 'BMW', 'BMW', 'BMW', 'BMW', 'Audi', 'Audi', 'Audi', 'Audi'],
'Region':['Americas', 'Europe', 'Americas', 'Asia', 'Asia', 'Europe', 'Europe', 'Europe', 'Americas', 'Asia', 'Americas', 'Americas'],
'Department':[np.nan, 'Sales', np.nan, 'Operations', np.nan, np.nan, 'Accounting', np.nan, 'Finance', 'Finance', 'Finance', np.nan],
'Approver':[np.nan, 'Jones, T.', np.nan, 'Jones, T.', 'Jones, T.', 'Smith, W.', np.nan, np.nan, 'Jones T.', 'Williams, S.', 'Williams, S.', np.nan]
})
System_Key Manufacturer Region Department Approver
0 MER-002 Mercedes Americas NaN NaN
1 MER-003 Mercedes Europe Sales Jones, T.
2 MER-004 Mercedes Americas NaN NaN
3 MER-005 Mercedes Asia Operations Jones, T.
4 BAV-378 BMW Asia NaN Jones, T.
5 BAV-379 BMW Europe NaN Smith, W.
6 BAV-380 BMW Europe Accounting NaN
7 BAV-381 BMW Europe NaN NaN
8 AUD-220 Audi Americas Finance Jones T.
9 AUD-221 Audi Asia Finance Williams, S.
10 AUD-222 Audi Americas Finance Williams, S.
11 AUD-223 Audi Americas NaN NaN
I then create a custom column:
df['Rebate_Plan'] = np.where(df['System_Key'].str.contains('BAV', na=False), 'Jupyter',
np.where(df['System_Key'].str.contains('AUD', na=False), 'Uranus',
np.where((df['System_Key'].str.contains('MER', na=False)) & (df['Approver'].str.contains('Jones', na=False)), 'Saturn',
np.where((df['System_Key'].str.contains('MER')) & (~df['Approver'].str.contains('Jones')), 'Pluto', '*No Plan*'))))
But, I get the following error message:
TypeError: bad operand type for unary ~: 'float'
Note: the error message I get on the production data is:
TypeError: ufunc 'invert' not supported for input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I believe the invert
function only handles integer and boolean types. So, I convert the data using:
df = df.all.applymap(str)
(which is discussed in Python - Turn all items in a Dataframe to strings)
Doing so resolves that error message and results in the following:
System_Key Manufacturer Region Department Approver Rebate_Plan
0 MER-002 Mercedes Americas nan nan Pluto
1 MER-003 Mercedes Europe Sales Jones, T. Saturn
2 MER-004 Mercedes Americas nan Smith, W. Pluto
3 MER-005 Mercedes Asia Operations Jones, T. Saturn
4 MER-006 Mercedes Asia nan Jones, T. Saturn
5 MER-007 Mercedes Europe nan nan Pluto
6 BAV-378 BMW Europe Accounting nan Jupyter
7 BAV-379 BMW Asia Finance nan Jupyter
8 BAV-380 BMW Europe Finance nan Jupyter
9 BAV-381 BMW Europe nan nan Jupyter
10 AUD-220 Audi Americas Finance nan Uranus
11 AUD-221 Audi Asia Finance nan Uranus
12 AUD-222 Audi Americas Finance nan Uranus
13 AUD-223 Audi Americas nan nan Uranus
Lastly, I create the pivot table:
pivot_data = pd.pivot_table(data=df, index='Manufacturer', columns='Approver', values='System_Key', aggfunc='count', margins=True, margins_name='TOTALS', fill_value=0)
pivot_data
Which results in:
Approver Jones, T. Smith, W. nan TOTALS
Manufacturer
Audi 0 0 4 4
BMW 0 0 4 4
Mercedes 3 1 2 6
TOTALS 3 1 10 14
But, I would like to remove the "Audi" and "BMW" lines from the pivot_table (since they're blank and contain NaN values), such that the finished product looks as follows:
Approver Jones, T. Smith, W. TOTALS
Manufacturer
Mercedes 3 1 4
TOTALS 3 1 4
And, I can do this simply with the following:
pivot_data = pd.pivot_table(data=df, index='Manufacturer', columns='Approver', values='System_Key', aggfunc='count', margins=True, margins_name='TOTALS', fill_value=0)
BUT, THIS ONLY WORKS IF CALLED BEFORE ADDING THE DERIVED 'REBATE_PLAN' COLUMN and calling df = df.all.applymap(str)
.
Does anyone know of a way to work around this?
Thanks in advance!