I'm working in python, and have a dataframe(EAD), which includes the column 'CLIENT'. There is also another column called NPV, which is made up of numbers.
CLIENT NPV
0 1 50
1 1 90
2 2 130
3 2 -11
4 2 57
5 3 105
...
...
My goal is to group by CLIENT and then sum the NPV's of a client together.
My expected result would be like so:
CLIENT NPV
0 1 140
1 2 176
2 3 105
...
I did this by using a groupby function, like so:
EAD_CLIENT = EAD.groupby('COUNTERPARTYNAME')['NPV'].sum().reset_index()
However, when I did so, I got strange output NPV's that weren't correct math.
Output:
CLIENT NPV
0 1 140
1 2 27
2 3 -150
...
Now this is an example, and a tiny portion of my data, but it should illustrate my point. It's clear that for some reason the NPV's are simply not summing correctly. I next tried this:
EAD['SUM(NPV)'] = EAD.groupby('COUNTERPARTYNAME')['NPV'].transform('sum')
But that gave me the same result. I thought perhaps 'NPV' was still a string, so I tried several ways of turning it into an integer.
I added a line above EAD['SUM(NPV)'] that said:
EAD['NPV']=int(EAD['NPV']
but it claimed that TypeError: cannot convert the series to
I also tried:
EAD['NPV']=EAD['NPV'].astype(int).
That didnt work either however, and I'm really struggling to understand what I'm doing wrong.
TLDR; I can't figure out how to get Python to do the math right.