A few days ago, I asked this question Creating a new column based on the mean of other values in group
The answer was great. However, the issue is we have so many 0.000001 values etc. and these converge to inf or -inf values. The data is financial data and these little numbers mean a lot to us. So, instead of converging to infinity, I still want to take sum and calculate the mean.
I found this such that, using nsum module in the math package, is doable. however, the transform function does not have any properties to work with nsum. As a side note, if anyone wants to refresh the memory, here is some debate on how -inf vs inf values occur:
SO the question is, how to take the sum of infinite values with a transform? or how to change the below code?
A small mock data:
d = {'col1': ["A", "A", "A", "B", "B", "B", "c", "c","c", "d", "d", "d", "e", "e", "e"],
'col2': [2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017],
'col3': [1E307, -1E305, -1E30700080, -3E30700080, 0.3E30700080, 0.00005, -0.005291, 8, -9, 10, 50, 60, -70, 40, 60]],
'group':[10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 20, 20,20]}
df = pd.DataFrame(d)
The solution with sum:
df['means'] = df.groupby("group").apply(lambda x: x.groupby("col2")["col3"].transform("sum").sub(x["col3"]).div(len(x["col1"].unique())-1)).droplevel(0)