I need to make an inplace
value replacement based upon an row index value. The replacement value is a sliced (both row and column) dataframe computation.
Setup
In [1]: import pandas as pd
In [2]: cols = [0, 1, 'A0', 'A1', 'A2', 'A3', 'B0', 'B1', 'B2', 'B3']
In [3]: data = [['sum', 4531.0010, 0, 0, 0, 2, 0, 0, 0, 7],
...: ['', 4531.0010, 5, 6, 3, 0, 5, 4, 7, 0],
...: ['', 4531.0010, 1, 3, 9, 0, 2, 2, 3, 0],
...: ['sum', 5037.0022, 0, 0, 0, 8, 0, 0, 0, 5],
...: ['', 5037.0022, 2, 2, 3, 0, 1, 3, 9, 0],
...: ['', 5037.0022, 5, 4, 7, 0, 5, 6, 3, 0]]
In [4]: df = pd.DataFrame(data=data, columns=cols)
In [5]: df = df.set_index(list(df.columns[[0, 1]]))
In [6]: df
Out[6]:
A0 A1 A2 A3 B0 B1 B2 B3
0 1
sum 4531.0010 0 0 0 2 0 0 0 7
4531.0010 5 6 3 0 5 4 7 0
4531.0010 1 3 9 0 2 2 3 0
sum 5037.0022 0 0 0 8 0 0 0 5
5037.0022 2 2 3 0 1 3 9 0
5037.0022 5 4 7 0 5 6 3 0
As you can see the row is multiindexed with index=1 is a number that represents a subset of the data. Within each data subset there is a "sum" in index=0 which I would like to "distribute" up (or down) to the zero balances.
The computation is basically the "A" columns and the index 1 rows with the same value to be summed to a denominator. Then the sum of the row for that data group is the numerator. The ratio is then used to distribute the sum between the rows.
For the rows=4531.0010 and the columns with an A it would be computed as:
(5+6+3)/(5+6+3+1+3+9)*2 = row 1, column A3
(1+3+9)/(5+6+3+1+3+9)*2 = row 2, column A3
The resulting df
would look like this:
Out[7]:
A0 A1 A2 A3 B0 B1 B2 B3
0 1
sum 4531.0010 0 0 0 2.000 0 0 0 7.000
4531.0010 5 6 3 1.037 5 4 7 4.870
4531.0010 1 3 9 0.923 2 2 3 2.130
sum 5037.0022 0 0 0 8.000 0 0 0 5.000
5037.0022 2 2 3 2.435 1 3 9 2.407
5037.0022 5 4 7 5.565 5 6 3 2.593
The number of rows are not fixed - there could be one or there could be 10.
What I've tried
I've tried using variations of .pivot_table()
but I can't figure out how to reverse the process using division. As an example.
I've also used variations of .sum()
but trying to constrain the df
using slices eludes me. One of many examples.
I think I can get this to work with a lot of python functions, but seems that it should be possible more efficiently. Any direction is greatly appreciated.