I have a DataFrame that looks like:
import pandas as pd
df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
                   [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0],
                   [17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0]],
                  columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
A B C D E F G H
0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0
1 9.0 10.0 11.0 12.0 13.0 14.0 15.0 16.0
2 17.0 18.0 19.0 20.0 21.0 22.0 23.0 24.0
And I have a list of columns:
l = ['A', 'C', 'D', 'E']
For each element of my list, I want the mean of the columns that precede it in the list, plus twice the value in its own column. So A will depend only on itself, C will depend on A and itself, D will depend on the mean of A and C plus itself, and E will depend on the mean of A, C, and D plus itself. I have accomplished what I need in the following way:
for i, col in enumerate(l):
    # Columns that come before `col` in the list (empty for the first one)
    other_cols = l[:i]
    df['tmp_' + col] = df[other_cols].mean(axis=1) + 2.0 * df[col]
A B C D E F G H tmp_A tmp_C tmp_D \
0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 NaN 7.0 10.0
1 9.0 10.0 11.0 12.0 13.0 14.0 15.0 16.0 NaN 31.0 34.0
2 17.0 18.0 19.0 20.0 21.0 22.0 23.0 24.0 NaN 55.0 58.0
tmp_E
0 12.666667
1 36.666667
2 60.666667
Is there a more Pythonic way to accomplish the same thing without having to run through the for loop?
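For reference, here is one possible loop-free sketch, not necessarily the best approach. It assumes the listed columns are numeric and contain no NaN values, and it computes the mean of the preceding columns as a running column-wise sum divided by a running count, shifted one column to the right. The names cols, sub, counts, prev_mean, tmp, and result are made up for illustration. Like the loop above, it leaves tmp_A as NaN because the first column has no predecessors.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0],
     [17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0]],
    columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
)
cols = ['A', 'C', 'D', 'E']  # hypothetical name for the list called l above

sub = df[cols]

# Running mean across the listed columns:
# column k holds the mean of cols[0], ..., cols[k].
counts = np.arange(1, len(cols) + 1)
running_mean = sub.cumsum(axis=1) / counts

# Shift one column to the right so that column k holds the mean of the
# columns that precede it; the first column has none and becomes NaN.
prev_mean = running_mean.shift(axis=1)

# Mean of the preceding columns plus twice the column itself.
tmp = (prev_mean + 2.0 * sub).add_prefix('tmp_')

# Attach the new columns to a copy of the original frame.
result = df.join(tmp)
print(result)
```

If the first column should come out as 2 * A rather than NaN, one option (an assumption about the intent) is to call prev_mean.fillna(0.0) before adding.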