Question: How do you group a df based on a variable and make a computation using a for loop?
The task is to make a conditional computation based on the value in a column, where the computational constants depend on the value in the reference column. Given this df:
In [55]: df = pd.DataFrame({
    ...:     'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
    ...:     'col2': [2, 1, 9, 8, 7, 4],
    ...:     'col3': [0, 1, 9, 4, 2, 3],
    ...: })
In [56]: df
Out[56]:
  col1  col2  col3
0    A     2     0
1    A     1     1
2    B     9     9
3  NaN     8     4
4    D     7     2
5    C     4     3
I've used the solution here to insert a 'math' column that takes the balance from col3 and adds 10. But now I want to iterate over a list to set the computational variable dependent upon the values in col1. Here's the result:
In [57]: items = ['A', 'D']
In [58]: for item in items:
    ...:     df.loc[:, 'math'] = df.loc[df['col1'] == item, 'col3']
    ...:
In [59]: df
Out[59]:
  col1  col2  col3  math
0    A     2     0   NaN
1    A     1     1   NaN
2    B     9     9   NaN
3  NaN     8     4   NaN
4    D     7     2   2.0
5    C     4     3   NaN
The obvious issue is that the whole math column is overwritten on each iteration. The math column for index 0 and 1 gets computed values on the first iteration, but they are replaced with NaN on the second iteration. The resulting df only reflects the last element of the list.
I could go through and add code to iterate over each index value, but that seems more pathetic than pythonic.
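For what it's worth, the overwrite described above can probably be avoided without iterating over index values. The sketch below keeps the for loop but assigns only to the rows that match the current item, so results from earlier iterations survive. It assumes the computation is a multiplication by 10 for every item (matching the .mul() example); the variable name mask is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
})

items = ['A', 'D']

# Start from an all-NaN column so rows that match no item stay NaN.
df['math'] = np.nan

for item in items:
    # Boolean mask of the rows belonging to the current item
    # (NaN in col1 compares as False, so it is never selected).
    mask = df['col1'] == item
    # Assign to the masked rows only; other rows are left untouched.
    df.loc[mask, 'math'] = df.loc[mask, 'col3'].mul(10)
```

The difference from the original loop is the left-hand side: df.loc[:, 'math'] targets every row, so rows outside the current item are reset to NaN, whereas df.loc[mask, 'math'] targets only the matching rows.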
Expected output for the .mul() example:
In [100]: df
Out[100]:
  col1  col2  col3  math
0    A     2     0   0.0
1    A     1     1  10.0
2    B     9     9   NaN
3  NaN     8     4   NaN
4    D     7     2  20.0
5    C     4     3   NaN
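If the constants really do differ per value of col1, one loop-free possibility is to look the constant up with Series.map and multiply. This is only a sketch: the factors dictionary is hypothetical (the question does not say what the per-item constants are), and both entries are set to 10 here purely so the result matches the expected output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
})

# Hypothetical per-item constants; replace with the real values.
factors = {'A': 10, 'D': 10}

# map() returns NaN for any col1 value that is not a key in factors
# (including NaN itself), so those rows end up NaN after multiplying.
df['math'] = df['col3'].mul(df['col1'].map(factors))
```

Changing a constant, for example factors = {'A': 10, 'D': 5}, changes only the rows for that item, which seems to be the behaviour the question is after.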