1

Question: How do you group a df based on a variable, make a computation using a for loop?

The task is to make a conditional computation based on the value in a column. But the computational constants are dependent upon the value in the reference column. Given this df:

In [55]: df = pd.DataFrame({
    ...:     'col1' : ['A', 'A', 'B', np.nan, 'D', 'C'],
    ...:     'col2' : [2, 1, 9, 8, 7, 4],
    ...:     'col3': [0, 1, 9, 4, 2, 3],
    ...: })

In [56]: df
Out[56]: 
  col1  col2  col3
0    A     2     0
1    A     1     1
2    B     9     9
3  NaN     8     4
4    D     7     2
5    C     4     3

I've used the solution here to insert a 'math' column that takes the value from col3 and multiplies it by 10. But now I want to iterate over a list so that the computation depends on the values in col1. Here's the result:

In [57]: items = ['A', 'D']

In [58]: for item in items:
    ...:     df.loc[:, 'math'] = df.loc[df['col1'] == item, 'col3']
    ...:     

In [59]: df
Out[59]: 
  col1  col2  col3  math
0    A     2     0   NaN
1    A     1     1   NaN
2    B     9     9   NaN
3  NaN     8     4   NaN
4    D     7     2   2.0
5    C     4     3   NaN

The obvious issue is that the math column is overwritten on each iteration. Indexes 0 and 1 get their values on the first iteration, but those values are replaced with NaN on the second. The resulting df only reflects the last element of the list.

I could go through and add coding to iterate through each index value - but that seems more pathetic than pythonic.

Expected Output for the .mul() example

In [100]: df
Out[100]: 
  col1  col2  col3  math
0    A     2     0   0.0
1    A     1     1  10.0
2    B     9     9   NaN
3  NaN     8     4   NaN
4    D     7     2  20.0
5    C     4     3   NaN
cs95
Bill Armstrong

2 Answers

3

The problem with your current method is that each iteration assigns the whole 'math' column, so its output overwrites the output of the iteration before it. You end up with output for just the last item and nothing more.

Instead, select all rows whose col1 value is in items in one step, then assign the same way you did before.

df['math'] = df.loc[df.col1.isin(items), 'col3'] * 10

Or,

df['math'] = df.query("col1 in @items").col3 * 10

Or even,

df['math'] = df.col3.where(df.col1.isin(items)) * 10

df

  col1  col2  col3  math
0    A     2     0   0.0
1    A     1     1  10.0
2    B     9     9   NaN
3  NaN     8     4   NaN
4    D     7     2  20.0
5    C     4     3   NaN
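
The question also mentions that the constants depend on the value in col1. One possible way to handle that without a loop is to map each col1 value to its own multiplier. This is only a sketch: the `factors` dict and its values are hypothetical, since the question does not state the actual constants.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
})

# Hypothetical per-value multipliers (not from the question).
factors = {'A': 10, 'D': 100}

# map() yields NaN for every col1 value that is not a key in the dict,
# including NaN itself, so those rows end up as NaN in 'math'.
df['math'] = df['col3'] * df['col1'].map(factors)
print(df)
```

Rows whose col1 value is missing from `factors` stay NaN, which matches the NaN pattern in the output above.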
cs95
2

The assignment fails because each pass through the for loop assigns a completely new value to the whole math column. The two intermediate results are shown below; only the last one is left in the df after the loop finishes.

0     0.0
1    10.0
2     NaN
3     NaN
4     NaN
5     NaN
Name: col3, dtype: float64
0     NaN
1     NaN
2     NaN
3     NaN
4    20.0
5     NaN
Name: col3, dtype: float64
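
The two Series above can probably be reproduced with something like the following sketch. It assumes the loop body multiplies by 10 (as in the expected output) and uses `reindex` only to show what a whole-column assignment stores after index alignment.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
})
items = ['A', 'D']

for item in items:
    # Right-hand side: only the rows matching the current item.
    rhs = df.loc[df['col1'] == item, 'col3'] * 10
    # Aligning to the full index fills every non-matching row with NaN.
    print(rhs.reindex(df.index))
    # Assigning to the whole column replaces whatever the previous pass stored.
    df['math'] = rhs
```

After the loop, `df['math']` holds only the values for 'D', because the second assignment replaced the column written for 'A'.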

You can do it in a single assignment that writes only to the matching rows:

df.loc[df.col1.isin(items),'math']=df.col3*10
df
Out[85]: 
  col1  col2  col3  math
0    A     2     0   0.0
1    A     1     1  10.0
2    B     9     9   NaN
3  NaN     8     4   NaN
4    D     7     2  20.0
5    C     4     3   NaN
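
If the for loop has to stay (for example, to run a different computation per item), the same idea of writing only to the matching rows can be applied inside it. A minimal sketch, assuming the same df and items as in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
})
items = ['A', 'D']

# Create the column once, as float NaN, so unmatched rows stay NaN.
df['math'] = np.nan

for item in items:
    mask = df['col1'] == item
    # Write only to the matching rows; all other rows keep their current value.
    df.loc[mask, 'math'] = df.loc[mask, 'col3'] * 10

print(df)
```

Because the mask appears on the left-hand side, results from earlier iterations are not touched by later ones.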
BENY