4

I need to make a mathematical operation which is conditional on the value in a second column. Here is the setup.

Given a simple dataframe (df):

df = pd.DataFrame({
    'col1' : ['A', 'A', 'B', np.nan, 'D', 'C'],
    'col2' : [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
    })

In [11]: df
Out[11]: 
  col1  col2  col3
0    A     2     0
1    A     1     1
2    B     9     9
3  NaN     8     4
4    D     7     2
5    C     4     3

I can add a new columns (math) and then fill it with a mathematical expression based on the sum of 10 and col3.

df['math'] = 10 + df['col3']

In [14]: df
Out[14]: 
  col1  col2  col3  math
0    A     2     0    10
1    A     1     1    11
2    B     9     9    19
3  NaN     8     4    14
4    D     7     2    12
5    C     4     3    13

but what I can't figure out is how to make the expression conditional on the value in another column (e.g., only if col1 == B). The desired output would be:

In [14]: df
Out[14]: 
  col1  col2  col3  math
0    A     2     0   NaN
1    A     1     1   NaN
2    B     9     9    19
3  NaN     8     4   NaN
4    D     7     2   NaN
5    C     4     3   NaN

For added clarification, I will be using a variable for the col1 value in a for loop. As a result, I couldn't get the .group_by() to work as described here or here. I think I'm looking for something like this...

df['math'] = 10 + df.loc[[df['col1'] == my_var], 'col3']

which I got from the comment in the second example above - but I can't get it to work. It throws a ValueError for too many values - - that is, I'm trying to pass both the filter and the column of operation together but it's only expecting the filter. This SO post also uses the .loc similar to my expression above - but with a static col1.

Cœur
  • 37,241
  • 25
  • 195
  • 267
Bill Armstrong
  • 1,615
  • 3
  • 23
  • 47
  • 1
    Your expression is almost right. You meant `df.loc[:,'math'] = 10 + df.loc[df['col1'] == "B", 'col3']` – rafaelc Jun 26 '18 at 03:35
  • @RafaelC, This is, in fact, what I was trying to do. At least on the right hand side. What's the difference between `df['math']` vs. `df.loc[:, 'math']`? – Bill Armstrong Jun 26 '18 at 04:28

5 Answers5

5

where

I perform the math then mask it using pandas.Series.where by passing the boolean series df.col1.eq('B')

df.assign(math=df.col3.add(10).where(df.col1.eq('B')))

  col1  col2  col3  math
0    A     2     0   NaN
1    A     1     1   NaN
2    B     9     9  19.0
3  NaN     8     4   NaN
4    D     7     2   NaN
5    C     4     3   NaN
piRSquared
  • 285,575
  • 57
  • 475
  • 624
5

Using loc

df['math'] = df.loc[df.col1.eq('B'), 'col3'].add(10)

  col1  col2  col3  math
0    A     2     0   NaN
1    A     1     1   NaN
2    B     9     9  19.0
3  NaN     8     4   NaN
4    D     7     2   NaN
5    C     4     3   NaN
user3483203
  • 50,081
  • 9
  • 65
  • 94
3

Use:(Not a safe way to achieve it see the comment below )

df['New']=df.col3[df.col1=='B']+10
df
Out[11]: 
  col1  col2  col3   New
0    A     2     0   NaN
1    A     1     1   NaN
2    B     9     9  19.0
3  NaN     8     4   NaN
4    D     7     2   NaN
5    C     4     3   NaN

Update

pd.concat([df,(df.col3[df.col1=='B']+10).to_frame('New')],1)
Out[51]: 
  col1  col2  col3   New
0    A     2     0   NaN
1    A     1     1   NaN
2    B     9     9  19.0
3  NaN     8     4   NaN
4    D     7     2   NaN
5    C     4     3   NaN
BENY
  • 317,841
  • 20
  • 164
  • 234
  • This is not a good method of indexing. Use loc or iloc only. Don't mix indexers like this. – cs95 Jun 26 '18 at 04:17
  • @coldspeed yep I know what you mean, how about right now : -) – BENY Jun 26 '18 at 04:20
  • 2
    The problem is the `df.col3[df.col1=='B']` which could've been done with `df.loc[df.col1=='B', 'col3']`. While it doesn't matter here, it may matter somewhere else (producing SettingWithCopyWarnings), and you've complicated this more than you needed to :) – cs95 Jun 26 '18 at 04:21
  • @coldspeed yep , That is correct , when df is sub set of other dfs , we will receive the warning message for sure – BENY Jun 26 '18 at 04:24
0

It was throwing ValueError since you were not using the loc correctly. Here is the solution using loc:

df.loc[:,'math'] = 10 + df.loc[df['col1'] == "B", 'col3']

Output:

 col1 col2 col3 math
0    A   2   0    NaN
1    A   1   1    NaN
2    B   9   9    19.0
3    NaN 8   4    NaN
4    D   7   2    NaN
5    C   4   3    NaN
Bill Armstrong
  • 1,615
  • 3
  • 23
  • 47
min2bro
  • 4,509
  • 5
  • 29
  • 55
0

I was also able to do the following...

df['math'] = 10 + df.loc[df['col1'] == 'B']['col3']  

Which is a variation on @user3483203 answer above. Ultimately, my 'B' is a variable, so I modified for @RafaelC 's comments.

Bill Armstrong
  • 1,615
  • 3
  • 23
  • 47