138

I need to set the value of one column based on the value of another in a Pandas dataframe. This is the logic:

if df['c1'] == 'Value':
    df['c2'] = 10
else:
    df['c2'] = df['c3']

I am unable to get this to do what I want, which is to simply create a column with new values (or change the value of an existing column: either one works for me).

If I try to run the code above or if I write it as a function and use the apply method, I get the following:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

wjandrea
  • 28,235
  • 9
  • 60
  • 81
NLR
  • 1,714
  • 2
  • 11
  • 21

10 Answers10

207

one way to do this would be to use indexing with .loc.

Example

In the absence of an example dataframe, I'll make one up here:

import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': list('abcdefg')})
df.loc[5, 'c1'] = 'Value'

>>> df
      c1
0      a
1      b
2      c
3      d
4      e
5  Value
6      g

Assuming you wanted to create a new column c2, equivalent to c1 except where c1 is Value, in which case, you would like to assign it to 10:

First, you could create a new column c2, and set it to equivalent as c1, using one of the following two lines (they essentially do the same thing):

df = df.assign(c2 = df['c1'])
# OR:
df['c2'] = df['c1']

Then, find all the indices where c1 is equal to 'Value' using .loc, and assign your desired value in c2 at those indices:

df.loc[df['c1'] == 'Value', 'c2'] = 10

And you end up with this:

>>> df
      c1  c2
0      a   a
1      b   b
2      c   c
3      d   d
4      e   e
5  Value  10
6      g   g

If, as you suggested in your question, you would perhaps sometimes just want to replace the values in the column you already have, rather than create a new column, then just skip the column creation, and do the following:

df['c1'].loc[df['c1'] == 'Value'] = 10
# or:
df.loc[df['c1'] == 'Value', 'c1'] = 10

Giving you:

>>> df
      c1
0      a
1      b
2      c
3      d
4      e
5     10
6      g
sacuL
  • 49,704
  • 8
  • 81
  • 106
  • 11
    The second solution nailed it for me. I didn't realize you could use .loc like a WHERE statement in SQL. Makes sense. Thanks! – NLR Mar 07 '18 at 22:12
  • How would you apply 10 to multiple columns instead on just one? eg: df.loc[df['c1'] == 'Value', 'c2','c3','c4] = 10. – Steve May 24 '19 at 16:12
  • 1
    I think you have to place all the columns you need to update the value with in a list, then loop through that list and changing the column name parameter in it? – Joe Aug 13 '19 at 10:48
  • Great Solution, I am facing a similar problem. Additonally to the condition I need to select only the first 5 entries and set the value only on them. kunden_df.loc[kunden_df["Distanz"] == 1].iloc[0:amount_contracts]["Betreuer"] = name can you help me with the syntax? – MLAlex Aug 25 '20 at 09:28
  • 1
    When I assign values with statements like `df.loc[df['c1'] == 'Value', 'c1'] = 10` I get the dreaded `SettingWithCopyWarning` -- even when I use the `.loc[]` syntax. So far it hasn't been a problem for me, but I find it odd that I get the warning, despite using the suggested `.loc[]` approach. Any fixes for that? This is with pandas 1.2.3, numpy 1.18.5, and python 3.7.10. – Darren Mar 31 '21 at 17:14
  • @Darren, in my experience, the most common reason for a `SettingWithCopyWarning` is trying to assign values on a view. is it possible that the `df` you're trying to modify is e.g. a subset of a larger dataframe? Using the method above on such a view would raise the warning. Also worth mentioning it can be hard to predict which actions will result in a view in pandas (see [this](https://www.practicaldatascience.org/html/views_and_copies_in_pandas.html)) – sacuL Mar 31 '21 at 17:28
  • @sacuL, thanks. The scenarios where I get this and can't figure out why look exactly like the example you described -- not when I'm looking at view of a larger DF.I can effectively prevent the `SettingWithCopyWarning` when the row subset logic in `.loc[]` is on the right-hand side of the operation, but not when it's on the left for assignment of values (like in your example). Luckily it has never caused a practical problem for me in anything I've worked on, but I still can't figure out why I can't assign values like the example you gave without the `SettingWithCopyWarning`. – Darren Mar 31 '21 at 23:00
  • 1
    Let's say I have an int column, and I want to divide its value by 1000 if its value is more 1000. Using the the first option `df['c1'].loc[df['c1'] > 1000] = df['c1'].loc[df['c1'] > 1000]/1000` I got the `SettingWithCopyWarning`. However, with the 2nd option `df['c1'].loc[df['c1'] > 1000, 'c1']` I don't get that warning. – Averell Jun 24 '21 at 00:14
80

You can use np.where() to set values based on a specified condition:

#df
   c1  c2  c3
0   4   2   1
1   8   7   9
2   1   5   8
3   3   3   5
4   3   6   8

Now change values (or set) in column ['c2'] based on your condition.

df['c2'] = np.where(df.c1 == 8,'X', df.c3)

   c1  c2  c3
0   4   1   1
1   8   X   9
2   1   8   8
3   3   5   5
4   3   8   8
DJK
  • 8,924
  • 4
  • 24
  • 40
  • what if I want to keep all original columns – mLstudent33 Mar 13 '20 at 01:37
  • 2
    @mLstudent33, using `df['newColName'] = ...`, you use a column name, not in your dataframe to create a new column, assuming `newColName` does not already exist – DJK Jul 16 '20 at 20:37
  • 1
    Superior to all other solutions, as (a) more readable due to less ambiguity of order of terms and (b) it's more future proof as it's easier to modify it to take multiple columns into account and (c) it's fast, no interpreted code in a lambda. – Contango Jul 15 '21 at 08:17
  • 1
    @DJK I guess `c2` in your code refers to `c4` in the output table? You should better correct one of them. – abu Aug 28 '21 at 08:03
40

try:

df['c2'] = df['c1'].apply(lambda x: 10 if x == 'Value' else x)
wjandrea
  • 28,235
  • 9
  • 60
  • 81
aggis
  • 608
  • 4
  • 9
  • Thanks @AlexanderHughes. My original post had a typo: there are actually three columns to consider, so this solution wouldn't work. – NLR Mar 07 '18 at 22:06
  • 7
    should be `df.apply(lambda x: 10 if x['c1'] == 'Value' else x['c3'],axis=1)` – DJK Mar 07 '18 at 22:32
  • 9
    This might have performance issues with large datasets. df.apply() is slower. – ErnestScribbler Nov 01 '18 at 10:21
  • Was looking for the same, found a lambda worked for me with a dataframe. My code is 'ard['Hr'] = ard.apply(lambda x: x['Hr']+1 if x['Mi'] >= 45 and x['Mi'] < 60 else x['Hr'],axis=1)' – Runawaygeek May 13 '20 at 13:55
23

Note the tilda that reverses the selection. It uses pandas methods (i.e. is faster than if/else).

df.loc[(df['c1'] == 'Value'), 'c2'] = 10
df.loc[~(df['c1'] == 'Value'), 'c2'] = df['c3']
double-beep
  • 5,031
  • 17
  • 33
  • 41
vkerov
  • 231
  • 2
  • 4
7

I suggest doing it in two steps:

# set fixed value to 'c2' where the condition is met
df.loc[df['c1'] == 'Value', 'c2'] = 10

# copy value from 'c3' to 'c2' where the condition is NOT met
df.loc[df['c1'] != 'Value', 'c2'] = df[df['c1'] != 'Value', 'c3']
Ralf
  • 16,086
  • 4
  • 44
  • 68
6

You can use pandas.DataFrame.mask to add virtually as many conditions as you need:

data = {'a': [1,2,3,4,5], 'b': [6,8,9,10,11]}

d = pd.DataFrame.from_dict(data, orient='columns')
c = {'c1': (2, 'Value1'), 'c2': (3, 'Value2'), 'c3': (5, d['b'])}

d['new'] = np.nan
for value in c.values():
    d['new'].mask(d['a'] == value[0], value[1], inplace=True)

d['new'] = d['new'].fillna('Else')
d

Output:

    a   b   new
0   1   6   Else
1   2   8   Value1
2   3   9   Value2
3   4   10  Else
4   5   11  11
nimbous
  • 1,507
  • 9
  • 12
4

I believe Series.map() to be very readable and efficient, e.g.:

df["c2"] = df["c1"].map(lambda x: 10 if x == 'Value' else x)

I like it because if the conditional logic gets more complex you can move it to a function and just pass in that function instead of the lambda.

If you need to base your conditional logic on more than one column you can use DataFrame.apply() as others suggest.

Pierre Monico
  • 982
  • 8
  • 16
1

Try out df.apply() if you've a small/medium dataframe,

df['c2'] = df.apply(lambda x: 10 if x['c1'] == 'Value' else x['c1'], axis = 1)

Else, follow the slicing techniques mentioned in the above comments if you've got a big dataframe.

Rahul Bordoloi
  • 145
  • 3
  • 9
0

Many of the answers provided are great! Just wanted to add 1 extra thing. Building off of the most upvoted answer which uses .loc, if you have a list of values you want to check for you can just modify the line:

df['c1'].loc[df['c1'] == 'Value'] = 10

to

df['c1'].loc[df['c1'].isin(['Value1','Value2','Value3'])] = 10 

Note, as explained in the most upvoted answer, this replaces values in the column.

-1

I had a big dataset and .loc[] was taking too long so I found a vectorized way to do it. Recall that you can set a column to a logical operator, so this works:

file['Flag'] = (file['Claim_Amount'] > 0)

This gives a Boolean, which I wanted, but you can multiply it by, say, 1 to make an Integer.

Mark
  • 1,079
  • 1
  • 9
  • 8