Conditional Logic on Pandas DataFrame

Question

How can I apply conditional logic to a Pandas DataFrame?

See the DataFrame shown below:

   data desired_output
0     1          False
1     2          False
2     3           True
3     4           True

My original data is shown in the 'data' column and the desired_output is shown next to it. If the number in 'data' is below 2.5, the desired_output is False; otherwise it is True.

I could apply a loop and reconstruct the DataFrame, but that would be 'un-pythonic'.

maybe I don't know pandas, but it seems that you have *two* numbers in `data` -- which one are you checking against (seemingly the one on the right? What relevance is the number on the left?) — mgilson, Feb 05 '13 at 18:26
the number on the left is the index and the one on the right is the data — nitin, Feb 05 '13 at 18:31
Does this answer your question? [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) — AMC, Jan 25 '20 at 19:14

score 75 · Answer 1 · answered Feb 05 '13 at 18:35


In [1]: df
Out[1]:
   data
0     1
1     2
2     3
3     4

You want to apply a function that conditionally returns a value based on the selected dataframe column.

In [2]: df['data'].apply(lambda x: 'true' if x > 2.5 else 'false')
Out[2]:
0    false
1    false
2     true
3     true
Name: data

You can then assign that returned column to a new column in your dataframe:

In [3]: df['desired_output'] = df['data'].apply(lambda x: 'true' if x > 2.5 else 'false')

In [4]: df
Out[4]:
   data desired_output
0     1          false
1     2          false
2     3           true
3     4           true
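
As a sketch of how apply extends beyond two outcomes, the lambda can be swapped for a named function with several branches. The function name label, the thresholds, and the labels below are made up for illustration and are not part of the question.

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])

def label(x):
    # Hypothetical three-way rule, chosen only for illustration.
    if x < 1.5:
        return "low"
    if x < 3.5:
        return "mid"
    return "high"

# apply calls label once per element of the column.
df["label"] = df["data"].apply(label)
```

Because apply calls the function once per row, this form is best kept for logic that is awkward to express as a vectorised comparison.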

answered Feb 05 '13 at 18:35

Zelazny7


Although this answer is more verbose and not as simple as the answer @Jasc gave, it is more general and can be applied to other situations in which one wants output other than true and false. – Jacques Mathieu Jun 20 '18 at 16:49

`apply` + `lambda` is not recommended for easily vectorisable operations. Use `np.where` or `loc` methods instead to utilize Pandas / NumPy vectorisation. – jpp Aug 10 '18 at 13:12
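
A minimal sketch of the loc approach mentioned in the comment above, assuming the same four-row frame as in the question. The idea is to fill the column with a default first and then overwrite only the rows that meet the condition.

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])

# Start with a default, then overwrite the rows that meet the condition.
df["desired_output"] = False
df.loc[df["data"] > 2.5, "desired_output"] = True
```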

Jan Katins · Accepted Answer · 2013-09-30T15:45:39.833

score 31

Just compare the column with that value:

In [8]: import pandas

In [9]: df = pandas.DataFrame([1,2,3,4], columns=["data"])

In [10]: df
Out[10]: 
   data
0     1
1     2
2     3
3     4

In [11]: df["desired"] = df["data"] > 2.5
In [12]: df
Out[12]: 
   data desired
0     1   False
1     2   False
2     3    True
3     4    True
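
The comparison produces a boolean Series with one entry per row, so it can be stored as a column or reused as a row filter. A short sketch, where the variable names mask and above are illustrative only:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])

mask = df["data"] > 2.5       # boolean Series, one entry per row
df["desired_output"] = mask   # store it as a column
above = df[mask]              # or use it to keep only the matching rows
```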

edited Sep 30 '13 at 15:45

answered Feb 05 '13 at 21:34

Jan Katins


score 17 · Answer 3 · answered Mar 17 '17 at 02:47


In [34]: import pandas as pd

In [35]: import numpy as np

In [36]: df = pd.DataFrame([1,2,3,4], columns=["data"])

In [37]: df
Out[37]: 
   data
0     1
1     2
2     3
3     4

In [38]: df["desired_output"] = np.where(df["data"] < 2.5, "False", "True")

In [39]: df
Out[39]: 
   data desired_output
0     1          False
1     2          False
2     3           True
3     4           True
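
Note that the session above stores the strings "False" and "True", not booleans. If real booleans are wanted, np.where accepts them as well. A sketch contrasting the two, with the column names as_text and as_bool invented for the comparison:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])

# Strings, as in the session above: the column holds text, not booleans.
df["as_text"] = np.where(df["data"] < 2.5, "False", "True")

# Real booleans: the column has bool dtype and can be used as a mask.
df["as_bool"] = np.where(df["data"] < 2.5, False, True)
```

The distinction matters because any non-empty string, including "False", is truthy in Python.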

answered Mar 17 '17 at 02:47

Surya



This is good, but the < seems unnecessarily confusing. If the condition is true, the first value results, if false the second value results. So it seems far more clear (and equivalent) to have the right side = np.where(df["data"] >= 2.5, "True", "False") – Wesley Kitlasten Oct 16 '18 at 14:48

score 14 · Answer 4 · answered Feb 05 '13 at 21:58


In this specific example, where the DataFrame has only one column, you can write this elegantly as:

df['desired_output'] = df.le(2.5)

le tests whether elements are less than or equal to 2.5; similarly, lt tests for less than, and gt and ge test for greater than and greater than or equal.
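
For the output in the question (False below 2.5), the comparison runs the other way. A minimal sketch using gt on the column itself rather than on the whole DataFrame, a choice made here so that it still works when other columns are present:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])

# gt is the method form of >, applied here to the single column.
df["desired_output"] = df["data"].gt(2.5)
```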

answered Feb 05 '13 at 21:58

Andy Hayden


OP wants to return *False* if `df['data'] < 2.5`. So you should use `gt` here. – rachwa Jun 19 '22 at 17:17

score 0 · Answer 5 · answered Jun 19 '22 at 17:10

You can also use eval here:

In [3]: df.eval('desired_output = data >= 2.5', inplace=True)

In [4]: df
Out[4]: 
   data  desired_output
0     1           False
1     2           False
2     3            True
3     4            True

Because inplace=True is set, the new column is added to df directly and you don't need to assign the result back to df.
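
For contrast, a sketch of the same expression without inplace=True, where eval returns a new DataFrame and the original is left as it was. The variable name result is illustrative only.

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])

# Without inplace=True, eval returns a new DataFrame and leaves df unchanged.
result = df.eval("desired_output = data >= 2.5")
```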
