
I have a DataFrame like this:

import pandas as pd
df = pd.DataFrame(columns=list('ABC'))
df['A'] = [22, 43, 64, 86]

And, I want to populate the other two columns using comparison operators. Here is what I have:

if df['A'] <= 25:
   df['B'] = 'k'
   df['C'] = 'k'
elif df['A'] > 25 & df['A'] <= 50:
   df['B'] = 'b'
   df['C'] = 'none'
elif df['A'] > 50:
   df['B'] = 'g'
   df['C'] = 'r'

But, I'm having trouble with using the operators on a DataFrame. I get an error like "ValueError: The truth value of a Series is ambiguous." Does anyone know a workaround?

Edit: I'd like to stick with using elif due to the potential of very large DataFrames in the future. I'm trying to avoid searching through the DataFrame every time I use a new comparison operator.

Cody Smith
  • use parentheses here: `elif (df['A'] > 25) & (df['A'] <= 50):` – pault Jul 25 '19 at 18:34
  • Your current code makes the entire column a single value. Is that what you want? For learning purposes you may want to look at [this post](https://stackoverflow.com/questions/17729853/replace-value-for-a-selected-cell-in-pandas-dataframe-without-using-index), [`where`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.where.html), or [`np.where`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html) – MattR Jul 25 '19 at 18:38
  • I want the columns to differ based on the corresponding row in the first column @MattR – Cody Smith Jul 25 '19 at 18:42
  • Possible duplicate of [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) – MattR Jul 25 '19 at 18:45
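
The error quoted in the question arises because `if` needs a single True/False, while a comparison on a column yields one boolean per row. The sketch below illustrates that, along with the parenthesised form from pault's comment. The names `mask` and `between` are illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({'A': [22, 43, 64, 86]})

# A comparison on a column yields one boolean per row, not a single bool.
mask = df['A'] <= 25
print(mask.tolist())

# `if` needs a single truth value, so pandas raises instead of guessing.
try:
    if mask:
        pass
except ValueError as err:
    print(type(err).__name__)

# `&` binds tighter than `>` and `<=`, so each comparison needs parentheses.
between = (df['A'] > 25) & (df['A'] <= 50)
print(between.tolist())
```

Without the parentheses, Python evaluates `25 & df['A']` first and then treats the rest as a chained comparison, which again asks a Series for a single truth value and raises the same error.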

1 Answer


Method 1

Perfect use case for pd.cut:

import numpy as np

df['B'] = pd.cut(df['A'], [0,25,50,np.inf], labels=['k', 'b', 'g'])
df['C'] = pd.cut(df['A'], [0,25,50,np.inf], labels=['k', 'None', 'g'])

Output

    A  B     C
0  22  k     k
1  43  b  None
2  64  g     g
3  86  g     g
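
As a variation, here is a minimal sketch of the same `pd.cut` approach using the column C values from the question's if/elif branches ('none' and 'r'). The `bins` variable and the cast to plain strings are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [22, 43, 64, 86]})

# Bin edges are right-inclusive by default: (0, 25], (25, 50], (50, inf].
bins = [0, 25, 50, np.inf]

# Labels taken from the question's if/elif branches.
df['B'] = pd.cut(df['A'], bins, labels=['k', 'b', 'g'])
df['C'] = pd.cut(df['A'], bins, labels=['k', 'none', 'r'])

# pd.cut returns a categorical column; cast to plain strings if preferred.
df['B'] = df['B'].astype(str)
df['C'] = df['C'].astype(str)
print(df)
```

Values of A at or below 0 fall outside the first bin and come back as NaN, so `-np.inf` may be a safer first edge if such values can occur.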

Method 2

Since both columns are driven by the same set of conditions, this is also a good use case for np.select:

conditions = [
    df['A'] <= 25,
    (df['A'] > 25) & (df['A'] <= 50),
    df['A'] > 50
]

choices1 = ['k', 'b', 'g']
choices2 = ['k', 'None', 'g']

df['B'] = np.select(conditions, choices1, default='unknown')
df['C'] = np.select(conditions, choices2, default='unknown')

Output

    A  B     C
0  22  k     k
1  43  b  None
2  64  g     g
3  86  g     g
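
`np.select` uses the first condition that is True for each row, which is close to how an if/elif/else chain reads. A possible sketch of that, assuming the column C values from the question ('none' and 'r') rather than the ones shown above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [22, 43, 64, 86]})

# np.select takes the first condition that is True for each row,
# so later conditions need not repeat the earlier bounds.
conditions = [
    df['A'] <= 25,   # plays the role of `if`
    df['A'] <= 50,   # plays the role of `elif`: only wins for rows with A > 25
]

# Rows matching no condition (A > 50) get the default, like `else`.
df['B'] = np.select(conditions, ['k', 'b'], default='g')
df['C'] = np.select(conditions, ['k', 'none'], default='r')
print(df)
```

Each condition is still computed for the whole column, but as a single vectorized comparison rather than a Python-level check per row.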
Erfan
  • this is a fantastic answer and I think would be worth posting on some other higher-viewed "potential duplicate" questions. – MattR Jul 25 '19 at 18:57
  • maybe something like [this one](https://stackoverflow.com/questions/28896769/vectorize-conditional-assignment-in-pandas-dataframe)? this question has been titled in various ways so hard to pinpoint them – MattR Jul 25 '19 at 19:03
  • I appreciate the responses. But my main concern (now edited into original post) is that very large DataFrames may take a long time if the script has to search the DataFrame completely through every time new comparison operators are used. That's why I was using `elif` – Cody Smith Jul 25 '19 at 19:10
  • It is the opposite. The solutions I provided are `vectorized` solutions and are many times faster than loops. [Here](https://stackoverflow.com/questions/54028199/are-for-loops-in-pandas-really-bad-when-should-i-care) and [here](https://stackoverflow.com/questions/35091979/why-is-vectorization-faster-in-general-than-loops) are good reads on `for loops` vs `vectorization` @CodySmith – Erfan Jul 25 '19 at 19:17
  • Ah, okay. Thanks so much. Still trying to learn! @Erfan Do you suggest a particular method out of the two you posted? – Cody Smith Jul 25 '19 at 19:19
  • In general, try to always avoid for loops and if/else chains with `pandas`. Especially on large dataframes they are terribly slow @CodySmith – Erfan Jul 25 '19 at 19:20
  • Thanks for the tip, added an answer there @MattR – Erfan Jul 25 '19 at 19:24
  • Both should be up there in speed, but this is a really good use case for `pd.cut`. If you get more and more conditions, try `np.select`. @CodySmith – Erfan Jul 25 '19 at 19:25
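
One way to check the speed claim in the comments on your own machine is to time a row-by-row version against `np.select`. This is only a sketch: the test data, the `label_row` helper and the function names are hypothetical, and no timings are quoted because they vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 random integers between 0 and 99.
rng = np.random.default_rng(0)
df = pd.DataFrame({'A': rng.integers(0, 100, size=100_000)})

def label_row(a):
    """Row-by-row version of the question's if/elif chain for column B."""
    if a <= 25:
        return 'k'
    elif a <= 50:
        return 'b'
    return 'g'

def with_apply():
    # Calls the Python function once per row.
    return df['A'].apply(label_row)

def with_select():
    # Evaluates each condition once over the whole column.
    conditions = [df['A'] <= 25, df['A'] <= 50]
    return np.select(conditions, ['k', 'b'], default='g')

# Both approaches should produce identical labels.
same = with_apply().tolist() == with_select().tolist()
print(same)

# Timings depend on the machine, so no numbers are quoted here.
print('apply :', timeit.timeit(with_apply, number=3))
print('select:', timeit.timeit(with_select, number=3))
```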