
I have a df where column A is either blank or has a string in it. I tried to write the if statement (all columns are strings) below. Basically, if there is something (any value) in df[A], then the new column value will be a concatenation of columns A, B and C. If there is no value in df[A], then it will concatenate columns B and C.

The part where it's `if df[A]` returns a True or False value, right? Just like if I were to write bool(df[A]). So if the value is True, it should execute the first block; if not, it should execute the 'else' block.

if df[A]:
     df[new_column] = df[column_A] + df[column_B] + df[column_C]
else: 
     df[new_column] = df[column_B]+df[column_C]

I get this error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

tminn
  • Does this answer your question? https://stackoverflow.com/questions/53830081/python-pandas-the-truth-value-of-a-series-is-ambiguous – match Dec 29 '21 at 19:42
  • Welcome to Stack Overflow! Have you tried the suggestions listed in the error? – Jasmijn Dec 29 '21 at 19:43
  • `df[A]` is a column, possibly containing many values. Are you trying to check if _any_ of those values are nonblank, or if they are _all_ nonblank? – John Gordon Dec 29 '21 at 19:46

2 Answers

0

This happens because df['A'] returns a Series object. An ordinary Python container with items in it, such as [0, 0, 0] or [None], is always truthy, but pandas refuses to reduce a Series with several values to a single boolean because the result would be ambiguous, so it raises this error instead.

So try this:

if df[A].any():
     df[new_column] = df[column_A] + df[column_B] + df[column_C]
else: 
     df[new_column] = df[column_B]+df[column_C]

`df[A].any()` returns True if any value in the whole column is truthy. Use `df[A].all()` if every element in the column must be truthy. Note that either one makes a single decision for the entire column, not one decision per row.
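To make the difference concrete, here is a small sketch. The frame and its values are made up for illustration, and using `numpy.where` for the per-row choice is my assumption about what the question is after, not something the original code does. It shows that `bool()` on a Series raises, that `.any()` collapses the column to one value, and that a row-by-row decision needs an elementwise operation:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "blank" in A is either a missing value or an empty string.
df = pd.DataFrame({
    "A": ["Mr ", None, ""],
    "B": ["Max ", "John ", "Marie "],
    "C": ["Power", "Doe", "Curie"],
})

# bool() on a Series with more than one element raises ValueError.
try:
    bool(df["A"])
    raised = False
except ValueError:
    raised = True

# Per-row mask: True only where A holds a non-empty string.
has_a = df["A"].notna() & (df["A"] != "")

# .any() reduces the mask to a single truth value for the whole column.
any_nonblank = bool(has_a.any())

# Elementwise choice: A + B + C where A is present, otherwise B + C.
df["new"] = np.where(has_a, df["A"] + df["B"] + df["C"], df["B"] + df["C"])
```

With a plain `if`/`else`, only one of the two assignments runs for the whole frame, which is why rows with a missing A can still end up as NaN.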

gilf0yle
  • So, this worked if column A was not empty. The first condition was true, so it concatenated columns A, B and C. But for the rows where column A was empty, it returned NaN. – tminn Dec 29 '21 at 20:37
  • .any() is true if any one item in the column is true. .all() returns true only if all elements in that column are true. Consider upvoting if it helped. – gilf0yle Dec 29 '21 at 22:29
0

As far as I understand your question, you want to perform the IF-condition for each element. The "+" seems to be a string concatenation, since there are strings in df['A'].

In this case, you don't need the IF-condition at all, because adding an empty string to another leads to the same result as not adding the string.

import pandas as pd

d = {'A': ['Mr ', '', 'Mrs '], 'B': ['Max ', 'John ', 'Marie '], 'C': ['Power', 'Doe', 'Curie']}
df = pd.DataFrame(data=d)

df['new'] = df['A'] + df['B'] + df['C']

Results in:

>>> df
      A       B      C              new
0   Mr     Max   Power     Mr Max Power
1         John     Doe         John Doe
2  Mrs   Marie   Curie  Mrs Marie Curie

In the case that "blank" refers to NaN and not to an empty string you can do the following:

df['new'] = df.apply(lambda x: ''.join(x.dropna().astype(str)), axis=1)
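A vectorized alternative, as a sketch: this assumes the blanks in column A are NaN and that B and C are always filled, and the sample frame is made up for illustration. Replacing the missing values with an empty string first lets the plain "+" concatenation work again:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the blank in A is NaN rather than an empty string.
df = pd.DataFrame({
    "A": ["Mr ", np.nan, "Mrs "],
    "B": ["Max ", "John ", "Marie "],
    "C": ["Power", "Doe", "Curie"],
})

# Fill missing values in A with '' so they no longer turn the sum into NaN.
df["new"] = df["A"].fillna("") + df["B"] + df["C"]
```

Unlike the row-wise `apply`, this touches only the named columns, so any other columns in the frame (including an existing `new` column) are not joined in.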

Have a look at this question, which seems to be similar: questions 33098383

  • This only works if there is something in column A. In your example, the new column would have resulted in Mr Max Power, then Mrs Marie Curie. Row 1 comes up as NaN. – tminn Dec 29 '21 at 20:33
  • @tminn That's right, I have updated the answer for this case. – Horst Weber Dec 29 '21 at 20:55