Add a column based on the value in another column of a DataFrame

Question

I am trying to manipulate my DataFrame, but I have searched for a while without finding a solution. If this question is a duplicate, I sincerely apologize.

I have a DataFrame (df) like this:

import pandas as pd
data = {'A': ['Emo/3', 'Emo/4', 'Emo/1', '', 'Neu/5', 'Neu/2'],
        'Height': [5.1, 6.2, 5.1,'', 5.2, 5.2],
        }
df = pd.DataFrame(data)
df

    A       Height
0   Emo/3   5.1
1   Emo/4   6.2
2   Emo/1   5.1
3       
4   Neu/5   5.2
5   Neu/2   5.2

I want to add another column "B" whose value is based on column "A". If a row of column A contains a certain string, the same row in column B should be a number: e.g. if "Emo/" is in column A, then column B is 0, and if "Neu/" is in column A, then column B is 1. If the row in column A is empty, then column B in the same row is also empty. The output should look like this:

    A      Height   B
0   Emo/3   5.1     0
1   Emo/4   6.2     0
2   Emo/1   5.1     0
3           
4   Neu/5   5.2     1
5   Neu/2   5.2     1

Currently, I have the code below, but it gives me an error message: "TypeError: argument of type 'float' is not iterable"

df["B"]=""
for index, row in df.iterrows:
  if "Emo/" in row["A"]:
    row["B"]=0
  elif "Neu/" in row['A']:
    row['B']=1
  elif row['A']=="":
    row['B']=""

Any suggestions would help! Thanks!

Could this be a version issue? I'm using pandas 1.3.1 and if I use `df.iterrows()` instead of `df.iterrows`, this works for me. — zlipp, Sep 24 '21 at 16:57

score 0 · Accepted Answer · answered Sep 24 '21 at 17:26

As zlipp commented, parentheses are missing in the method call:

for index, row in df.iterrows():
#                            ^^
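
If the loop is kept, a minimal sketch of a working version might look like the following. It assumes the sample data from the question. Two things here are assumptions rather than part of the original code: the `isinstance` guard (added because a NaN cell is a float and would trigger the "argument of type 'float' is not iterable" error) and the writes through `df.at` (used because the rows yielded by `iterrows()` can be copies, so assigning to `row['B']` is not guaranteed to change `df`).

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['Emo/3', 'Emo/4', 'Emo/1', '', 'Neu/5', 'Neu/2'],
    'Height': [5.1, 6.2, 5.1, '', 5.2, 5.2],
})

# Start with an object column of empty strings so it can hold ints and ''.
df['B'] = pd.Series('', index=df.index, dtype=object)

for index, row in df.iterrows():
    value = row['A']
    # Skip non-string cells such as NaN; `in` cannot be applied to a float.
    if not isinstance(value, str):
        continue
    if 'Emo/' in value:
        df.at[index, 'B'] = 0  # write to the frame, not to the yielded row
    elif 'Neu/' in value:
        df.at[index, 'B'] = 1

print(df)
```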

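However, note that iterrows should be avoided whenever possible: it loops over the rows one at a time in Python, which is much slower than vectorized column operations.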

I suggest using np.select. Build the conditions with str.contains (or str.startswith) and set default='':

import numpy as np

conditions = [
    df['A'].str.contains('Emo/'),
    df['A'].str.contains('Neu/'),
]
choices = [
    0,
    1,
]

df['B'] = np.select(conditions, choices, default='')
#        A Height  B
# 0  Emo/3    5.1  0
# 1  Emo/4    6.2  0
# 2  Emo/1    5.1  0
# 3                 
# 4  Neu/5    5.2  1
# 5  Neu/2    5.2  1
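
For comparison, the str.startswith variant mentioned above can also be written with boolean-mask assignment instead of np.select. This is a different technique from the one in the answer, shown only as a sketch on the question's sample data. It keeps the 0 and 1 as real integers next to the empty strings by using an object column, and `na=False` is included on the assumption that column A may contain missing values.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['Emo/3', 'Emo/4', 'Emo/1', '', 'Neu/5', 'Neu/2'],
    'Height': [5.1, 6.2, 5.1, '', 5.2, 5.2],
})

# Object column of empty strings; rows matching neither prefix stay ''.
df['B'] = pd.Series('', index=df.index, dtype=object)

# na=False treats missing values in A as "no match".
df.loc[df['A'].str.startswith('Emo/', na=False), 'B'] = 0
df.loc[df['A'].str.startswith('Neu/', na=False), 'B'] = 1

print(df)
```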

Thank you so much for pointing that out. This method works, though I found that I needed an additional argument, na=False, probably due to my Python version (3.8): ```conditions = [ df['A'].str.contains('Emo/', na=False), df['A'].str.contains('Neu/', na=False) ]``` — Catherine, Sep 25 '21 at 00:08
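
The `na=False` argument from the comment above goes inside each str.contains call. Here is a small sketch of how that might look when column A holds a real missing value. The sample frame with `np.nan` is made up for illustration. The choices are written as the strings '0' and '1' (an assumption, not part of the original answer) so that the choices and `default=''` share one dtype and the result does not depend on how NumPy promotes integers to strings.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a missing value in column A.
df = pd.DataFrame({
    'A': ['Emo/3', 'Emo/4', np.nan, '', 'Neu/5'],
})

# Without na=False, a NaN cell yields NaN instead of False in the mask.
conditions = [
    df['A'].str.contains('Emo/', na=False),
    df['A'].str.contains('Neu/', na=False),
]
choices = ['0', '1']  # strings, to match the dtype of default=''

df['B'] = np.select(conditions, choices, default='')
print(df)
```

If column B must stay numeric, the default could instead be `np.nan` with integer choices, at the cost of B becoming a float column.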
