28

EDIT MADE:

I have the 'Activity' column filled with strings and I want to derive the values in the 'Activity_2' column using an if statement.

So Activity_2 shows the desired result. Essentially I want to call out what type of activity is occurring.

I tried to do this with the code below, but it won't run (see the error below). Any help is greatly appreciated!


    for i in df2['Activity']:
        if i contains 'email':
            df2['Activity_2'] = 'email'
        elif i contains 'conference'
            df2['Activity_2'] = 'conference'
        elif i contains 'call'
            df2['Activity_2'] = 'call'
        else:
            df2['Activity_2'] = 'task'


Error: if i contains 'email':
                ^
SyntaxError: invalid syntax
PineNuts0

6 Answers

38

Assuming you are using pandas, you can use numpy.where, a vectorized version of if/else, with each condition built from str.contains:

import numpy as np

# np.where is called directly; the pandas.np alias is deprecated
df['Activity_2'] = np.where(df.Activity.str.contains("email"), "email",
                   np.where(df.Activity.str.contains("conference"), "conference",
                   np.where(df.Activity.str.contains("call"), "call", "task")))

df

#   Activity            Activity_2
#0  email personA       email
#1  attend conference   conference
#2  send email          email
#3  call Sam            call
#4  random text         task
#5  random text         task
#6  lwantto call        call
Psidom
  • One does not need to call np from pandas. If you do, you get the following message: " The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly. " Just using np.where() should do the job. It is a good solution suggested by @Psidom. Thank you Psidom! – seakyourpeak Feb 15 '22 at 13:59
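
Following the comment above, here is a minimal self-contained sketch with numpy imported directly. The sample rows are copied from the output shown in the answer; the variable name `activity` is only an illustration.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the output table in the answer
df = pd.DataFrame({"Activity": ["email personA", "attend conference", "send email",
                                "call Sam", "random text", "random text",
                                "lwantto call"]})

activity = df["Activity"]

# Nested np.where: the first matching condition wins, "task" is the fallback
df["Activity_2"] = np.where(activity.str.contains("email"), "email",
                   np.where(activity.str.contains("conference"), "conference",
                   np.where(activity.str.contains("call"), "call", "task")))

print(df)
```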
14

This also works:

df.loc[df['Activity'].str.contains('email'), 'Activity_2'] = 'email'
df.loc[df['Activity'].str.contains('conference'), 'Activity_2'] = 'conference'
df.loc[df['Activity'].str.contains('call'), 'Activity_2'] = 'call'
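
With the three lines above, rows that match none of the keywords are left as NaN. A possible sketch that fills in the default first and then loops over the keywords (the sample data and the `keyword` loop are assumptions added for illustration):

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"Activity": ["email personA", "attend conference",
                                "call Sam", "random text"]})

# Start every row at the default label, then overwrite the rows that match
df["Activity_2"] = "task"
for keyword in ["email", "conference", "call"]:
    df.loc[df["Activity"].str.contains(keyword), "Activity_2"] = keyword

print(df)
```

Note that later assignments overwrite earlier ones, so when a row contains several keywords the last keyword in the list wins.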
moshfiqur
  • I realize this is a couple of years old, but I have thousands of lines like this. How would you implement them efficiently? – Hatt Feb 12 '19 at 20:42
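
One possible way to handle a long keyword list without writing a line per keyword is a single regular expression with str.extract. This is only a sketch: the `keywords` list and sample data are hypothetical, and unlike the answers here it labels each row with whichever keyword appears first in the text, not by a fixed priority order.

```python
import re

import pandas as pd

# Hypothetical keyword list; in practice it might be loaded from a file
keywords = ["email", "conference", "call"]

df = pd.DataFrame({"Activity": ["email personA", "attend conference",
                                "call Sam", "random text"]})

# One capture group that matches any keyword; re.escape guards special characters
pattern = "(" + "|".join(re.escape(k) for k in keywords) + ")"

# Rows without a match come back as missing and are filled with the default
df["Activity_2"] = df["Activity"].str.extract(pattern, expand=False).fillna("task")

print(df)
```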
11

The current solution behaves incorrectly if your df contains NaN values, because str.contains can return NaN rather than True or False for missing entries. In that case I recommend the following code, which worked for me:

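import numpy as np

# Fill missing values so str.contains always returns True or False
temp = df.Activity.fillna("")
# Label missing rows with isna(); a "0" sentinel would also match text containing "0"
df['Activity_2'] = np.where(df.Activity.isna(), "None",
                   np.where(temp.str.contains("email"), "email",
                   np.where(temp.str.contains("conference"), "conference",
                   np.where(temp.str.contains("call"), "call", "task"))))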
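
If missing rows should simply fall through to the default label, an alternative sketch is to pass na=False to str.contains, which treats missing values as "no match". The sample data and the name `act` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value
df = pd.DataFrame({"Activity": ["email personA", np.nan, "call Sam", "random text"]})

act = df["Activity"]

# na=False makes str.contains return False for missing entries
df["Activity_2"] = np.where(act.str.contains("email", na=False), "email",
                   np.where(act.str.contains("conference", na=False), "conference",
                   np.where(act.str.contains("call", na=False), "call", "task")))

print(df)
```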
DovaX
3

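You have invalid syntax for checking strings: Python has no contains keyword. Use the in operator instead:

for i in df2['Activity']:
    if 'email' in i:
        # Note: this assigns 'email' to the whole column, not only the current row
        df2['Activity_2'] = 'email'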
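
A complete version of the loop might look like the sketch below. It collects one label per row in a list (the `labels` name and sample data are assumptions for illustration) and assigns the column once at the end, so each row gets its own value.

```python
import pandas as pd

# Hypothetical sample data for illustration
df2 = pd.DataFrame({"Activity": ["email personA", "attend conference",
                                 "call Sam", "random text"]})

labels = []
for text in df2["Activity"]:
    # The `in` operator tests for a substring
    if "email" in text:
        labels.append("email")
    elif "conference" in text:
        labels.append("conference")
    elif "call" in text:
        labels.append("call")
    else:
        labels.append("task")

# Assign all labels at once so each row keeps its own value
df2["Activity_2"] = labels

print(df2)
```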
Prakash Palnati
2
  1. Your code had bugs: the "elif" lines were missing colons, and Python has no "contains" keyword.
  2. You didn't mention you were using pandas, but that's the assumption I'm going with.
  3. My answer handles defaults, follows Python conventions, and is easily adaptable to additional activities.

import pandas as pd

DEFAULT_ACTIVITY = 'task'


def assign_activity(todo_item):
    """Assign an activity label to a raw-text TODO item."""
    activities = ['email', 'conference', 'call']

    for activity in activities:
        if activity in todo_item:
            return activity
    # No keyword matched: fall back to the default value
    return DEFAULT_ACTIVITY

df = pd.DataFrame({'Activity': ['email person A', 'attend conference', 'call Charly'],
                   'Colleague': ['Knor', 'Koen', 'Hedge']})

# You should really come up with a better name than 'Activity_2', like 'Labels' or something.
df["Activity_2"] = df["Activity"].apply(assign_activity)
Dave Liu
1

Another solution can be found in a post by @unutbu. It also works well for creating conditional columns. I changed the condition from that post, df['Set'] == Z, to df['Activity'].str.contains('yourtext') to match your question. See the example below:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Activity': ['email person A', 'attend conference', 'call foo']})

conditions = [
    df['Activity'].str.contains('email'),
    df['Activity'].str.contains('conference'),
    df['Activity'].str.contains('call')]

values = ['email', 'conference', 'call']

df['Activity_2'] = np.select(conditions, values, default='task')

print(df)

You can find the original post here: Pandas conditional creation of a series/dataframe column

Hedge92