I'm trying to create a new column in my DataFrame called "Method". The current DataFrame is shown in the attached picture (image not included).

I'm trying to use if/elif/else as well as regex to create the new column but when I run this code, I get only the value that comes from the else statement. Why isn't this working and how can I fix it?

if 'posted' in df2.Full.astype(str) and '/ Outbound' in df2.TPrev.astype(str):
    df2['Method']='Classifieds Homepage Button'
elif 'ad posted' in df2.Full.astype(str) and 'thanks' in df2.TPrev.astype(str):
    df2['Method']='Header after Post'
elif 'ad posted' in df2.Full.astype(str) and '/myaccount/listing-classified Outbound' in df2.TPrev.astype(str):
    df2['Method']='My Listings Button'    
elif 'ad posted' in df2.Full.astype(str) and '/s/' in df2.TPrev.astype(str):
    df2['Method']='SRP'  
elif 'ad posted' in df2.Full.astype(str) and '/myaccount/listing-classified nan' in df2.TPrev.astype(str):
    df2['Method']='My Listings Button'
elif 'ad posted' in df2.Full.astype(str) and '/sell nan nan' in df2.TPrev and '/myaccount/listing-classified nan nan' in df2.Prev.astype(str):
    df2['Method']='My Listings Header'
elif 'ad posted' in df2.Full.astype(str) and '/listing/' in df2.TPrev.astype(str):
    df2['Method']='Detail Page Header'
elif 'ad posted' in df2.Full.astype(str) and '/search/' in df2.TPrev.astype(str):
    df2['Method']='SRP'
else:
    df2['Method']='Ignore'
Ryan Ball
  • Your logical statements evaluate to a **single** truth value for the **entire** DataFrame, so you would only ever set the entire column to one of those values. Instead you should use `np.select` like in https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column to apply conditional logic across each row. You also probably want to switch the statements to things like `df2.Full.astype(str).str.contains('ad posted')` as those return Boolean Series. – ALollz Mar 04 '20 at 19:48
  • As the above comment states, you are overwriting `df2['Method']` with each execution. Apart from `np.select`, you can create an empty `df2['Method']` column first and then fill in using a loop with your conditions. – Gary Mar 04 '20 at 19:50
  • Have you not read the Pandas docs? – AMC Mar 04 '20 at 21:05
  • Does this answer your question? [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) – AMC Mar 04 '20 at 21:05

1 Answer

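As the comments suggest, there are two problems. First, an expression like `'posted' in df2.Full.astype(str)` never looks at the values: `in` on a Series checks the index labels, so every condition is False and only the else branch runs. Second, even if a condition were True, assigning a single value to `df2['Method']` sets the whole column to that value instead of only the matching rows. What you want to do is: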

  1. Instead of converting to str in every condition, convert the whole dataframe once. Note that astype returns a new dataframe, so assign the result back:

    df2 = df2.astype(str)

  2. You need logic that is applied to every row of the dataframe to determine the value of the "Method" column. The simplest way is to wrap your conditions in a function and call it with apply:

def my_logic(row):
   if 'posted' in row.Full and '/ Outbound' in row.TPrev:
      return 'Classifieds Homepage Button'
   elif 'ad posted' in row.Full and 'thanks' in row.TPrev:
      return 'Header after Post'
   elif 'ad posted' in row.Full and '/myaccount/listing-classified Outbound' in row.TPrev:
      return 'My Listings Button'
   elif 'ad posted' in row.Full and '/s/' in row.TPrev:
      return 'SRP'
   elif 'ad posted' in row.Full and '/myaccount/listing-classified nan' in row.TPrev:
      return 'My Listings Button'
   elif 'ad posted' in row.Full and '/sell nan nan' in row.TPrev and '/myaccount/listing-classified nan nan' in row.Prev:
      return 'My Listings Header'
   elif 'ad posted' in row.Full and '/listing/' in row.TPrev:
      return 'Detail Page Header'
   elif 'ad posted' in row.Full and '/search/' in row.TPrev:
      return 'SRP'
   else:
      return 'Ignore'

df2['Method'] = df2.apply(my_logic, axis=1)
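To see why the row-wise function behaves differently from the original if/elif chain, here is a minimal sketch on made-up toy data. The frame `df` and its values are assumptions for illustration; only the column names come from the question. It shows that `in` on a Series tests the index labels, while inside apply each `row.Full` is a plain string, so `in` becomes an ordinary substring test.

```python
import pandas as pd

# Toy data: column names match the question, the values are made up.
df = pd.DataFrame({
    "Full": ["ad posted", "something else"],
    "TPrev": ["/ Outbound", "/search/ nan"],
})

# `in` on a Series tests the index labels (0 and 1 here), not the values.
print("ad posted" in df.Full)   # False
print(0 in df.Full)             # True, because 0 is an index label

# To test the values, ask for one boolean per row instead.
mask = df.Full.str.contains("ad posted", regex=False)
print(mask.tolist())            # [True, False]

# Inside apply(axis=1), row.Full is a plain str, so `in` is a substring test.
per_row = df.apply(lambda row: "ad posted" in row.Full, axis=1)
print(per_row.tolist())         # [True, False]
```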

The apply approach is the simplest transformation, but I think a more elegant (and typically faster) solution is np.select: build a list of boolean conditions, one boolean Series per rule, and a matching list of choices. Example with the first 3 conditions:

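import numpy as np

# str.contains returns one boolean per row; plain `in` would only test the index
full = df2.Full.astype(str)
tprev = df2.TPrev.astype(str)
conditions = [
   full.str.contains('posted', regex=False) & tprev.str.contains('/ Outbound', regex=False),
   full.str.contains('ad posted', regex=False) & tprev.str.contains('thanks', regex=False),
   full.str.contains('ad posted', regex=False) & tprev.str.contains('/myaccount/listing-classified Outbound', regex=False)]
choices = ['Classifieds Homepage Button', 'Header after Post', 'My Listings Button']
df2['Method'] = np.select(conditions, choices, default='Ignore')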
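For completeness, a sketch of how all eight rules from the question might be expressed with np.select. The toy rows and the helper `has` are hypothetical and only there to make the example runnable; the labels and substrings are copied from the question. np.select picks the first condition that is True for each row, which mirrors the order of the if/elif chain.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real data (values are made up).
df2 = pd.DataFrame({
    "Full":  ["ad posted", "ad posted", "ad posted", "page view"],
    "TPrev": ["/ Outbound", "/thanks nan", "/search/ nan", "/s/ nan"],
    "Prev":  ["nan", "nan", "nan", "nan"],
}).astype(str)

def has(col, text):
    # Hypothetical helper: literal (non-regex) substring test, one bool per row.
    return df2[col].str.contains(text, regex=False)

ad = has("Full", "ad posted")

# (condition, label) pairs in the same order as the if/elif chain.
rules = [
    (has("Full", "posted") & has("TPrev", "/ Outbound"), "Classifieds Homepage Button"),
    (ad & has("TPrev", "thanks"), "Header after Post"),
    (ad & has("TPrev", "/myaccount/listing-classified Outbound"), "My Listings Button"),
    (ad & has("TPrev", "/s/"), "SRP"),
    (ad & has("TPrev", "/myaccount/listing-classified nan"), "My Listings Button"),
    (ad & has("TPrev", "/sell nan nan")
        & has("Prev", "/myaccount/listing-classified nan nan"), "My Listings Header"),
    (ad & has("TPrev", "/listing/"), "Detail Page Header"),
    (ad & has("TPrev", "/search/"), "SRP"),
]
conditions, choices = zip(*rules)

# Rows matching no rule fall through to the default, like the else branch.
df2["Method"] = np.select(list(conditions), list(choices), default="Ignore")
print(df2["Method"].tolist())
```

Passing `regex=False` keeps characters such as `/` and `.` literal, which matches the substring semantics of the original `in` checks.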