
First, I have looked at many SO threads on this and none seemed to work in my case. "Creating a new column based on if-elif-else condition" seemed to be the closest to what I am trying to do.

In my df I have a column with product names. I am trying to create a function that looks for a partial string match in each row of that column and based on the match it will create a label for each row in a new df column. I wanted to use a function because there are about 5 or 6 patterns that I need to match.

I am using the str.contains() method to look for a partial match in the product title. I expected this to return a bool, which I then check with if/elif/else in the function:

def label_sub_cat():
    if data['product'].str.contains('Proceedings', case=False) is True:
        return 'Proceedings'
    elif data['product'].str.contains('DVD', case=False) is True:
        return 'DVD'
    else:
        return 'Other'

data['product_sub_cat'] = data.apply(label_sub_cat(), axis=1)

I keep getting the following error:

AttributeError: 'DataFrame' object has no attribute 'other'
eyllanesc
user3088202
  • My theory since you don't provide data: your function label_sub_cat() is defined over your dataframe, so when you do data.apply(label_sub_cat(), ...) you're effectively doing data.apply('Other' ...) hence the error – Yuca Oct 10 '18 at 15:58
  • @Yuca can you elaborate on ' defined over your dataframe' part? – user3088202 Oct 10 '18 at 16:04
  • @Yuca it returns 'other'. So it must be an issue within the function itself. I am at a loss as to why. I did verify that running each contains() portion of the code does return True/False values. – user3088202 Oct 10 '18 at 16:08
  • @user3088202 when you write `data.apply(label_sub_cat(), axis=1)` you are telling pandas to apply `label_sub_cat()`. Notice how you're calling it here? You need to pass the function *object* try: `data.apply(label_sub_cat, axis=1)` instead. Also, your function is not taking any input. It will return the same boolean values as it's not comparing anything. – Brian Oct 10 '18 at 16:10
  • @BrianJoseph this gives me TypeError: ('label_sub_cat() takes 0 positional arguments but 1 was given', 'occurred at index 0') – user3088202 Oct 10 '18 at 16:12
  • that's what I mean by *defined over the dataframe*. That function is just doing two comparisons, which evaluate to False and False, landing on the else condition that returns 'Other' – Yuca Oct 10 '18 at 16:13

2 Answers


The function passed to df.apply(..., axis=1) is called once per row of df, so it should take a single row as its argument and work on that row, not on the entire df.

In [36]: import pandas as pd
In [37]: df = pd.DataFrame({'product': ['aProceedings', 'aDVD', 'vcd']})
In [38]: def label_sub_cat(row):
...:     # row is a single row of df, so row['product'] is a plain string
...:     if 'Proceedings' in row['product']:
...:         return 'Proceedings'
...:     elif 'DVD' in row['product']:
...:         return 'DVD'
...:     else:
...:         return 'Other'
...:
...:

In [39]: df['product_sub_cat'] = df.apply(label_sub_cat, axis=1)

In [40]: df
Out[40]:
        product product_sub_cat
0  aProceedings     Proceedings
1          aDVD             DVD
2           vcd           Other
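
Since the question mentions five or six patterns, a vectorized variant may be easier to extend than a row-wise function. The following is only a sketch under assumptions: the sample rows and the `rules` list are made up for illustration, and it swaps the row-wise `apply` for `str.contains` on the whole column combined with `numpy.select`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question does not show its DataFrame.
data = pd.DataFrame(
    {'product': ['Conference Proceedings 2018', 'Concert dvd', 'Paperback', None]}
)

# (pattern, label) pairs, checked in order; add more pairs for further patterns.
rules = [('Proceedings', 'Proceedings'), ('DVD', 'DVD')]

# One boolean Series per pattern; na=False treats missing names as non-matches.
conditions = [
    data['product'].str.contains(pattern, case=False, na=False)
    for pattern, _ in rules
]
labels = [label for _, label in rules]

# The first matching condition wins; rows matching nothing get 'Other'.
data['product_sub_cat'] = np.select(conditions, labels, default='Other')
print(data)
```

Here the case-insensitive matching the question asked for comes from `case=False`, and the order of `rules` decides which label wins when a name matches more than one pattern.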
Ysh Xiong

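Just change your function so that it takes the row as an argument and tests the string in that row:

def label_sub_cat(row):
    # row['product'] is a plain str here, so use `in` rather than .str.contains().
    # Note that row.product would resolve to the Series.product method, not the column.
    name = row['product'].lower()  # lower-case once for a case-insensitive match
    if 'proceedings' in name:
        return 'Proceedings'
    elif 'dvd' in name:
        return 'DVD'
    else:
        return 'Other'

data['product_sub_cat'] = data.apply(label_sub_cat, axis=1)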
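
To see why the `is True` checks in the question never succeed, it may help to inspect what `str.contains` returns when called on a whole column. This is a small illustration with made-up sample data, not the asker's actual DataFrame:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
data = pd.DataFrame({'product': ['Annual Proceedings', 'Live DVD', 'Poster']})

# Called on a column, str.contains returns one boolean per row.
mask = data['product'].str.contains('DVD', case=False)

print(type(mask).__name__)  # Series
print(mask is True)         # False: a Series is never the singleton True
print(mask.tolist())        # [False, True, False]
```

Because both comparisons in the original function are False, it always returns the string 'Other', and `data.apply(label_sub_cat(), axis=1)` then hands that string to `apply`, which appears to be what triggers the AttributeError.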
Yuca