
I hope you can help me. I'm trying to classify some products based on their size: 40ML or other.

Here is my piece of code:

1. Dataframe creation

import pandas as pd

test = {'Name':['ProductA 40ML','ProductB 100ML','ProductC 40ML','ProductD 100ML']}
df1=pd.DataFrame(test)

2. Function built for classification

def size_class(row):
    if  row['Name'].str.contains('40ML'):
        val = '40ML'
    else:
        val = 'other'
    return val


df1['size_classification'] = df1.apply(size_class, axis=1)

Error message:

However, the function raises the following error: AttributeError: 'str' object has no attribute 'str'

Question

Would you please be able to help me fix this one? I had a look at existing issues but couldn't find any answer addressing this.

1 Answer


I found a few things you missed in your implementation:

  1. In Python, the in operator is the usual way to run a membership test on a plain string; .str.contains is a method of a pandas Series, not of a str. See the Membership test operations documentation and this SO question for more details: Does Python have a string 'contains' substring method?
  2. With axis=1, DataFrame.apply passes each row to your function as a Series, so row['Name'] is already a plain Python string. You don't need to apply the function to the whole data frame, only to the relevant column: Series.apply passes each cell's value directly.
  3. The function passed to apply handles each cell's value separately. In your case that value is a string, which has no .str accessor (hence the AttributeError), so you can test it directly without converting anything.
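The difference between applying a function to the whole data frame and to a single column can be checked directly. The snippet below is only a minimal sketch on a shortened version of the data; the variable names row_types and cell_types are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['ProductA 40ML', 'ProductB 100ML']})

# DataFrame.apply with axis=1 hands each row to the function as a Series,
# so row['Name'] is already a plain Python str (no .str accessor).
row_types = df1.apply(lambda row: type(row['Name']).__name__, axis=1)

# Series.apply hands each cell value to the function directly.
cell_types = df1['Name'].apply(lambda value: type(value).__name__)

print(row_types.tolist())
print(cell_types.tolist())
```

In both cases the value being tested is a str, which is why the in operator works and .str.contains does not.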

So, the code that fixes your bugs is:

import pandas as pd
test = {'Name':['ProductA 40ML','ProductB 100ML','ProductC 40ML','ProductD 100ML']}
df1=pd.DataFrame(test)

def size_class(name):
    # name is a single cell value (a plain str), so the in operator works
    if '40ML' in name:
        val = '40ML'
    else:
        val = 'other'
    return val


df1['size_classification'] = df1['Name'].apply(size_class)

print(df1.head())
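As a possible alternative (not part of the original answer), .str.contains does work when it is called on the whole column rather than on a single cell. The sketch below assumes NumPy is available and uses numpy.where to pick the label; regex=False makes the match a literal substring test.

```python
import numpy as np
import pandas as pd

test = {'Name': ['ProductA 40ML', 'ProductB 100ML', 'ProductC 40ML', 'ProductD 100ML']}
df1 = pd.DataFrame(test)

# .str is an accessor of the Series (the whole column), not of a single str
is_40ml = df1['Name'].str.contains('40ML', regex=False)

# Pick '40ML' where the mask is True and 'other' elsewhere
df1['size_classification'] = np.where(is_40ml, '40ML', 'other')

print(df1)
```

Note that this is still a substring test, just like the in operator, so a name such as 'ProductE 140ML' would also be labelled '40ML'.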


Yanirmr
  • Hello Yanirmr, thanks a lot for sharing the code, which works very well, and for the detailed explanation. I'll take a deeper look at point 1, thanks for sharing. Point 2 is clear, thanks. Point 3 is not clear to me, sorry: could you please let me know what you mean by "it's a string so you don't need to cast things"? Thanks a lot and have a nice day! – Antonybegood May 12 '22 at 15:55
  • @Antonybegood You're welcome. About the third point: what I tried to explain is that the function you apply receives the *content* of the cell, which in your case is a string. In that situation there is no point in treating the input as a dataframe or a series; it is a single cell's value. – Yanirmr May 12 '22 at 17:52
  • Sorry for the delay in my response. Thanks for your feedback: this is super clear. – Antonybegood May 17 '22 at 18:50