
I am working with a pandas DataFrame, df, and a function as follows:

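def code(x):
    # x is a single 'Ref' value; return its category label
    if x in [21, 32]:
        return 'Cat A'
    elif x in [22, 34]:
        return 'Cat B'
    else:
        # return (rather than print) so the label ends up in the result
        return 'Sorry'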

The DataFrame, df, has a column ('Ref') containing numbers:

df = 

**Document No**     **Date**         **Ref**
2018-0212        2019-03-28       71
2018-0212R1      2019-03-28       71
2019-0004        2019-01-11       34
2019-0005        2019-01-14       25

I wish to apply the function above to each value in the 'Ref' column and return the result in a new column appended to df (i.e. a column which would contain 'CAT A', 'CAT B', or 'Sorry').

I have tried

df.apply.code(df['Ref'])

without success. Any thoughts? thanks

ALollz
Prolle
  • Have a look at: https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column to see how to conditionally create a column, specifically the section in the accepted answer on "If you have more than two conditions" – ALollz Jun 19 '20 at 20:30
  • If you want to apply a function to a dataframe, check out the pandas apply documentation to learn how to use it correctly (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html). The correct syntax for apply would be `df.apply(code)` – eNc Jun 19 '20 at 20:32
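
A minimal sketch of the apply-based route, assuming the function is written to take one value at a time and to return a label in every branch. Because the function works on a single number, apply is called here on the 'Ref' column itself (Series.apply) rather than on the whole DataFrame. The sample frame only reproduces the 'Ref' values from the question, and the 'Category' column name is an assumption.

```python
import pandas as pd

# Sample data reproducing the 'Ref' values from the question
df = pd.DataFrame({'Ref': [71, 71, 34, 25]})

def code(x):
    # x is a single 'Ref' value, not the whole column
    if x in [21, 32]:
        return 'CAT A'
    elif x in [22, 34]:
        return 'CAT B'
    return 'Sorry'

# Series.apply calls code() once per element of the column
df['Category'] = df['Ref'].apply(code)
print(df)
```

This is fine for small frames; it runs the Python function once per row, so a vectorised approach may be preferable on large data.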

1 Answer


Using .loc conditions

This solution is quite straightforward: each of the first two lines assigns a value to the rows that match the condition inside .loc[]. The last line uses .fillna() to assign a default value to the rows that matched neither condition.

df.loc[df['your_column'].isin([21,32]), 'Category'] = 'CAT A'
df.loc[df['your_column'].isin([22,34]), 'Category'] = 'CAT B'
df['Category'] = df.Category.fillna('Sorry') 
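
To make the steps visible, here is a runnable sketch of the same idea on the 'Ref' column. It assumes a small sample frame: the first four values come from the question, and the extra 21 is added only so that both categories appear. Rows that match neither condition stay NaN until the final fillna.

```python
import pandas as pd

# 'Ref' values from the question, plus a hypothetical 21 so 'CAT A' shows up
df = pd.DataFrame({'Ref': [71, 71, 34, 25, 21]})

# Each .loc assignment only touches the rows where the mask is True
df.loc[df['Ref'].isin([21, 32]), 'Category'] = 'CAT A'
df.loc[df['Ref'].isin([22, 34]), 'Category'] = 'CAT B'

# Rows matched by neither mask are still missing at this point
unmatched = int(df['Category'].isna().sum())

# Fill the remaining rows with the default label
df['Category'] = df['Category'].fillna('Sorry')
print(df)
```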

Using np.select

This is the method described in this answer suggested by @ALollz. It is probably the best way to proceed, but it's somewhat burdensome for simple cases.

First you need to list your conditions and choices. Then, using np.select, you can assign a 'Category' value based on the given conditions. The default parameter is used to fill the rows where all conditions have failed.

import numpy as np

conditions = [df['your_column'].isin([21,32]), df['your_column'].isin([22,34])]
choices = ['CAT A', 'CAT B']
df['Category'] = np.select(conditions, choices, default="Sorry")
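
As a self-contained sketch, the same np.select call can be run against the 'Ref' values from the question. The sample frame is an assumption built from those values. Note that the conditions are evaluated in order, so if a row satisfied more than one, the first matching choice would be used.

```python
import numpy as np
import pandas as pd

# Sample data reproducing the 'Ref' values from the question
df = pd.DataFrame({'Ref': [71, 71, 34, 25]})

# One boolean mask per category, in priority order
conditions = [df['Ref'].isin([21, 32]), df['Ref'].isin([22, 34])]
choices = ['CAT A', 'CAT B']

# Rows where no condition holds receive the default label
df['Category'] = np.select(conditions, choices, default='Sorry')
print(df)
```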
Hugolmn