I am new to pandas and trying to figure out how to use .loc correctly when working with a slice of a dataframe. Any help is greatly appreciated.

The code is:

df1['Category'] = df[key_column].apply(lambda x: process_df1(x, 'category'))

where df1 is a dataframe, key_column is the name of the column to operate on, and process_df1 is a function defined to run on the values of that column.

The problem is that I am trying to avoid this warning: "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead"

I don't want to ignore or suppress the warning, or set `pd.options.mode.chained_assignment = None`.

Is there an alternative besides these two?

I have tried using

df.loc[df1['Category'] = df[key_column].apply(lambda x: process_df1(x, 'category'))] 

but it still produces the same error. Am I using .loc incorrectly?

Apologies if it is a confusing question.

For reference, df1 and df2 are created from df like this:

df1 = df[:break_index]
df2 = df[break_index:]

Thank you.

2 Answers

The apply method calls the function on each element of the Series you run it on (key_column in this case) and returns a new Series. It does not modify the original Series in place.

If you are trying to create a new column from a function that takes another column as input, you can use a list comprehension:

df1['Category'] = [process_df1(x, 'category') for x in df1[key_column]]

NOTE: Based on your description, I'm assuming process_df1 takes a single value from key_column and returns a new value. If that's not the case, please update your question.
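The warning itself most likely comes from df1 being a slice of df (df1 = df[:break_index]), so pandas cannot tell whether the assignment is meant to reach df as well. A minimal sketch of one common fix is to take an explicit copy when slicing. The sample data, the value of break_index, and the body of process_df1 below are placeholders, since the question does not show them:

```python
import pandas as pd

# Hypothetical stand-ins for the names used in the question.
df = pd.DataFrame({'key': ['a', 'b', 'c', 'd']})
key_column = 'key'
break_index = 2

def process_df1(value, kind):
    # Placeholder logic; the real function is not shown in the question.
    return f'{kind}:{value}'

# Explicit copies make df1 and df2 independent of df,
# so assigning a new column is unambiguous.
df1 = df[:break_index].copy()
df2 = df[break_index:].copy()

# Build the new column from df1's own rows so it aligns with df1's index.
df1['Category'] = df1[key_column].apply(lambda x: process_df1(x, 'category'))
print(df1)
```

With the copy in place, df itself is left untouched. Writing the assignment as df1.loc[:, 'Category'] = ... without the copy may still trigger the warning, because df1 would still be a slice of df.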

sedavidw

Unless you give more details about the source data and your expected results, we won't be able to give you a clear answer. For now, here's an example I put together to show how you can pass two values to a function through apply.

import pandas as pd

df = pd.DataFrame({'Category':['fruit','animal','plant','fruit','animal','plant'],
                   'Good' :[27, 82, 32, 91, 99, 67],
                   'Faulty' :[10, 5, 12, 8, 2, 12],
                   'Region' :['north','north','south','south','north','south']})

def shipment(categ, y):
    # Map a (category, code) pair to a value; unknown pairs return 99.
    if (categ,y) == ('fruit','a'):
        return 10
    elif (categ,y) == ('fruit','b'):
        return 20
    elif (categ,y) == ('animal','a'):
        return 30
    elif (categ,y) == ('animal','c'):
        return 40
    elif (categ,y) == ('plant','a'):
        return 50
    elif (categ,y) == ('plant','d'):
        return 60
    else:
        return 99

df['result'] = df['Category'].apply(lambda x: shipment(x,'a'))
print (df)
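
As a side note, the if/elif chain above can also be written as a dictionary lookup. This is only a sketch of the same mapping; SHIPMENT_VALUES is a name introduced here for illustration, and the DataFrame is trimmed to the one column the function uses:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['fruit', 'animal', 'plant', 'fruit', 'animal', 'plant']})

# Same (category, code) -> value pairs as the if/elif chain.
SHIPMENT_VALUES = {
    ('fruit', 'a'): 10,
    ('fruit', 'b'): 20,
    ('animal', 'a'): 30,
    ('animal', 'c'): 40,
    ('plant', 'a'): 50,
    ('plant', 'd'): 60,
}

def shipment(categ, y):
    # Fall back to 99 for any pair that is not in the table.
    return SHIPMENT_VALUES.get((categ, y), 99)

df['result'] = df['Category'].apply(lambda x: shipment(x, 'a'))
print(df)
```

Adding or changing a pair then only means editing the dictionary, not the function body.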
Joe Ferndz