
I am getting a SettingWithCopyWarning when I try to create a new column on a pandas DataFrame using a function I wrote to compute the value for that column. I am using the MovieLens dataset and predicting the rating a user gives a movie.

This is an example of my dataframe:

[Image of the dataframe not preserved]

Now I want to add a new column called 'prediction' that passes the user_id and item_id to my function and stores the returned prediction. To do this I have followed the advice of this other question.

Hence using the code:

df['pred'] = df.apply(lambda x: predict_rating(x['user_id'], x['item_id']), axis=1)

Yet I keep getting the SettingWithCopyWarning.

:44: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Any advice would be welcome.

ajscriv
  • Are you sure the warning doesn't come from your function predict_rating? If it does, can you post the problematic part? – Tbaki Jun 07 '17 at 14:10
  • What do you do before that line? You can get that warning if `df` was created from another dataframe before that without an explicit `.copy()`. – EFT Jun 07 '17 at 14:14
  • @Tbaki No, because when I set the predict_rating function to `def predict_rating(item, user): return item, user` it gives the same error – ajscriv Jun 07 '17 at 14:14
  • @EFT The code to create this df is `df = pd.read_csv(filepath)` followed by `df = df.iloc[:, np.r_[1:4]]` – ajscriv Jun 07 '17 at 14:16
  • @ajscriv It could mean you tried to assign a value to x['user_id']. pandas is telling you that for assignment it is better to use something like https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.at.html Can you give a reproducible example for us to play around with? – Tbaki Jun 07 '17 at 14:22
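The point EFT raises can be sketched as follows. This is only a minimal illustration, assuming the warning comes from `df` being derived from another dataframe via `iloc`. The `raw` frame stands in for `pd.read_csv(filepath)`, and `predict_rating` here is a hypothetical placeholder, not the asker's real function. Whether the warning appears without `.copy()` depends on the pandas version.

```python
import numpy as np
import pandas as pd


def predict_rating(user_id, item_id):
    # Hypothetical placeholder for the real model: always predicts 3.0.
    return 3.0


# Stand-in for pd.read_csv(filepath): one leading column plus the three of interest.
raw = pd.DataFrame({
    "row": [0, 1],
    "user_id": [22, 224],
    "item_id": [377, 29],
    "rating": [1, 3],
})

# Selecting columns from `raw` can leave pandas tracking `df` as a copy of a slice.
# The explicit .copy() makes `df` an independent DataFrame, so adding a column
# to it is unambiguous.
df = raw.iloc[:, np.r_[1:4]].copy()

df["pred"] = df.apply(lambda x: predict_rating(x["user_id"], x["item_id"]), axis=1)
print(df)
```

With the explicit copy, the new column lands on `df` only and `raw` is left untouched.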

3 Answers


Do this:

df.loc[:, 'pred'] = df.apply(lambda x: predict_rating(x['user_id'], x['item_id']), axis=1)

nithish08

It worked for me with this minimal example:

import pandas as pd

df = pd.DataFrame({'user_id':[22,224], 'item_id': [377,29], 'rating': [1,3]})
def prediction_func(row):
    return row['user_id'] + row['item_id']

df['prediction'] = df.apply(prediction_func, axis=1)
print(df.head())

Output:

   item_id  rating  user_id  prediction
0      377       1       22         399
1       29       3      224         253
Euphe

I think it has to do with my function after all, so I will dig into that and report anything interesting.

ajscriv