PROBLEM: Python newbie here! I want to create a new column by applying get_place_context from the geograpy3 package. However, I get a SettingWithCopyWarning, and I would like to do this properly. I've read the documentation and some similar questions, but I still don't understand how to fix the code so the warning doesn't appear.

DATA: My data looks like this:

   place lang                  user_location
0    NaN   es                            NaN
1    NaN   es                            NaN
2    NaN   en  Socialist Republic of Alachua
3    NaN   fr                Hérault, France
4    NaN   hi                 Gwalior, India

REPREX: First I keep only the rows where at least one of place or user_location is not NaN, and then I apply get_place_context from the geograpy package. Here is the reproducible example:

import pandas as pd
import geograpy
import numpy as np

df = pd.DataFrame.from_dict({'place': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan},
                             'lang': {0: 'es', 1: 'es', 2: 'en', 3: 'fr', 4: 'hi'},
                             'user_location': {0: np.nan,
                                               1: np.nan,
                                               2: 'Socialist Republic of Alachua',
                                               3: 'Hérault, France', 4: 'Gwalior, India'}})

sel_columns = ["place", "user_location"]
filtered_df = df[df[sel_columns].notna().any(axis=1)]
filtered_df['country'] = filtered_df['user_location'].apply(lambda x: geograpy.get_place_context(text=x).country_mentions)
#> <ipython-input-9-7e043cb42f69>:1: SettingWithCopyWarning: 
#> A value is trying to be set on a copy of a slice from a DataFrame.
#> Try using .loc[row_indexer,col_indexer] = value instead
#> 
#> See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
#>   filtered_df['country'] = filtered_df['user_location'].apply(lambda x: geograpy.get_place_context(text=x).country_mentions)

Created on 2020-06-02 by the reprexpy package

Tito Sanz
  • filtered_df still holds a reference to the slice of df that it was created from. If you want to avoid the warning, call .copy() after slicing: filtered_df = df[df[sel_columns].notna().any(axis=1)].copy() – Mahendra Singh Jun 02 '20 at 09:37
  • The filtered_df you created refers to the rows selected by df[df[sel_columns].notna().any(axis=1)]. So when you set country on those rows, the assignment is treated as targeting that slice of df – Mahendra Singh Jun 02 '20 at 09:42
  • [How to deal with SettingWithCopyWarning in Pandas?](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) I find this answer to be very helpful – r-beginners Jun 02 '20 at 09:42
  • Ahh, ok. So with the .copy() method I am saying that I want to create a new DataFrame, am I right? – Tito Sanz Jun 02 '20 at 09:43
  • If you want to preserve df, yes. Otherwise I would suggest using np.where or .loc to assign country – Mahendra Singh Jun 02 '20 at 09:59
  • As I don't want to preserve df, I'm trying to skip the NaN values in the 'user_location' column, so now I'm trying something like: `df['country'] = df.loc[df['user_location'.notna()]].apply(lambda x: geograpy.get_place_context(text=x).country_mentions)` but without success – Tito Sanz Jun 02 '20 at 10:06
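
The two suggestions from the comments can be sketched roughly as follows. This is a minimal sketch under assumptions, not a tested geograpy solution: lookup_country is a hypothetical stand-in for `geograpy.get_place_context(text=x).country_mentions`, used only so the example runs without the package, and it returns a plain string rather than whatever geograpy returns. Option 1 takes an explicit copy of the filtered rows. Option 2 writes onto df itself and skips the NaN values with dropna(); pandas aligns the result on the index, so the skipped rows end up as NaN.

```python
import numpy as np
import pandas as pd


def lookup_country(text):
    """Hypothetical stand-in for
    geograpy.get_place_context(text=text).country_mentions.
    It simply returns the last comma-separated part of the string."""
    return text.split(",")[-1].strip()


df = pd.DataFrame({
    "place": [np.nan] * 5,
    "lang": ["es", "es", "en", "fr", "hi"],
    "user_location": [np.nan, np.nan, "Socialist Republic of Alachua",
                      "Hérault, France", "Gwalior, India"],
})

# Option 1: keep df untouched and work on an explicit copy of the filtered rows.
sel_columns = ["place", "user_location"]
filtered_df = df[df[sel_columns].notna().any(axis=1)].copy()
filtered_df["country"] = filtered_df["user_location"].apply(lookup_country)

# Option 2: add the column to df itself, applying the function only to
# non-NaN locations. Index alignment leaves NaN in the rows that were skipped.
df["country"] = df["user_location"].dropna().apply(lookup_country)

print(filtered_df)
print(df)
```

With the real package, replacing the body of lookup_country with the geograpy call should be the only change needed, assuming it accepts the same strings as in the reprex.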

0 Answers