I'm trying to modify a DataFrame returned by scikit-learn's train_test_split function. When I assign a value to it, pandas gives me the following warning:

/usr/local/lib/python3.6/site-packages/pandas/core/indexing.py:179: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

The following raises the warning on my system:

import pandas as pd
from sklearn.model_selection import train_test_split
X = pd.DataFrame({'A': [2, 5, 7, 8, 9], 'B': [2, 5, 3, 51, 5]})
Xt, Xv = train_test_split(X)
Xt.iloc[0, 0] = 6  # this assignment triggers the SettingWithCopyWarning

I use the following versions:

python: '3.6.1 (default, Jun 26 2017, 19:29:26) \n[GCC 4.9.2]'

pandas: 0.20.3

sklearn: 0.18.2

Jonathan

3 Answers

You can work around it by making an explicit copy before assigning:

In [16]: Xt = Xt.copy()

In [17]: Xt.iloc[0,0]=6

In [18]: Xt
Out[18]:
   A  B
0  6  2
2  7  3
1  5  5

In [19]: X
Out[19]:
   A   B
0  2   2     # <--- NOTE: the value in the original DF has NOT been changed
1  5   5
2  7   3
3  8  51
4  9   5

Alternatively, you can use the numpy.split(...) function instead of train_test_split.
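
A minimal sketch of the numpy.split idea, under some assumptions not stated in the answer: rather than splitting the DataFrame itself, it shuffles the row positions, splits that integer array with np.split, and then takes explicit copies with .iloc[...].copy(). The seed (0) and the 75/25 ratio are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({'A': [2, 5, 7, 8, 9], 'B': [2, 5, 3, 51, 5]})

# Shuffle the row positions reproducibly (the seed is an arbitrary choice).
rng = np.random.RandomState(0)
positions = rng.permutation(len(X))

# Split the shuffled positions at the 75% mark: 3 rows for training, 2 for validation.
cut = int(0.75 * len(X))
train_pos, val_pos = np.split(positions, [cut])

# Take explicit copies so that later writes never touch X.
Xt = X.iloc[train_pos].copy()
Xv = X.iloc[val_pos].copy()

# Assigning into the copy leaves the original DataFrame unchanged.
Xt.iloc[0, 0] = 6
```

Because Xt and Xv are explicit copies, the assignment only affects Xt, which is the same guarantee the Xt.copy() workaround gives.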

MaxU - stand with Ukraine
  • Thanks, that worked for me. But it seems very cumbersome and inefficient to copy all output variables of train_test_split after every usage. I guess I can wrap every call like so 'map(lambda x:x.copy(),train_test_split(X,y))' – Jonathan Jul 13 '17 at 21:14
  • @Jonathan, you may also try to use `np.split()` instead of `train_test_split` - i've added a link to the answer, containing an example... – MaxU - stand with Ukraine Jul 13 '17 at 21:29
  • I tested this and it also works for me. Do you know if np.split copies the data? (I assume the warning is raised for train_test_split because it doesn't copy the data but rather creates a 'view'?) – Jonathan Jul 13 '17 at 21:41
  • @Jonathan, i guess `np.split()` returns copies... – MaxU - stand with Ukraine Jul 13 '17 at 21:46
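
The wrapper suggested in the first comment could be sketched as a small helper. The name copy_all is hypothetical, and two plain slices of X stand in for the output of train_test_split so that the example depends only on pandas.

```python
import pandas as pd


def copy_all(frames):
    """Return an independent copy of every DataFrame or Series in `frames`."""
    return [frame.copy() for frame in frames]


X = pd.DataFrame({'A': [2, 5, 7, 8, 9], 'B': [2, 5, 3, 51, 5]})

# Stand-in for the output of train_test_split(X): two slices of X.
Xt, Xv = copy_all([X.iloc[:3], X.iloc[3:]])

# The write goes to the copy, so X keeps its original value.
Xt.iloc[0, 0] = 6
```

With scikit-learn installed, the same helper would presumably be called as copy_all(train_test_split(X, y)). This costs one extra copy per returned object, which is the inefficiency the comment mentions.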

Another option is to reset the is_copy flag, but this seems to be a bug in the train_test_split function.

Xt.is_copy=None
Allen Qin

Pandas spits out this warning too aggressively in general. You can see a good discussion here: How to deal with SettingWithCopyWarning in Pandas?

But if I'm confident that my code works as expected, I just use:

pd.options.mode.chained_assignment = None

at the top of my file. You'll always be able to tell if you're not updating your df, because the change you thought you made won't be there.

Abigail