
So I made a dataframe with an empty 'Value' column using

df=data[['ID','Matrix','Name','Country', 'Units']]
df['Value']=''

and I am filling it in with code like this, which finds the rows where df.Matrix contains a string such as 'Good' or 'Bad' and sets 'Value' for those rows to the corresponding entry in sch[i]:

df.loc[df.Matrix.str.contains('Good'),'Value'] = sch[2]
df.loc[df.Matrix.str.contains('Bad'),'Value'] = sch[6]
df.loc[df.Matrix.str.contains('Excellent'),'Value'] = sch[8]

I have been getting a bunch of warnings, of these two different kinds:

C:\Python33\lib\site-packages\pandas\core\strings.py:184: UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
  " groups, use str.extract.", UserWarning)

C:\Users\0\Desktop\python\Sorter.py:57: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
  df.loc[df.Matrix.str.contains('Bad'),'Value'] = sch[6]

So far I am suppressing the warnings using

pd.options.mode.chained_assignment = None

If I do not suppress the warnings I get about 20 of them. Is there another way I can structure the data so that I do not get these warnings?

I am using Python 3 and pandas 0.13.1, if it helps.

user3084006

1 Answer


Here is a good explanation of why this warning was turned on:

Pandas: Chained assignments

Are you sure that is all of your code? Please show everything you are doing.

In [13]: df = DataFrame(index=range(5))

In [14]: df['Value'] = ''

In [15]: df.loc[[1,4],'Value'] = 'bad'

In [16]: df.loc[[0,3],'Value'] = 'good'

In [17]: df
Out[17]: 
  Value
0  good
1   bad
2      
3  good
4   bad

[5 rows x 1 columns]

Second example:

In [1]: df = DataFrame(index=range(5))

In [2]: df['Value'] = ''

In [3]: df2 = DataFrame(dict(A=['foo','foo','bar','bar','bah']))

In [4]: df
Out[4]: 
  Value
0      
1      
2      
3      
4      

[5 rows x 1 columns]

In [5]: df2
Out[5]: 
     A
0  foo
1  foo
2  bar
3  bar
4  bah

[5 rows x 1 columns]

In [6]: df.loc[df2.A.str.contains('foo'),'Value'] = 'good'

In [7]: df.loc[df2.A.str.contains('bar'),'Value'] = 'bad'

In [8]: df
Out[8]: 
  Value
0  good
1  good
2   bad
3   bad
4      

[5 rows x 1 columns]
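
For comparison, here is a minimal sketch of the situation in the question, where `df` is built by selecting columns from another frame. The contents of `data` and `sch` below are made up for illustration (the question does not show them), and the `.copy()` call is one possible way to make `df` independent of `data`, not necessarily the only fix:

```python
import pandas as pd

# Hypothetical stand-ins for the asker's `data` and `sch`; the real ones are not shown.
data = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Matrix": ["Good soil", "Bad water", "Excellent air", "Unknown"],
    "Name": ["a", "b", "c", "d"],
    "Country": ["US", "CA", "MX", "US"],
    "Units": ["mg", "mg", "mg", "mg"],
    "Extra": [0, 0, 0, 0],
})
sch = {2: "good-value", 6: "bad-value", 8: "excellent-value"}

# .copy() makes df its own frame rather than a selection taken from `data`,
# so later assignments are unambiguous and pandas has nothing to warn about.
df = data[["ID", "Matrix", "Name", "Country", "Units"]].copy()
df["Value"] = ""

# Same boolean-mask assignments as in the question.
df.loc[df["Matrix"].str.contains("Good"), "Value"] = sch[2]
df.loc[df["Matrix"].str.contains("Bad"), "Value"] = sch[6]
df.loc[df["Matrix"].str.contains("Excellent"), "Value"] = sch[8]

print(df)
```

The original `data` frame is left untouched; only the copy gains the 'Value' column.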
Jeff
  • Your code assumes you know the location of every value; my code finds matches and tags them. This is not all the code because the file is too large – user3084006 Feb 04 '14 at 19:08
  • In my case above I do not know where the value is in `df.Matrix`, so `str.contains('Good')` produces a boolean mask before the matching rows are tagged with whatever is in sch[i], and the part that builds the boolean mask is what triggers the warning. Your code does not show the warning because you assigned the locations and input values by hand. It is not the same: if I gave you a file with mixed data and told you to extract 'Good', your program would not work – user3084006 Feb 04 '14 at 19:32
  • I added an example like yours; it IS the same, I know because I wrote the code :) – Jeff Feb 04 '14 at 19:42
  • Strange, I do not know why I am getting the warnings. They seem to be the same. Is it because all my data is being edited in the same dataframe? – user3084006 Feb 04 '14 at 19:50
  • when you say you 'made an empty dataframe', can you put that up? – Jeff Feb 04 '14 at 19:55
  • by that I just mean `df['Value']=''`; basically I took rows out of a csv file and want to mark them if a specific condition exists. I never got the `SettingWithCopyWarning:` using Python 2.7 and just started to get it in Python 3. I am not very happy with the switch. – user3084006 Feb 04 '14 at 20:03
  • that is the problem; it is warning you that you may be modifying another frame inadvertently. Either explicitly copy it, or you can set ``is_copy=False``. The point is that virtually all pandas operations return a copy; modifying in-place is not a good idea (with the one exception of setting). – Jeff Feb 04 '14 at 20:05
  • What is a good way to avoid this? Make a duplicate dataframe and modify that, modify a column of the same length and append it at the end, or just go with suppressing the warning? – user3084006 Feb 04 '14 at 20:22
  • best way is to create the Series, then just assign it directly, e.g. ``df['Value'] = s``, rather than creating it empty and overwriting values. Just create the Series as you need it; pandas will align it (filling the remaining values with nan) – Jeff Feb 04 '14 at 20:25