100

I'm trying to set the entire column of a dataframe to a specific value.

In  [1]: df
Out [1]: 
     issueid   industry
0        001        xxx
1        002        xxx
2        003        xxx
3        004        xxx
4        005        xxx

From what I've seen, loc is the best practice when replacing values in a dataframe (or isn't it?):

In  [2]: df.loc[:,'industry'] = 'yyy'

However, I still received this much talked-about warning message:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead

If I do

In  [3]: df['industry'] = 'yyy'

I get the same warning message.

Any ideas? Working with Python 3.5.2 and pandas 0.18.1.


EDIT Jan 2023:

Given the volume of visits on this question, it's worth stating that my original question was really more about dataframe copy-versus-slice than "setting value to an entire column".

  • On copy-versus-slice: My current understanding is that, in general, if you want to modify a subset of a dataframe after slicing, you should create the subset by .copy(). If you only want a view of the slice, no copy() needed.
  • On setting value to an entire column: simply do df[col_name] = col_value
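A minimal sketch of the two points above, using a made-up frame shaped like the one in the question (`df_all`, `subset`, and the sample values are illustrative assumptions, not my original data):

```python
import pandas as pd

# Illustrative data; the real df_all is not shown in the question.
df_all = pd.DataFrame({
    "issueid": ["001", "002", "003"],
    "industry": ["xxx", "xxx", "xxx"],
})

# Take a subset and copy it explicitly so it is independent of df_all.
subset = df_all[df_all["issueid"] != "003"].copy()

# Set the whole column on the copy with plain column assignment.
subset["industry"] = "yyy"

print(subset["industry"].tolist())  # ['yyy', 'yyy']
print(df_all["industry"].tolist())  # ['xxx', 'xxx', 'xxx']
```

The parent frame keeps its original values because the subset was copied before it was modified.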
jtlz2
data-monkey
  • 7
    You must have done something to `df` prior to calling `df.loc[:,'industry']='yyy'` as what you posted should've worked. Basically the warning gets raised if you took a slice or sub-section of your starting df which you didn't show – EdChum Jun 23 '17 at 14:36
  • Does this answer your question? [How to deal with SettingWithCopyWarning in Pandas](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) – Michael Delgado Oct 27 '22 at 20:39

11 Answers

148

You can use the assign function:

df = df.assign(industry='yyy')
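For context, a small sketch of what `assign` does, with made-up data standing in for the question's frame (the name `updated` is only illustrative): it returns a new DataFrame and leaves the original alone, which is why the result is assigned back to `df` above.

```python
import pandas as pd

# Illustrative data shaped like the frame in the question.
df = pd.DataFrame({"issueid": ["001", "002"], "industry": ["xxx", "xxx"]})

# assign() builds a new DataFrame with the column overwritten.
updated = df.assign(industry="yyy")

print(updated["industry"].tolist())  # ['yyy', 'yyy']
print(df["industry"].tolist())       # ['xxx', 'xxx'] (original untouched)
```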
nbro
Mina HE
38

pandas can do unexpected things when new objects are defined from existing ones. You stated in a comment above that your dataframe is defined along the lines of df = df_all.loc[df_all['issueid']==specific_id,:]. In this case, pandas still treats df as a selection taken from df_all rather than as a fully independent object, so when you assign to it, pandas cannot tell whether you meant to change df_all as well, and it raises the warning.

To avoid these issues altogether, I often have to remind myself to use the copy module, which explicitly forces objects to be copied in memory so that methods called on the new objects are not applied to the source object. I had the same problem as you, and avoided it using the deepcopy function.

In your case, this should get rid of the warning message:

from copy import deepcopy
df = deepcopy(df_all.loc[df_all['issueid']==specific_id,:])
df['industry'] = 'yyy'

EDIT: Also see David M.'s excellent comment below! The same result using the DataFrame's own .copy() method:

df = df_all.loc[df_all['issueid']==specific_id,:].copy()
df['industry'] = 'yyy'
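If you want to check this yourself, here is a rough sketch that records any warnings raised during the assignment. The sample data, `specific_id` value, and the `copy_warnings` name are assumptions for illustration; the message text being searched for is the one quoted in the question.

```python
import warnings

import pandas as pd

# Illustrative stand-in for the asker's df_all.
df_all = pd.DataFrame({
    "issueid": ["001", "002", "003"],
    "industry": ["xxx", "xxx", "xxx"],
})
specific_id = "002"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Explicit copy, then set the whole column.
    df = df_all.loc[df_all["issueid"] == specific_id, :].copy()
    df["industry"] = "yyy"

# Keep only warnings whose text matches the message from the question.
copy_warnings = [
    w for w in caught
    if "A value is trying to be set on a copy" in str(w.message)
]

print(len(copy_warnings))            # 0
print(df["industry"].tolist())       # ['yyy']
print(df_all["industry"].tolist())   # ['xxx', 'xxx', 'xxx']
```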
Alex P. Miller
33
df.loc[:,'industry'] = 'yyy'

This does the trick. Use '.loc' with ':' to select all rows. Hope it helps.

Nwoye CID
14

You can do:

df['industry'] = 'yyy'
HH1
  • 23
    Still the same warning message. – data-monkey Jun 23 '17 at 14:12
  • How is your dataframe constructed? I don't get this warning when doing this with a random dataframe. – HH1 Jun 23 '17 at 14:14
  • 1
    df is taken from a broader dataframe df_all. something like df = df_all.loc[df_all['issueid']==specific_id,:]. I think you've got it, because when I do df_all['industry']='yyy' I don't see this message. But I don't know why df is not a "normal" dataframe. – data-monkey Jun 23 '17 at 14:19
  • 1
    it's your df = df_all.loc[df_all['issueid']==specific_id,:] ; try to use df = df_all[df_all['issueid']==specific_id] instead – HH1 Jun 23 '17 at 14:22
  • Yes got it. Care to explain what's the difference between the two? Does it have something to do "a copy"? – data-monkey Jun 23 '17 at 14:26
  • I don't really know, I learned it with this syntax, sorry :) – HH1 Jun 23 '17 at 14:27
  • `df['industry']` returns a reference to only a slice of your entire data set (if your data set is large). So you're really only changing a small portion of all the data. This is meant to be used as a fast preview. Use `.loc` instead. – NoName Feb 02 '20 at 04:02
4

Assuming your data frame looks like 'data' below, you have to consider whether the value you are setting is a string or an integer, since the two are treated differently. Be explicit about which one you mean.

import pandas as pd

data = [('001','xxx'), ('002','xxx'), ('003','xxx'), ('004','xxx'), ('005','xxx')]

df = pd.DataFrame(data,columns=['issueid', 'industry'])

print("Old DataFrame")
print(df)

df.loc[:,'industry'] = str('yyy')

print("New DataFrame")
print(df)

Now if you want to put numbers instead of letters, you can pass a list with one value per row:

list_of_ones = [1,1,1,1,1]
df.loc[:,'industry'] = list_of_ones
print(df)

Or, if you are using NumPy:

import numpy as np
n = len(df)
df.loc[:,'industry'] = np.ones(n)
print(df)
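As far as I can tell, a single number is broadcast to every row just like a string, so a list or array is only needed when the values differ per row. A small sketch with made-up data, using plain column assignment (which replaces the column, so its dtype is free to change):

```python
import pandas as pd

# Illustrative data shaped like the frame above.
df = pd.DataFrame({
    "issueid": ["001", "002", "003"],
    "industry": ["xxx", "xxx", "xxx"],
})

# A scalar number is repeated for every row of the column.
df["industry"] = 1

print(df["industry"].tolist())  # [1, 1, 1]
```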
4

This lets you put conditions on the rows and then change the cells of a specific column only in the rows that match:

df.loc[(df['issueid'] == '001'), 'industry'] = str('yyy')
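A slightly fuller sketch of the same idea, with made-up data and a boolean mask stored in a variable (the `mask` name and the sample values are just for illustration):

```python
import pandas as pd

# Illustrative data shaped like the frame in the question.
df = pd.DataFrame({
    "issueid": ["001", "002", "003"],
    "industry": ["xxx", "xxx", "xxx"],
})

# Boolean mask: True for the rows we want to change.
mask = df["issueid"].isin(["001", "003"])

# Rows where mask is True get the new value; the rest are left alone.
df.loc[mask, "industry"] = "yyy"

print(df["industry"].tolist())  # ['yyy', 'xxx', 'yyy']
```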
Azim
3

Seems to me that:

df1 = df[df['col1']==some_value] may not give you an independent DataFrame; pandas cannot tell whether changes in df1 are meant to be reflected in the parent df. This leads to the warning.
Whereas df1 = df[df['col1']==some_value].copy() will create a new DataFrame, and changes in df1 will not be reflected in df. The copy method is recommended if you don't want to make changes to your original df.

rachwa
hukai916
2

Use this instead:

df.iloc[:]['industry'] = 'yyy'

Remember: this only works with columns that already exist in the dataframe.

This is for people for whom .loc didn't work.

2

For anyone else coming here for an answer who doesn't want to use copy:

df['industry'] = df['industry'].apply(lambda x: '')
1

I had a similar issue before even with this approach df.loc[:,'industry'] = 'yyy', but once I refreshed the notebook, it ran well.

You may want to try refreshing the cells after you have df.loc[:,'industry'] = 'yyy'.

John Mutuma
-3

If you have just created a new, empty data frame, you cannot directly assign a value to a whole column. The column will have no values, because pandas has no way to know how many rows the data frame should have! You need to either define the size or have some existing columns.

df = pd.DataFrame()
df["A"] = 1
df["B"] = 2
df["C"] = 3
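One way to "define the size" first, sketched under the assumption that you know the number of rows up front (the names `empty` and `sized` and the row count of 3 are illustrative):

```python
import pandas as pd

# With no index, the scalar has no rows to fill.
empty = pd.DataFrame()
empty["A"] = 1
print(len(empty))  # 0

# Giving the frame an index first tells pandas how many rows to fill.
sized = pd.DataFrame(index=range(3))
sized["A"] = 1
sized["B"] = 2
print(sized["A"].tolist())  # [1, 1, 1]
print(sized["B"].tolist())  # [2, 2, 2]
```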
Tao