Using df['C'] vs. df.loc[:, 'C'] to assign new column in Pandas dataframe

Question

I have a dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':np.random.randint(1,10, 10), 'B':np.random.randint(1,10, 10)})

def sumf(row):
    # Absolute difference between columns A and B for a single row
    result = None
    if row['A']>= row['B']:
        result = row['A'] - row['B']
    else:
        result = row['B'] - row['A']
    return result

df.loc[:,'C'] = df.apply(sumf, axis = 1)
df['D'] = df.apply(sumf, axis = 1)
my_var = 'zero'
df['E'] = my_var

What is the difference in terms of view/copy between columns C and D? And is this the right way to fill column E with zero? I have a similar dataframe with the same data and logic (just in another Jupyter notebook), but there I get a warning:

/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

When I try these lines:

df['D'] = df.apply(sumf, axis = 1)
my_var = 'zero'
df['E'] = my_var

I don't get this error. Is there another piece of code you are running that's causing this error upstream? — DJK, Nov 21 '18 at 15:52
Sorry, I couldn't figure out how to post formatted code in a comment. My production code is the same (except for the column names, the type of calculation and the data). I don't get the warning either when I run the code above, but I do get it when I run my production code. — Ildar Gabdrakhmanov, Nov 21 '18 at 16:10

pjw · Answer 1 · 2018-11-21T16:44:23.027

The SettingWithCopyWarning is a warning related to the possibility of chained assignment. From the docs on "Returning a view versus a copy", it states "The chained assignment warnings / exceptions are aiming to inform the user of a possibly invalid assignment. There may be false positives; situations where a chained assignment is inadvertently reported."
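To make "chained assignment" concrete, here is a minimal sketch on small made-up data (not the asker's). Indexing twice in a row applies the assignment to an intermediate object, whereas a single .loc call writes into the original frame. The exact warning printed for the chained line depends on the pandas version, so only the resulting values are shown.

```python
import pandas as pd

# Small illustrative frame (made-up data, not the asker's).
df = pd.DataFrame({'A': [1, 7, 3, 9], 'B': [4, 2, 8, 5]})
df['C'] = 0
mask = df['A'] > 5

# Chained assignment: df[mask] builds a new intermediate frame and
# ['C'] = 1 is applied to that object, so df itself is left unchanged.
# Depending on the pandas version, this line emits a warning.
df[mask]['C'] = 1
after_chained = df['C'].tolist()
print(after_chained)       # [0, 0, 0, 0]

# One .loc call selects rows and column in a single indexing step on df,
# so the assignment reaches the original frame.
df.loc[mask, 'C'] = 1
print(df['C'].tolist())    # [0, 1, 0, 1]
```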

I am not able to reproduce this warning when running your code (with pandas==0.23.4 and Python 2.7.15). Possibly you are running a different version of pandas? This post seems to indicate that this is a pandas version issue. If so, upgrading pandas will likely make the warning disappear; either way, both ways of assigning a new column (df.loc[:,'C'] or df['C']) are valid. Make sure your versions of Python, pandas and NumPy are up to date and identical across your different environments.
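A minimal check, assuming a reasonably recent pandas version and small illustrative data, that both forms add the same new column to the frame itself. The comparison uses tolist() so it does not depend on the exact dtype pandas picks for the .loc-created column.

```python
import pandas as pd

def sumf(row):
    # Same logic as the question: absolute difference of A and B.
    if row['A'] >= row['B']:
        return row['A'] - row['B']
    return row['B'] - row['A']

df = pd.DataFrame({'A': [1, 7, 3, 9], 'B': [4, 2, 8, 5]})

# Both statements add a new column to df itself (no intermediate object).
df.loc[:, 'C'] = df.apply(sumf, axis=1)
df['D'] = df.apply(sumf, axis=1)

print(df['D'].tolist())                      # [3, 5, 5, 4]
print(df['C'].tolist() == df['D'].tolist())  # True
```

For a column that does not exist yet, df['D'] = ... is the more common spelling; .loc becomes important mainly when you also restrict the rows being assigned.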

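In your case, the warning is most likely a false positive, since you are defining new columns in your original dataframe (not in a copy of the dataframe). If df in your production code was itself created by slicing another dataframe, however, the warning points at a real ambiguity.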
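The asker's production code is not shown, so the following is only a guess at a common cause: the warning typically appears when df was itself created by filtering another frame. The names big and df below are hypothetical. Calling .copy() on the slice makes df an independent frame, so adding columns to it is unambiguous.

```python
import pandas as pd

# Hypothetical parent frame (illustrative data).
big = pd.DataFrame({'A': [1, 7, 3, 9], 'B': [4, 2, 8, 5]})

# Hypothetical production pattern: df is a filtered slice of big.
# On pandas versions without Copy-on-Write enabled, this assignment
# typically emits SettingWithCopyWarning.
df = big[big['A'] > 2]
df['C'] = df['A'] - df['B']

# An explicit copy makes df independent of big, so no warning is expected.
df = big[big['A'] > 2].copy()
df['C'] = df['A'] - df['B']

print('C' in big.columns)  # False
print(df['C'].tolist())    # [5, -5, 4]
```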

And yes, if you want to fill column E with the string 'zero', this is an appropriate way to do so: assigning a scalar to a column broadcasts it to every row.
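A short sketch on illustrative data showing the scalar broadcast, plus the integer 0 in case a numeric zero rather than the string 'zero' was intended:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 7, 3, 9], 'B': [4, 2, 8, 5]})

my_var = 'zero'
# A scalar on the right-hand side is repeated for every row.
df['E'] = my_var
print(df['E'].tolist())  # ['zero', 'zero', 'zero', 'zero']

# If a numeric zero was meant, assign the integer instead of the string.
df['F'] = 0
print(df['F'].tolist())  # [0, 0, 0, 0]
```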

