29

I'm trying to use pandas to change one of my columns in place, using a simple function.

After reading the whole DataFrame, I tried to apply a function to one Series:

wanted_data.age.apply(lambda x: x+1)

And it's working great. The only problem occurs when I try to put it back into my DataFrame:

wanted_data.age = wanted_data.age.apply(lambda x: x+1)

or:

wanted_data['age'] = wanted_data.age.apply(lambda x: x+1)

Both forms throw the following warning:

> C:\Anaconda\lib\site-packages\pandas\core\generic.py:1974:
> SettingWithCopyWarning: A value is trying to be set on a copy of a
> slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
> value instead
> 
> See the the caveats in the documentation:
> http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
>   self[name] = value

Of course, I can set the column using the long form:

wanted_data.loc[:, 'age'] = wanted_data.age.apply(lambda x: x+1)

But is there no easier, syntactically nicer way to do it?

Thanks!

jtlz2
Yam Mesicka
  • Is your `wanted_data` dataframe already a subset of another dataframe? – joris May 16 '15 at 14:19
  • Nope ^^", this is a new DataFrame created by the `.read_excel` method – Yam Mesicka May 16 '15 at 14:21
  • What version of pandas are you using? I can't reproduce it with 0.16.1 – joris May 16 '15 at 14:57
  • Can you make a small reproducible example? Doing `df = pd.DataFrame({'a':[1,2,3], 'b':[0.1,0.2,0.3]}); df['a'] = df['a'].apply(lambda x: x+1)` does not give a warning for me. – joris May 16 '15 at 15:06
  • Read the doc linked in the warning. Chained assignment may cause issues, and the long form is the recommended way. However, the warning can be a false positive, and you can turn it off: "The chained assignment warnings / exceptions are aiming to inform the user of a possibly invalid assignment. There may be false positives; situations where a chained assignment is inadvertently reported." – Shahram May 17 '15 at 01:05
  • Question: does the long form require more resources? Is it slower? – Imad Jun 27 '18 at 08:46
  • Also: imagine the column name 'age' is stored in a variable. How would you apply the long form then? `wanted_data.variable_storing_age.apply(lambda x: x+1)` fails with `'DataFrame' object has no attribute 'variable_storing_age'` – Imad Jun 27 '18 at 08:49

3 Answers

22

Use loc:

wanted_data.loc[:, 'age'] = wanted_data.age.apply(lambda x: x + 1)
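
As a minimal sketch of the same idea, the example below uses a small made-up DataFrame (the data and the `column` variable are assumptions for illustration, not from the question). It also shows that the column label can be held in a variable when bracket or `.loc` indexing is used, and that simple arithmetic can be done without `apply()` at all.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame read from Excel in the question.
wanted_data = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "age": [29, 41, 35]})

# The column label may be stored in a variable. Bracket and .loc indexing
# accept it; attribute access (wanted_data.column) would instead look for a
# column literally named "column".
column = "age"

# Long form from the answer, using the variable as the column indexer.
wanted_data.loc[:, column] = wanted_data[column].apply(lambda x: x + 1)

# For simple arithmetic, a vectorized operation avoids apply() entirely.
wanted_data[column] += 1

ages = wanted_data[column].tolist()
```

After both steps each age has been incremented twice, once by `apply()` and once by the vectorized `+=`.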
Nathan Tuggy
Alexander
4

I would suggest `wanted_data['age'] = wanted_data['age'].apply(lambda x: x+1)`, then saving the result with `wanted_data.to_csv(fname, index=False)`, where `fname` is the name of the file to be updated.

Irfanullah
2

I cannot comment, so I'll leave this as an answer.

Because of the way chained indexing is handled internally, you may get back a copy instead of a view of your initial DataFrame (for more, see the pandas documentation on chained assignment, which is a very good source; a single .loc[] assignment is guaranteed to operate on the original object). Thus, you may be assigning not to your DataFrame but to a copy of it. On the other hand, your form may return a view of your initial DataFrame, in which case mutating it will mutate the initial DataFrame too. pandas emits this warning to draw attention to the situation, so that the user can decide whether this is the intended behavior.
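
To make the copy-versus-original distinction concrete, here is a small sketch with made-up data (the frame and the names `original` and `subset` are assumptions for illustration). It shows one common way to remove the ambiguity: taking an explicit `.copy()` of a filtered frame before assigning into it, so that it is clear the original is not meant to change. In pandas versions that emit SettingWithCopyWarning, assigning into the filtered frame without the `.copy()` is the kind of pattern that can trigger it.

```python
import pandas as pd

# Hypothetical data; in the question the frame came from read_excel.
original = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "age": [29, 41, 35]})

# Filtering produces an object derived from `original`. The explicit .copy()
# states that `subset` is independent of the frame it was taken from.
subset = original[original["age"] > 30].copy()

# Assigning into the independent copy leaves `original` untouched.
subset["age"] = subset["age"].apply(lambda x: x + 1)

subset_ages = subset["age"].tolist()
original_ages = original["age"].tolist()
```

Only the rows in `subset` are incremented; the ages in `original` keep their initial values.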

If you know what you're doing, you can silence the warning using:

# Temporarily disable the chained-assignment check inside this block only.
with pd.option_context("mode.chained_assignment", None):
    wanted_data.age = wanted_data.age.apply(lambda x: x+1)

If you think that this is an important matter (e.g. there is a possibility of unintentionally mutating the initial DataFrame), you can set the above option to "raise", so that an error is raised instead of a warning.

Also, I think the term "inplace" is not entirely accurate here. "inplace" is an argument accepted by some methods that makes them mutate the object directly, without you having to assign the result back (the assignment is handled internally), and apply() does not support this feature.
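
The difference can be sketched as follows, using `rename()` as one example of a method that accepts `inplace=True` (the data and the new column name `years` are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"age": [29, 41, 35]})

# rename() accepts inplace=True: it mutates df directly and returns None.
result = df.rename(columns={"age": "years"}, inplace=True)

# apply() has no inplace parameter; it returns a new Series,
# which has to be assigned back to the column explicitly.
incremented = df["years"].apply(lambda x: x + 1)
df["years"] = incremented

years = df["years"].tolist()
```

Here `result` is `None` because the rename happened on `df` itself, while the result of `apply()` only takes effect once it is assigned back.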

jtlz2
Thanasis Mattas