
I'm working with a pandas DataFrame and I have a variable that holds the values of one of its columns. When I change values directly in the DataFrame, the values stored in the variable change too. Is this a bug, or is there some logic behind it?

The idea is to change the values in `df['b']` and keep the original values intact for other uses. Here's some sample code:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(size=(100, 2)), columns=['a', 'b'])

values = df['b'].values
peaks = [0, 5, 10, 15, 20, 25]

print(values[peaks])
df['b'][peaks] = np.nan
print(values[peaks])
```

And the output:

```
[0.69820632 0.8375975  0.84961463 0.97845189 0.82764414 0.93884249]
[nan nan nan nan nan nan]
```
MigasTigas
  • In fact the *values* variable is just a "pointer" or an alias of `df['b'].values`. When you change values within the DataFrame column, *values* points to the changed values. You have to create a copy (deepcopy) of the column and save it as *values*. – Lukas Jun 25 '20 at 18:08
  • Pandas now recommends using `to_numpy()` instead of `values`. `to_numpy` takes a `copy` parameter. Read its docs. – hpaulj Jun 25 '20 at 18:42
  • @Deepak, the copy vs deepcopy thing doesn't apply here. These are numpy arrays where the key distinction is `view` versus `copy` – hpaulj Jun 25 '20 at 22:07
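The view-versus-copy distinction hpaulj describes can be seen with plain NumPy. This is a minimal sketch, assuming that a DataFrame column behaves like a column slice of a 2-D array (a simplification of how pandas stores data internally); `data`, `col_view` and `col_copy` are hypothetical names used only for illustration:

```python
import numpy as np

# Hypothetical stand-in for the DataFrame's storage: a 2-D float array whose
# second column plays the role of df['b']
data = np.random.random(size=(100, 2))
peaks = [0, 5, 10, 15, 20, 25]

col_view = data[:, 1]          # basic slicing: a view onto data's memory
col_copy = data[:, 1].copy()   # .copy(): a new, independent buffer

data[peaks, 1] = np.nan        # modify the "column" in place

print(np.isnan(col_view[peaks]).all())   # True: the view sees the NaNs
print(np.isnan(col_copy[peaks]).any())   # False: the copy kept the old values
print(col_view.base is data)             # True: the view does not own its data
```

The view never held values of its own, which is why the question's `values` "changes": it was reading the column's memory all along.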

2 Answers


Use the `copy` method, so that `values` holds its own copy of the data instead of a view of the column:

```python
values = df['b'].values.copy()
```
Balaji Ambresh
  • It worked but now it's displaying a warning `SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame` – MigasTigas Jun 25 '20 at 18:27
  • Here's a [post](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) that describes the setting to mute that warning. Try `pd.options.mode.chained_assignment = None` – Balaji Ambresh Jun 25 '20 at 18:47
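Instead of muting the warning, the assignment itself can be written so that pandas has nothing to warn about; the chained assignment `df['b'][peaks] = np.nan` is the usual source of `SettingWithCopyWarning`. A minimal sketch, assuming pandas 0.24 or later (where `Series.to_numpy` and its `copy` parameter, mentioned by hpaulj, exist) and the default integer index from the question, so that the labels in `peaks` are also row positions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(size=(100, 2)), columns=['a', 'b'])
peaks = [0, 5, 10, 15, 20, 25]

# to_numpy(copy=True) always returns a new array that does not share memory with df
values = df['b'].to_numpy(copy=True)

# .loc sets the cells on df itself in a single step, so there is no
# chained assignment (unlike df['b'][peaks] = np.nan)
df.loc[peaks, 'b'] = np.nan

print(values[peaks])        # original random values, unchanged
print(df.loc[peaks, 'b'])   # NaN in the modified rows
```

Unlike setting `pd.options.mode.chained_assignment = None`, this keeps the warning enabled for the rest of the code, where it may still catch real mistakes.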

This is common behavior when dealing with arrays and lists, as these data types use referencing when they're assigned to a variable. So whenever the original array is modified, all the variables that point to that array will reflect the change as well.

To stop that from happening, store a copy of the array in the variable. (Slicing, e.g. `df['b'].values[::]`, is not enough: on a NumPy array it returns a view of the same data.)

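```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(size=(100, 2)), columns=['a', 'b'])

# .copy() gives values its own buffer; a slice such as df['b'].values[::]
# would still be a view of the column and would NOT protect it
values = df['b'].values.copy()
peaks = [0, 5, 10, 15, 20, 25]

print(values[peaks])
df.loc[peaks, 'b'] = np.nan  # single-step assignment instead of chained df['b'][peaks]
print(values[peaks])         # unchanged: still the original values
```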
luctivud
  • "these data types use referencing when they are assigned to a variable" **all** data types use the same exact reference semantics when assigned to a variable. Note, slicing with `numpy.ndarray` objects, which `.values` will be, produces *views* over the underlying data in the array object, so it wouldn't help. – juanpa.arrivillaga Jun 25 '20 at 18:13
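Both points in that comment can be checked directly. A minimal sketch (all names are hypothetical): plain assignment never copies anything, basic slicing of an ndarray produces a view, while list-based ("fancy") indexing, like `values[peaks]` in the question, and `.copy()` produce new data. Python lists are included for contrast, because slicing a list does create a new (shallow) list:

```python
import numpy as np

arr = np.arange(6.0)

alias = arr            # plain assignment: a second name for the same object
sliced = arr[::]       # basic slicing: a new array object, but a view of arr's data
fancy = arr[[0, 2]]    # indexing with a list ("fancy" indexing) returns a copy
copied = arr.copy()    # explicit, independent copy

arr[0] = -1.0          # mutate the original in place

print(alias is arr, alias[0])     # True -1.0
print(sliced is arr, sliced[0])   # False -1.0  (the view sees the change)
print(fancy[0], copied[0])        # 0.0 0.0     (the copies do not)

# Python lists follow the same assignment rule as every other type...
lst = [1, 2, 3]
lst_alias = lst
lst_slice = lst[:]     # ...but slicing a list *does* copy (shallowly), unlike an ndarray
lst.append(4)
print(lst_alias)       # [1, 2, 3, 4]
print(lst_slice)       # [1, 2, 3]
```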