
I have a DataFrame (called data in this post) with positive and negative values in a column. I do the following:

data.Col.min()  # returns a negative value
data_abs = data
data_abs['Col'] = data_abs['Col'].abs()
data.Col.min()  # now returns the smallest absolute value in the column

As I understand it, I stored the absolute values in a separate variable, so I'm wondering why the line that converts the values to absolute values also changes my source variable.

I get the same result when I convert the values like this:

data_abs['Col'] = abs(data_abs['Col'])
Derek O

1 Answer


The assignment data_abs = data does not create a second DataFrame. It only binds a second name to the same object, so a change made through either name is visible through the other.

As @Psidom pointed out, data_abs = data.copy() does not have this issue: creating an independent copy is exactly what .copy is intended to do, as outlined in the documentation.
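One quick way to check which situation you are in is Python's `is` operator, which tests whether two names refer to the same object. The sketch below is only an illustration; the names `alias` and `clone` are made up for this example and do not appear in the question.

```python
import pandas as pd

data = pd.DataFrame({'Col': [-1, -5, 6, 8], 'Col2': [1, 2, 2, 2]})

alias = data         # hypothetical name: a second label for the same object
clone = data.copy()  # hypothetical name: an independent DataFrame

alias_is_data = alias is data  # same object, so edits via alias show up in data
clone_is_data = clone is data  # different object, so edits stay in clone

print(alias_is_data)  # True
print(clone_is_data)  # False
```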

For example, given this DataFrame:

import pandas as pd
data = pd.DataFrame({'Col': [-1, -5, 6, 8], 'Col2': [1, 2, 2, 2]})

Then:

data_abs = data
data_abs['Col'] = data_abs['Col'].abs()

...will change what both names show, because they refer to the same DataFrame:

>>> data
   Col  Col2
0    1     1
1    5     2
2    6     2
3    8     2

>>> data_abs
   Col  Col2
0    1     1
1    5     2
2    6     2
3    8     2

But if we instead start again from the original data and use .copy:

data_abs = data.copy()
data_abs['Col'] = data_abs['Col'].abs()

The original DataFrame remains the same:

>>> data
   Col  Col2
0   -1     1
1   -5     2
2    6     2
3    8     2
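If the goal is only to get a version of the frame with absolute values, another option that avoids modifying anything in place is `DataFrame.assign`, which returns a new DataFrame. This is a minimal sketch of that alternative, not something the question itself used:

```python
import pandas as pd

data = pd.DataFrame({'Col': [-1, -5, 6, 8], 'Col2': [1, 2, 2, 2]})

# assign() returns a new DataFrame; `data` itself is not modified
data_abs = data.assign(Col=data['Col'].abs())

print(data['Col'].min())      # -5
print(data_abs['Col'].min())  # 1
```

Here the original column is never assigned to, so there is no alias to worry about.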
Derek O