
I'm trying to set column D of b to 1. However, when I run this code, column D of a also changes to 1. Why is that? Why are the variables linked? And how do I change only b?

import pandas as pd, os, numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
a=df
b=df
b['D']=1

output:

>>> a
    A   B   C  D
0  98  84   3  1
1  13  35  76  1
2  17  84  28  1
3  22   9  41  1
4  54   3  20  1
>>> b
    A   B   C  D
0  98  84   3  1
1  13  35  76  1
2  17  84  28  1
3  22   9  41  1
4  54   3  20  1
jason

3 Answers


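a, b and df are all references to the same object; assigning with = binds another name to it and does not copy anything. When you set b['D'], you are modifying that one underlying DataFrame, so the change is visible through a and df as well. Instead, it looks like you want to copy the DataFrame: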

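import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
a = df.copy()  # independent copy of df
b = df.copy()  # a second independent copy
b['D'] = 1     # changes only b; a and df keep their original values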

which yields

b.head()
Out: 
    A   B   C  D
0  63  52  92  1
1  98  35  43  1
2  24  87  70  1
3  38   4   7  1
4  71  30  25  1

a.head()
Out: 
    A   B   C   D
0  63  52  92  80
1  98  35  43  78
2  24  87  70  26
3  38   4   7  48
4  71  30  25  61
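
A quick way to check the difference yourself is Python's `is` operator, which tests whether two names refer to the same object. The following is a minimal sketch, assuming a small hand-made frame in place of the random data above so that the values are predictable; the names `alias` and `copied` are illustrative only:

```python
import pandas as pd

# Small fixed frame (an assumption for illustration, not the question's data).
df = pd.DataFrame({'A': [1, 2, 3], 'D': [10, 20, 30]})

alias = df          # plain assignment: a second name for the same object
copied = df.copy()  # a new DataFrame with its own data

same_object = alias is df       # True: both names point at one DataFrame
is_separate = copied is not df  # True: the copy is a different object

copied['D'] = 1  # modifies only the copy

original_d = df['D'].tolist()    # original column is untouched
copied_d = copied['D'].tolist()  # copy now holds 1 in every row
print(same_object, is_separate, original_d, copied_d)
```

If `alias['D'] = 1` were used instead, `df` would change too, which is the behaviour described in the question.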

There are also detailed responses here.

ayhan

Don't use = when trying to copy a DataFrame; it only gives the same object another name.

Use pd.DataFrame.copy(yourdataframe) instead, which is equivalent to yourdataframe.copy():

a = pd.DataFrame.copy(df)
b = pd.DataFrame.copy(df)
b['D'] = 1

This should solve your problem.
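
For reference, calling `copy` through the class and calling it on the instance are the same method, so either spelling works. A small sketch, assuming a tiny example frame (the data and the variable names `via_class` and `via_method` are illustrative):

```python
import pandas as pd

# Tiny example frame, assumed for illustration.
df = pd.DataFrame({'A': [1, 2], 'D': [5, 6]})

via_class = pd.DataFrame.copy(df)  # method looked up on the class
via_method = df.copy()             # the more common spelling

# Both calls produce frames with identical contents.
frames_equal = via_class.equals(via_method)

via_class['D'] = 1            # change the copy only
unchanged = df['D'].tolist()  # the original still has its old values
print(frames_equal, unchanged)
```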

Phurich.P

You should use copy. Change

a=df
b=df

to

a=df.copy()
b=df.copy()

Check out this reference where this issue is discussed a bit more in depth. I also had this confusion when I started using Pandas.
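
If the goal is only a version of the frame with one column changed, another option is `DataFrame.assign`, which returns a new DataFrame and leaves the original alone. This is a sketch of an alternative rather than something proposed in the answers above, and the example data is assumed:

```python
import pandas as pd

# Example data, assumed for illustration.
df = pd.DataFrame({'A': [1, 2, 3], 'D': [7, 8, 9]})

# assign returns a new DataFrame with column D replaced; df is not modified.
b = df.assign(D=1)

print(df['D'].tolist())
print(b['D'].tolist())
```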

splinter