1

I have this dataframe:

FamilyVsWWTotal = pd.DataFrame({'date':['2011-4-3','2011-3-5','2011-4-3','2011-5-7'], 'Country':['USA','CAN','USA','MEX']})

So it will have following output:

date          Country
2011-4-3      USA
2011-3-5      CAN
2011-4-3      USA
2011-5-7      MEX

And then below code is 2 created table 1:

 Table1=FamilyVsWWTotal
 Table1['Region']='K'

Hence the Table1 will have following output:

date          Country   Region
2011-4-3      USA       K
2011-3-5      CAN       K
2011-4-3      USA       K
2011-5-7      MEX       K

For the second table which is Table2:

Table2=FamilyVsWWTotal
Table2['aveAge']=60

Hence Table2 output should be like this:

date          Country   aveAge
2011-4-3      USA       60
2011-3-5      CAN       60
2011-4-3      USA       60
2011-5-7      MEX       60

But I always face this problem which is the Table2output will be look like this which already mix up the code from the above:

Table2

date          Country   Region   aveAge
2011-4-3      USA       K        60
2011-3-5      CAN       K        60
2011-4-3      USA       K        60
2011-5-7      MEX       K        60

Anyone can give some ideas how to solve this issue?

4 Answers4

4

You need copy for new DataFrame object, because else working with same (without copy is not copied by value, but creating a new reference, so id is same):

Table1=FamilyVsWWTotal.copy()

print (id(Table1))
243907608
print (id(FamilyVsWWTotal))
193392088
#crete new object (unlink to original df)
Table1 = FamilyVsWWTotal.copy()
Table2=FamilyVsWWTotal.copy()
Table1['Region'] = 'K'
Table2['aveAge']=60
print (FamilyVsWWTotal)
  Country      date
0     USA  2011-4-3
1     CAN  2011-3-5
2     USA  2011-4-3
3     MEX  2011-5-7

print (Table1)
  Country      date Region
0     USA  2011-4-3      K
1     CAN  2011-3-5      K
2     USA  2011-4-3      K
3     MEX  2011-5-7      K

print (Table2)
  Country      date  aveAge
0     USA  2011-4-3      60
1     CAN  2011-3-5      60
2     USA  2011-4-3      60
3     MEX  2011-5-7      60

#create refernce only
Table1=FamilyVsWWTotal

print (id(Table1))
244095352
print (id(FamilyVsWWTotal))
244095352

Still modify original DataFrame FamilyVsWWTotal:

Table1 = FamilyVsWWTotal
Table2=FamilyVsWWTotal
Table1['Region'] = 'K'
Table2['aveAge']=60
print (FamilyVsWWTotal)
  Country      date Region  aveAge
0     USA  2011-4-3      K      60
1     CAN  2011-3-5      K      60
2     USA  2011-4-3      K      60
3     MEX  2011-5-7      K      60

print (Table1)
  Country      date Region  aveAge
0     USA  2011-4-3      K      60
1     CAN  2011-3-5      K      60
2     USA  2011-4-3      K      60
3     MEX  2011-5-7      K      60

print (Table2)
  Country      date Region  aveAge
0     USA  2011-4-3      K      60
1     CAN  2011-3-5      K      60
2     USA  2011-4-3      K      60
3     MEX  2011-5-7      K      60
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
3

Use .assign returns a new dataframe.

In [137]: Table1 = FamilyVsWWTotal.assign(Region='K')

In [138]: Table1
Out[138]:
  Country      date Region
0     USA  2011-4-3      K
1     CAN  2011-3-5      K
2     USA  2011-4-3      K
3     MEX  2011-5-7      K

In [139]: Table2 = FamilyVsWWTotal.assign(aveAge=60)

In [140]: Table2
Out[140]:
  Country      date  aveAge
0     USA  2011-4-3      60
1     CAN  2011-3-5      60
2     USA  2011-4-3      60
3     MEX  2011-5-7      60

In [141]: FamilyVsWWTotal
Out[141]:
  Country      date
0     USA  2011-4-3
1     CAN  2011-3-5
2     USA  2011-4-3
3     MEX  2011-5-7

Or, work on copy

In [142]: Table1 = FamilyVsWWTotal.copy()

In [143]: Table1['Region'] = 'K'

In [144]: Table1
Out[144]:
  Country      date Region
0     USA  2011-4-3      K
1     CAN  2011-3-5      K
2     USA  2011-4-3      K
3     MEX  2011-5-7      K

In [145]: FamilyVsWWTotal
Out[145]:
  Country      date
0     USA  2011-4-3
1     CAN  2011-3-5
2     USA  2011-4-3
3     MEX  2011-5-7
Zero
  • 74,117
  • 18
  • 147
  • 154
2

Performing Table1=FamilyVsWWTotal results in shallow copy. The two variables link to the same DataFrame object in the memory.

You can check this by:

print(id(Table1))
print(id(FamilyVsWWTotal))


If you want two separate DataFrames, where you can do separate calculations, you must perform a deep copy, like this:
Table1 = FamilyVsWWTotal.copy(deep=True)

As deep=True is default with copy(), you can just do

Table1 = FamilyVsWWTotal.copy()

Similarly,

Table2 = FamilyVsWWTotal.copy(deep=True)


Thus, your code should look like this:
FamilyVsWWTotal = pd.DataFrame({'date':['2011-4-3','2011-3-5','2011-4-3','2011-5-7'], 'Country':['USA','CAN','USA','MEX']})
Table1=FamilyVsWWTotal.copy()
Table1['Region']='K'
Table2=FamilyVsWWTotal.copy()
Table2['aveAge']=60
nikpod
  • 1,238
  • 14
  • 22
0

While others already answered your question, I think knowing the difference between a deep copy and a shallow help would help next time:

What is the difference between a deep copy and a shallow copy?

Jape
  • 11
  • 5