This works as expected:

a, b = (0,) * 2
print('Before')
print(a, b)

a = 1
print('\nAfter')
print(a,b)

# Before
# 0 0
#
# After
# 1 0

The same pattern behaves differently with pandas: sb appears to be merely another reference to the same Series as sa:

import numpy as np
import pandas as pd

sa, sb = (pd.Series(np.zeros(2)),) * 2

print('Before')
for s in (sa,sb):
    print(s)

sa[0] = 1
print('\nAfter')
for s in (sa,sb):
    print(s)

# Before
# 0    0.0
# 1    0.0
# dtype: float64
# 0    0.0
# 1    0.0
# dtype: float64

# After
# 0    1.0
# 1    0.0
# dtype: float64
# 0    1.0          <- Note: sb[0] has also changed
# 1    0.0
# dtype: float64

Is this the expected behavior, and is it documented? It seems to violate the principle of least astonishment.

What's the most convenient workaround? Obviously this works:

sa = pd.Series(np.zeros(2))
sb = sa.copy() # note deep=True by default

But it's a bit verbose since I need to generate several series.

This does not work either; sa and sb still end up sharing a single series:

sa, sb = (pd.Series(np.zeros(2)).copy(), ) * 2
C8H10N4O2
  • 3
    Afaik, all uses of `*` for "sequence multiplication" produce shallow copies. The difference between your examples is that ints are immutable and you're doing reassignment in the first example, and `Series` are mutable and you're carrying out mutations on them in the second example. – Carcigenicate Aug 12 '20 at 15:54
  • 2
    Looks like the same concept as in [this question](https://stackoverflow.com/questions/240178/list-of-lists-changes-reflected-across-sublists-unexpectedly) – ForceBru Aug 12 '20 at 15:54
  • 1
    `sa[0] = 1` is *quite* different than `a = 1`. The former is a (mutating) method call in disguise, while the latter is a true assignment. – chepner Aug 12 '20 at 15:56
  • 1
    Use list comprehension `[pd.Series(np.zeros(2)) for _ in range(2)]`. – Henry Yik Aug 12 '20 at 16:05
  • @HenryYik that works, thank you. – C8H10N4O2 Aug 12 '20 at 16:24
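
A minimal sketch of why the last attempt in the question fails, assuming the cause is simply evaluation order: the expression inside the tuple runs once, and `* 2` then repeats the resulting reference. `make_record` is a hypothetical stand-in for `pd.Series(np.zeros(2)).copy()`, used so the example needs no third-party packages.

```python
calls = 0

def make_record():
    """Hypothetical factory standing in for pd.Series(np.zeros(2)).copy()."""
    global calls
    calls += 1
    return {"values": [0.0, 0.0]}

# The element expression is evaluated once, when the one-element tuple is built.
# `* 2` then repeats the reference to that single object.
ra, rb = (make_record(),) * 2

print(calls)     # 1
print(ra is rb)  # True
```

Because the factory is called only once, adding `.copy()` inside the tuple copies once and then shares that one copy.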

1 Answer

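As commented by Carcigenicate above, the * operator does not copy the Series; it repeats a reference to the same object, so sa and sb are two names for one Series. Because a Series is mutable, an in-place write through one name is visible through the other. The same can be seen with a dict: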

da, db = ({0:0,1:0}, )*2
print('Before') 
for d in (da, db):
  print(d) 

da[0]=1
print('\nAfter') 
for d in (da, db):
  print(d) 

# Before
# {0: 0, 1: 0}
# {0: 0, 1: 0}

# After
# {0: 1, 1: 0}
# {0: 1, 1: 0} <- note: db has also changed
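
The sharing shown above can also be checked directly with the `is` operator, without comparing printed values. This is a small sketch using plain dicts; `template`, `dc`, and `dd` are names introduced here for illustration only.

```python
template = {0: 0, 1: 0}

# Tuple repetition: both names refer to the one dict.
da, db = (template,) * 2
print(da is db)  # True

# Explicit copies: two distinct dicts with equal contents.
dc, dd = (dict(template) for _ in range(2))
print(dc is dd)  # False
print(dc == dd)  # True

# Mutating one copy leaves the other untouched.
dc[0] = 1
print(dd)  # {0: 0, 1: 0}
```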

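Ints are immutable, but that is not what separates the two cases: a = 1 does not modify the object 0, it rebinds the name a to a different object, while b still refers to the original. This can be seen from the object ids (memory addresses in CPython):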

a,b   = (0, ) * 2
print('Before') 
print(hex(id(a)),hex(id(b))) 

a = 1
print('\nAfter')
print(hex(id(a)),hex(id(b))) 

# Before
# 0x10ab9fef0 0x10ab9fef0

# After
# 0x10ab9ff10 0x10ab9fef0 <- first address changed

Compare with:

da, db = ({0:0,1:0}, )*2
print('Before') 
for d in (da, db):
  print(hex(id(d))) 

da[0]=1
print('\nAfter') 
for d in (da, db):
  print(hex(id(d))) 

# Before
# 0x137b9c320
# 0x137b9c320

# After
# 0x137b9c320 <- first address unchanged
# 0x137b9c320  

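As Henry Yik comments, a list comprehension solves this. It does not make copies; it evaluates the constructor once per iteration, so each name is bound to its own, independent Series (unlike the * operator, which repeats a single reference).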

da, db = [pd.Series(np.zeros(2)) for _ in range(2)]
print('Before') 
for d in (da, db):
  print(hex(id(d))) 

# Before
# 0x137b9fad0 <- now it's different to begin with
# 0x137b93fd0
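
Since the question mentions needing several series, the comprehension could be wrapped in a small helper. The sketch below is only one possible shape: `independent` is a hypothetical name, and plain lists are used in place of Series to keep the example dependency-free.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def independent(factory: Callable[[], T], n: int) -> List[T]:
    """Call `factory` n times so that each result is a separate object."""
    return [factory() for _ in range(n)]

# Plain lists stand in for Series here (an assumption for illustration).
xa, xb, xc = independent(lambda: [0.0, 0.0], 3)
xa[0] = 1.0

print(xa)  # [1.0, 0.0]
print(xb)  # [0.0, 0.0]
print(len({id(x) for x in (xa, xb, xc)}))  # 3
```

With pandas, the same call would look like `independent(lambda: pd.Series(np.zeros(2)), 3)`.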
C8H10N4O2
  • 1
    Although, in retrospect, my mention of immutable/mutable was a red herring. You'd get the same behavior as you had with `int`s with any mutable object if you did reassignment. It's more of a difference between "mutating a reference", and mutating an object. With immutable `int`s though, you just don't have the option of mutating them. – Carcigenicate Aug 12 '20 at 16:32
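
The point in this last comment can be illustrated with a mutable object. The sketch below uses a list, where both operations are available; the names `la` and `lb` are introduced here only for the example.

```python
la, lb = ([0, 0],) * 2

# In-place mutation: the single shared list changes, so both names see it.
la[0] = 1
print(la, lb)    # [1, 0] [1, 0]
print(la is lb)  # True

# Rebinding: `la` now names a new list; `lb` still names the old one.
la = [9, 9]
print(la, lb)    # [9, 9] [1, 0]
print(la is lb)  # False
```

Rebinding behaves the same for ints, dicts, and Series; only in-place mutation is unavailable for ints.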