18

Why is it not possible to pass attributes of an instance through a copy? I want to pass the name attribute to another dataframe.

import copy
df = pd.DataFrame([1,2,3])
df.name = 'sheet1'
df2 = copy.deepcopy(df)

print(f'df.name: {df.name}')
>> df.name: sheet1

print(f'df2.name: {df2.name}')
>>    AttributeError    
        ...      
      'DataFrame' object has no attribute 'name'

Similarly, why does this also not work, when creating a class and inheriting from it?

class DfWithName(pd.DataFrame):

    def __init__(self, *args, **kwargs):
        self.__init__ = super().__init__(*args, **kwargs)
        print('lol')

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._name = value

and using the same code:

import copy
df = DfWithName([1,2,3])
df.name = 'sheet1'
df2 = copy.deepcopy(df) 
print(f'df.name: {df2.name}')
>>    AttributeError    
        ...      
      'DataFrame' object has no attribute 'name'
A H
  • 2,164
  • 1
  • 21
  • 36
  • Perhaps I found a wrong dupe... The point is your new attribute only belongs to this specific instance, it's not recorded in the class level. Thus `copy` couldn't find it. https://docs.python.org/3.6/library/copy.html – llllllllll May 16 '18 at 13:58
  • 7
    @liliscent that's not exactly accurate. For most instances `copy.deepcopy()` will copy its `__dict__` members, so if you assign some attribute to an instance, and then copy that instance, its attributes *will* be copied. More likely, `DataFrame`, being a rather non-trivial class, probably has some custom serialization/deep-copy implementation that doesn't know about arbitrary attributes attached to it. – Iguananaut May 16 '18 at 14:01
  • 4
    Indeed, `NDFrame` (which is the base class of `DataFrame`) has this custom `__deepcopy__`: https://github.com/pandas-dev/pandas/blob/master/pandas/core/generic.py#L5005 – Iguananaut May 16 '18 at 14:04
  • 3
    Yes, that's more accurate. You can override `__deepcopy__`, it's not surprising that it only records class related information. – llllllllll May 16 '18 at 14:07
  • Using `df.copy(deep=True)` doesn't alleviate the issue either. I now understand that the metadata copy is not implemented, but I would want to know what python mechanics are at play that is causing this. – A H May 16 '18 at 14:07

4 Answers4

13

As noted elsewhere, the DataFrame class has a custom __deepcopy__ method which does not necessarily copy arbitrary attributes assigned to an instance, as with a normal object.

Interestingly, there is an internal _metadata attribute that seems intended to be able to list additional attributes of an NDFrame that should be kept when copying/serializing it. This is discussed some here: https://github.com/pandas-dev/pandas/issues/9317

Unfortunately this is still considered an undocumented internal detail, so it probably shouldn't be used. From looking at the code you can in principle do:

mydf = pd.DataFrame(...)
mydf.name = 'foo'
mydf._metadata += ['name']

and when you copy it it should take the name with it.

You could subclass DataFrame to make this the default:

import functools

class NamedDataFrame(pd.DataFrame):
    _metadata = pd.DataFrame._metadata + ['name']

    def __init__(self, name, *args, **kwargs):
        self.name = name
        super().__init__(*args, **kwargs)

    @property
    def _constructor(self):
        return functools.partial(self.__class__, self.name)

You could also do this without relying on this internal _metadata attribute if you provide your own wrapper to the existing copy method, and possibly also __getstate__ and __setstate__.

Update: It seems actually use of the _metadata attribute for extending Pandas classes is now documented. So the above example should more or less work. These docs are more for development of Pandas itself so it might still be a bit volatile. But this is how Pandas itself extends subclasses of NDFrame.

Iguananaut
  • 21,810
  • 5
  • 50
  • 63
  • Yep, using the `_metadata` attribute hack works perfectly for my usecase. – A H May 16 '18 at 14:21
  • 1
    Please note my update to the example code. I haven't tested it but I think you'll probably need something like the above `_constructor`. – Iguananaut May 16 '18 at 14:25
4

The copy.deepcopy will use a custom __deepcopy__ method if it is found in the MRO, which may return whatever it likes (including completely bogus results). Indeed dataframes implement a __deepcopy__ method:

def __deepcopy__(self, memo=None):
    if memo is None:
        memo = {}
    return self.copy(deep=True)

It delegates to self.copy, where you will find this note in the docstring:

Notes
-----
When ``deep=True``, data is copied but actual Python objects
will not be copied recursively, only the reference to the object.
This is in contrast to `copy.deepcopy` in the Standard Library,
which recursively copies object data (see examples below).

And you will find in the v0.13 release notes (merged in PR 4039):

__deepcopy__ now returns a shallow copy (currently: a view) of the data - allowing metadata changes.

Related issue: 17406.

wim
  • 338,267
  • 99
  • 616
  • 750
1

Attaching custom metadata to DataFrames seems to be unsupported for pandas. See this answer (possible duplicate?) and this github issue.

apteryx
  • 1,105
  • 7
  • 14
  • It seems you haven't linked an answer in the first link, it's the same github issue you link afterwards. If you can, edit your answer to include the related, and possibly duplicate, answer. – Dimitris Fasarakis Hilliard May 16 '18 at 17:54
  • @JimFasarakisHilliard thanks, updated. although i'm pretty sure this isn't a duplicate, OP's question is more specific – apteryx May 16 '18 at 18:49
-2

This code is worked:

>>> class test():
...     @property
...     def name(self):
...         return self._name
...     @name.setter
...     def name(self, value):
...         self._name = value
...
>>>
>>> a = test()
>>> a.name = 'Test123'
>>> import copy
>>> a2 = copy.deepcopy(a)
>>> print(a2.name)
Test123

so I think that behavior is defined by pd.DataFrame

I found that pandas define the function __deepcopy__, but I cannot totally understand the reason.

pandas/core/indexes/base.py#L960

Brett7533
  • 342
  • 1
  • 3
  • 12