How to implement the composition pattern? I have a class Container which has an attribute object Contained. I would like to redirect/allow access to all methods of Contained class from Container by simply calling my_container.some_contained_method(). Am I doing the right thing in the right way?

I use something like:

class Container:
   def __init__(self):
       self.contained = Contained()
   def __getattr__(self, item):
       if item in self.__dict__: # some overridden
           return self.__dict__[item] 
       else:
           return self.contained.__getattr__(item) # redirection

Background:

I am trying to build a class (Indicator) that adds to the functionality of an existing class (pandas.DataFrame). Indicator will have all the methods of DataFrame. I could use inheritance, but I am following the "favor composition over inheritance" advice (see, e.g., the answers in: python: inheriting or composition). One reason not to inherit is that the base class is not serializable and I need to serialize.

I have found this, but I am not sure if it fits my needs.

Yariv
  • And how is a proxy object going to help your serialisation - you'll still have to do that somehow... Just inherit from the base (because your object "is-a") and work from there... – Jon Clements Nov 19 '12 at 19:45
  • A `pandas.DataFrame` has a lot of methods which return another `DataFrame`. It may be hard to arrange for your `Container` to return another `Container`... – unutbu Nov 19 '12 at 19:47
  • @Jon the base class is not serializable, but picklable. It is easier to extend the pickling of a component than that of a super class. – Yariv Nov 19 '12 at 19:57
  • @unutbu, good point. I guess I could wrap every returned `DataFrame`. – Yariv Nov 19 '12 at 20:00

2 Answers

Caveats:

  • DataFrames have a lot of attributes. If a DataFrame attribute is a number, you probably just want to return that number. But if the DataFrame attribute is itself a DataFrame, you probably want to return a Container. What should we do if the DataFrame attribute is a Series or a descriptor? To implement Container.__getattr__ properly, you really have to write unit tests for each and every attribute.
  • Unit testing is also needed for __getitem__.
  • You'll also have to define and unit test __setattr__ and __setitem__, __iter__, __len__, etc.
  • Pickling is a form of serialization, so if DataFrames are picklable, I'm not sure how Containers really help with serialization.
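
The point about special methods can be checked without pandas. The following is a minimal sketch, assuming two hypothetical classes (Wrapper and SizedWrapper, neither from the question) wrapped around a plain list. It illustrates that implicit special-method lookups such as len(w) consult the class rather than the instance, so __getattr__ is never asked:

```python
class Wrapper:
    """Hypothetical wrapper that delegates only through __getattr__."""

    def __init__(self, contained):
        self.contained = contained

    def __getattr__(self, item):
        return getattr(self.contained, item)


class SizedWrapper(Wrapper):
    """Same wrapper, with the special methods defined explicitly on the class."""

    def __len__(self):
        return len(self.contained)

    def __iter__(self):
        return iter(self.contained)


w = Wrapper([1, 2, 3])
explicit_len = w.__len__()  # explicit attribute access goes through __getattr__

try:
    len(w)  # implicit lookup checks type(w) only, so delegation is bypassed
    implicit_works = True
except TypeError:
    implicit_works = False

s = SizedWrapper([1, 2, 3])
sized_len = len(s)
sized_items = list(s)
```

Here w.__len__() succeeds while len(w) raises TypeError, which is why each special method the wrapper should support has to be written out on the class itself.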

Some comments:

  • __getattr__ is only called when normal attribute lookup fails, that is, when the attribute is found neither in self.__dict__ nor on the class or its bases. So you do not need the if item in self.__dict__ check in your __getattr__.

  • self.contained.__getattr__(item) calls self.contained's __getattr__ method directly. That is usually not what you want, because it circumvents Python's normal attribute lookup mechanism. For example, it ignores the possibility that the attribute is in self.contained.__dict__, is in the __dict__ of one of the bases of self.contained.__class__, or is a descriptor. Instead use getattr(self.contained, item).


import pandas
import numpy as np

def tocontainer(func):
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return Container(result)
    return wrapper

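class Container(object):
    def __init__(self, df):
        self.contained = df

    def __getitem__(self, item):
        result = self.contained[item]
        # Re-wrap only when indexing yields the same type as the wrapped object.
        if isinstance(result, type(self.contained)):
            result = Container(result)
        return result

    def __getattr__(self, item):
        # Only reached when normal lookup on the Container itself fails.
        result = getattr(self.contained, item)
        if callable(result):
            # Every callable's return value gets wrapped, whatever its type.
            result = tocontainer(result)
        return result

    def __repr__(self):
        return repr(self.contained)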

Here is some random code to test if -- at least superficially -- Container delegates to DataFrames properly and returns Containers:

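df = pandas.DataFrame(
    [(1, 2), (1, 3), (1, 4), (2, 1), (2, 2)], columns=['col1', 'col2'])
df = Container(df)
# Chained assignment on the Series returned by df['col1'];
# the output below shows row 3 of col1 unchanged.
df['col1'][3] = 0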
print(df)
#    col1  col2
# 0     1     2
# 1     1     3
# 2     1     4
# 3     2     1
# 4     2     2
gp = df.groupby('col1').aggregate(np.count_nonzero)
print(gp)
#       col2
# col1      
# 1        3
# 2        2
print(type(gp))
# <class '__main__.Container'>

print(type(gp[gp.col2 > 2]))
# <class '__main__.Container'>

tf = gp[gp.col2 > 2].reset_index()
print(type(tf))
# <class '__main__.Container'>

result = df[df.col1 == tf.col1]
print(type(result))
# <class '__main__.Container'>
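
Since the stated goal is serialization, one further pitfall may be worth testing. copy.copy creates an instance without calling __init__ and then probes it for __setstate__, and pickle.loads does a similar lookup. With an unguarded __getattr__, that probe reads self.contained before it exists and recurses. The sketch below uses hypothetical classes (Plain and Guarded) around a plain list, and uses copy rather than pickle so that it does not depend on how the classes are imported. The guard shown is one possible choice, not the only one:

```python
import copy


class Plain:
    """Hypothetical wrapper with an unguarded __getattr__, as in Container above."""

    def __init__(self, contained):
        self.contained = contained

    def __getattr__(self, item):
        return getattr(self.contained, item)


class Guarded(Plain):
    """Same wrapper, but refuses lookups that must not be delegated."""

    def __getattr__(self, item):
        # Do not delegate double-underscore names, and do not touch
        # self.contained before it has been set on the instance.
        if item.startswith("__") or "contained" not in self.__dict__:
            raise AttributeError(item)
        return getattr(self.contained, item)


try:
    copy.copy(Plain([1, 2, 3]))
    plain_copy_ok = True
except RecursionError:
    plain_copy_ok = False

clone = copy.copy(Guarded([1, 2, 3]))
```

With the guard in place the copy is reconstructed from the instance __dict__, and ordinary delegation (for example clone.count(2)) still works afterwards.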
unutbu
  • Could you also explain why this is preferable (or not) over inheritance? – Yariv Nov 19 '12 at 21:32
  • This is very helpful. I couldn't find a good discussion of this elsewhere. – Yariv Nov 19 '12 at 21:41
  • @user1579844: If class B *extends* class A by adding new methods and does not override class A's methods, then use inheritance. Use inheritance if `B` can be substituted wherever `A` is used. Use composition for dependency injection (to allow class `B` to have a class `A2` instead of a class `A`). For this and other ideas, see [this SO answer](http://stackoverflow.com/a/53354/190597). And sometimes neither composition nor inheritance is right -- sometimes a simple function is best. (I rather wonder if that is the case here.) – unutbu Nov 19 '12 at 23:38
  • @unutbu, if I go for the simple function solution, how do I customize myindicator.plot(), for example? – Yariv Nov 20 '12 at 08:06
  • @user1579844: You would define `def myplot(dataframe): ...` as a module-level function. – unutbu Nov 20 '12 at 10:13
  • This also doesn't extend to double underscore methods like `__add__` and so on – mloning Sep 09 '20 at 11:10
1

I found unutbu's answer very useful for my own application, but I ran into issues displaying the Container properly in a Jupyter notebook. I found that adding the following methods to the class solved the issue.

def _repr_html_(self):
    return self.contained._repr_html_()

def _repr_latex_(self):
    return self.contained._repr_latex_()
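
A likely reason the explicit methods are needed: the Container from the accepted answer wraps the return value of every delegated callable, so a delegated _repr_html_ would hand back a Container instead of a string. The sketch below reproduces this with a hypothetical FakeFrame stand-in (not a real DataFrame) and a hypothetical NotebookContainer subclass. The HTML string is made up for illustration:

```python
def tocontainer(func):
    def wrapper(*args, **kwargs):
        return Container(func(*args, **kwargs))
    return wrapper


class Container:
    def __init__(self, contained):
        self.contained = contained

    def __getattr__(self, item):
        result = getattr(self.contained, item)
        if callable(result):
            result = tocontainer(result)
        return result


class FakeFrame:
    """Hypothetical stand-in for a DataFrame with a rich HTML repr."""

    def _repr_html_(self):
        return "<table><tr><td>1</td></tr></table>"


class NotebookContainer(Container):
    # Defined on the class, so it is found before __getattr__ is consulted
    # and the string is returned unwrapped.
    def _repr_html_(self):
        return self.contained._repr_html_()


delegated = Container(FakeFrame())._repr_html_()
explicit = NotebookContainer(FakeFrame())._repr_html_()
```

Here delegated is a Container wrapping the HTML string, while explicit is the plain string.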
Waylon Walker