0

The add and mul definitions here are nonsensical because of their dependence on returning self, causing infinite loops. If they create a new distribution using the lambdas then it works fine, as in my own answer below.

I'm just playing around with classes and overriding trying to build a small statistics tool. However, when I run this code I get stuck in a recursion loop inside the __mul__ call which is being run in the n1.pdf call and I cannot figure out why. I think it has something to do with Python lazily executing the __mul__ instead of doing what I kind of 'wanted' (let's say in the language of CS) which was to create a new pointer to the old function call for pdf that is owned by the new pointer to pdf, and then to set the old pointer (the main .pdf pointer) to the new function.

I think this is quite poorly worded so edits extremely welcome if you understand what I'm asking.

import math
import random

class Distribution:
    def __init__(self, pdf, cdf):
        self.pdf = pdf
        self.cdf = cdf

    def pdf(self, x):
        return self.pdf(x)
        
    def cdf(self, x):
        return self.cdf(x)

    def __mul__(self, other):
        if isinstance(other, float) or isinstance(other, int):
            newpdf = lambda x : self.pdf(x) * other
            self.pdf = newpdf
            newcdf = lambda x : self.cdf(x) * other
            self.cdf = newcdf
            return self
        else:
            return NotImplemented

    def __add__(self, other):
        self.pdf = lambda x : self.pdf(x) + other.pdf(x)
        self.cdf = lambda x : self.cdf(x) + other.cdf(x)
        return Distribution(self.pdf, self.cdf)
    
class Normal(Distribution):
    def __init__(self, mean, stdev):
        self.mean = mean
        self.stdev = stdev

    def pdf(self, x):
        return (1.0 / math.sqrt(2 * math.pi * self.stdev ** 2)) * math.exp(-0.5 * (x - self.mean) ** 2 / self.stdev ** 2)

    def cdf(self, x):
        return (1 + math.erf((x - self.mean) / math.sqrt(2) / self.stdev)) / 2

    def sample(self):
        return self.mean + self.stdev * math.sqrt(2) * math.cos(2 * math.pi * random.random())

if __name__ == "__main__":
    n1 = Normal(1,2)
    n1half = n1 * 0.5
    x = n1.pdf(1)
    print(x)

p.s. I know that it is no longer a pdf after being multiplied by 0.5, this is not an issue.

Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
Milan
  • 344
  • 1
  • 10
  • Side note: function `__mul__` doesn't return a value in every flow-path. –  Dec 27 '21 at 01:06
  • Side note 2: by returning `self`, function `__mul__` doesn't behave like a binary function (in contrast with function `__add__`, for example). –  Dec 27 '21 at 01:09
  • To my understanding, neither one of these functions should change `self`, but rather just return a new object. –  Dec 27 '21 at 01:12
  • 1
    @TomKarzes the goal is that multiplying two `Distributions` creates a `Distribution` that represents the product, by specifying the rule needed to evaluate the PDF and CDF at a given point. The concept being modelled is a [probability distribution](https://en.wikipedia.org/wiki/Probability_distribution), and it makes perfect sense - just the code doesn't correctly express the idea. – Karl Knechtel Dec 27 '21 at 01:36

3 Answers3

2
class Distribution:
    ...
    def pdf(self, x):
        return self.pdf(x)

pdf() calls itself, which calls itself, which calls itself... infinitely.

Same with cdf().

John Gordon
  • 29,573
  • 7
  • 33
  • 58
  • 1
    The assignments to `self.pdf` and `self.cdf` have the same problem. For example, `self.pdf = lambda x : self.pdf(x) + other.pdf(x)` assigns to `self.pdf` a function which infinitely recurses. – Tom Karzes Dec 27 '21 at 01:11
  • Very true, this was a stupid definition whoops. PDF and CDF in the top-level Distribution themselves should be functions being passed to create the Distribution if done this way. I tried first with abstract but there was a circular def'n I needed. – Milan Dec 27 '21 at 01:12
0

Thanks for the help @John and @Tom and @bbbbbb... The problem was trying to return self instead of creating a new distribution. If I change the def'n of mul to

def __mul__(self, other):
        if isinstance(other, float) or isinstance(other, int):
            def newpdf(x):
                return self.pdf(x) * other
            def newcdf(x):
                return self.cdf(x) * other
            return Distribution(newpdf, newcdf)
        else:
            return NotImplemented

Then this solves this problem

Milan
  • 344
  • 1
  • 10
0
def pdf(self, x):
    return self.pdf(x)
    
def cdf(self, x):
    return self.cdf(x)

I assume your intent is to delegate to the attributes. Since they are always assigned, they will be found (assuming you do the lookup on an instance) instead of the class methods (which would straightforwardly be infinite recursion without those attributes); but this in turn means that these class methods are just useless. x.cdf(y), where cdf is a callable instance attribute, just works; there is no need to provide a method as well.

newpdf = lambda x : self.pdf(x) * other
self.pdf = newpdf

I assume your intent is to create a new function that relies upon the existing value of self.pdf. Unfortunately, it doesn't work that way. The problem is that the lambda is late binding. When it executes, that is the time at which it will look up self.pdf... and find itself.

There is a separate problem here, in that you are writing __mul__ and __add__ implementations - that is, the * and + operators, which are supposed to return a new value, and not mutate either operand. (If you wrote a = 3 and b = 4 and then c = a * b, you would be extremely surprised if the values of a or b changed, yes?)

We can solve both problems at once, by simply using the computed pdf and cdf to create a new instance (which we need anyway):

def __mul__(self, other):
    if isinstance(other, float) or isinstance(other, int):
        newpdf = lambda x : self.pdf(x) * other
        newcdf = lambda x : self.cdf(x) * other
        return Distribution(newpdf, newcdf)
    else:
        return NotImplemented

Similarly, __add__ should use local variables, rather than modifying self:

def __add__(self, other):
    newpdf = lambda x : self.pdf(x) + other.pdf(x)
    newcdf = lambda x : self.cdf(x) + other.cdf(x)
    return Distribution(newpdf, newcdf)

Note that implementing these methods also gives you the augmented assignment operators *= and += (albeit by creating a new object and rebinding the name).

Let's test it:

if __name__ == "__main__":
    n1 = Normal(1,2)
    n1half = n1 * 0.5
    print(n1.pdf(1))
    print(n1half.pdf(1))
    n1 += n1 
    print(n1.pdf(1))

I get:

>py test.py
0.19947114020071635
0.09973557010035818
0.3989422804014327
Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
  • You can, of course, use a nested function, as you did when you solved the problem yourself, rather than naming a `lambda`. Or you can pass the `lambda`s directly to the constructor. – Karl Knechtel Dec 27 '21 at 01:37
  • Thanks so much for the detailed answer, and for addressing both problems (the first of which was incredibly stupid for me to miss, and was a result of not thinking clearly enough about a legacy implementation). I'd never heard of Python having late-binding closures before. By the way, you read my intention perfectly. Thank you very much! – Milan Dec 27 '21 at 03:03
  • Reference link (also edited into the question): https://stackoverflow.com/questions/2295290/what-do-lambda-function-closures-capture – Karl Knechtel Dec 27 '21 at 03:31