
I have a Django Model that tracks email message IDs as the message passes through different servers. It has a models.ForeignKey field to itself, so we can chain them together to follow the full path of a message through many servers.

The model has a 'get_children()' method that recursively looks for all messages descended from the given message.

    class Relay(models.Model):
        hostname = models.CharField(max_length=48)
        qid = models.CharField(max_length=24)
        received_from = models.ForeignKey(
            "self",
            blank=True,
            null=True,
            default=None,
            on_delete=models.SET_DEFAULT
        )

        def get_children(self, parent_messages=set()):
            for r in Relay.objects.filter(received_from=self):
                r.get_children(parent_messages)
            parent_messages.add(self)
            p = list(parent_messages)
            parent_messages = set()
            return p

If I run a single query from the Django shell, it works as expected. "Solo" messages return only themselves. Messages with multiple child/descendant messages are found as well.

    >>> r = Relay.objects.get(qid='xo2')
    >>> r.get_children()
    [<Relay: server2 foo>, <Relay: server3 bbb>, <Relay: server1 xo2>]

If I kill and restart the shell, the next query also works as expected, fetching a single message:

    >>> r = Relay.objects.get(qid='singletonMsg')
    >>> r.get_children()
    [<Relay: server5 singletonMsg>]

But if I run get_children() repeatedly within a single Django shell session, it always includes the results of the previous get_children() calls.


    >>> r = Relay.objects.get(qid='singletonMsg')
    >>> r.get_children()
    # expected result
    [<Relay: server5 singletonMsg>]  
    >>>
    >>> r = Relay.objects.get(qid='xo2')
    >>> r.get_children()
    # unexpected result - should not include singletonMsg 
    [<Relay: server2 foo>, <Relay: server3 bbb>, <Relay: server5 singletonMsg>, <Relay: server1 xo2>]
    >>>
    >>> r = Relay.objects.get(qid='singletonMsg')
    >>> r.get_children()
    # unexpected result - should just return singletonMsg ??
    [<Relay: server2 foo>, <Relay: server3 bbb>, <Relay: server5 singletonMsg>, <Relay: server1 xo2>]

I was originally returning the "parent_messages" set from the function. I tried return [m for m in parent_messages] and the current list() thinking it was a closure issue, but no luck. I am stumped. Thanks in advance for any advice.

yolabingo

1 Answer


When you write the code:

def get_children(self, parent_messages=set()):
    ...

Python evaluates the default value set() only once, when the def statement runs, and stores that one set object on the function. Because sets are mutable, every call that omits parent_messages adds to the same object, so the results of earlier get_children calls are still in it on later calls. The line parent_messages = set() at the end of your method does not help: it only rebinds the local name and leaves the stored default untouched.

See The Mutable Default Argument for a more in-depth explanation of this.
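The behaviour described above can be reproduced without Django. The following is a minimal sketch, not your actual model: buggy_collect, fixed_collect, and the Node class are hypothetical stand-ins (Node.children replaces the Relay.objects.filter(...) query). It shows the shared default leaking between calls, and the usual fix of using None as a sentinel and creating the set inside the function.

```python
def buggy_collect(item, seen=set()):
    # The default set is created once, when `def` runs, and reused on every call.
    seen.add(item)
    return sorted(seen)


def fixed_collect(item, seen=None):
    # None is a sentinel: a fresh set is built on each call that omits `seen`.
    if seen is None:
        seen = set()
    seen.add(item)
    return sorted(seen)


print(buggy_collect("a"))  # ['a']
print(buggy_collect("b"))  # ['a', 'b']  <- "a" leaked from the previous call
print(fixed_collect("a"))  # ['a']
print(fixed_collect("b"))  # ['b']


class Node:
    """Hypothetical stand-in for Relay; `children` replaces the ORM query."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def get_children(self, found=None):
        # Same recursion as in the question, but with the None-sentinel fix.
        if found is None:
            found = set()
        for child in self.children:
            child.get_children(found)
        found.add(self)
        return found


root, child, solo = Node("xo2"), Node("foo"), Node("singletonMsg")
root.children.append(child)
print(sorted(n.name for n in solo.get_children()))  # ['singletonMsg']
print(sorted(n.name for n in root.get_children()))  # ['foo', 'xo2']
```

Applying the same None check to the parent_messages parameter in the question's method should stop results carrying over between calls in one shell session.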

Now, to answer your question, I would strongly recommend using django-mptt (pip install django-mptt). This would massively simplify the code you need to accomplish this:

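from django.db import models
from mptt.models import MPTTModel, TreeForeignKey

class Relay(MPTTModel):
    hostname = models.CharField(max_length=48)
    qid = models.CharField(max_length=24)
    # TreeForeignKey is django-mptt's ForeignKey subclass for the parent link.
    received_from = TreeForeignKey(
        "self",
        blank=True,
        null=True,
        default=None,
        on_delete=models.SET_DEFAULT
    )

    class MPTTMeta:
        # django-mptt expects a field named "parent" unless told otherwise.
        parent_attr = "received_from"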

Execution:

>>> r = Relay.objects.get(qid='xo2')
>>> r.get_descendants(include_self=True)
<TreeQuerySet [<Relay: server1 xo2>, <Relay: server2 foo>, <Relay: server3 bbb>]>

get_descendants returns a queryset of all children of r, their children, and so on. By default it leaves out r itself, so pass include_self=True to match what your original method returned. If you want only the direct children of r, use get_children instead.

Lord Elrond