
I am a self-taught programmer, and I've recently been learning Python. I have run into a strange issue, but I imagine it is just a result of me not knowing Python's syntax and/or program flow.

I have one class called Test, which is in the file TestClass.py:

class Test:

    __tags = {}
    __fields = {}

    def __init__(self, tags: dict={}, fields: dict={}):
        self.__tags = tags
        self.__fields = fields

    def setTag(self, key, value):
        self.__tags[key] = value

    def getTag(self, key):
        return self.__tags[key]

    def setField(self, key, value):
        self.__fields[key] = value

    def getField(self, key):
        return self.__fields[key]


    def getAll(self):
        return [
            {
                'tags': self.__tags,
                'fields': self.__fields
            }
        ]

I am testing out the functionality of this class in a file containing procedural code, test.py:

import TestClass

t1 = TestClass.Test()
t1.setTag('test1', 'value1')
t1.setField('testfield', 'fieldvalue')

t2 = TestClass.Test()
t2.setTag('test2', 'value2')

print(t1.getAll())
print(t2.getAll())

The print statements are where things get weird. The output should be:

[{'tags': {'test1': 'value1'}, 'fields': {'testfield': 'fieldvalue'}}]
[{'tags': {'test2': 'value2'}, 'fields': {}}]

But the actual output is...

[{'tags': {'test2': 'value2', 'test1': 'value1'}, 'fields': {'testfield': 'fieldvalue'}}]
[{'tags': {'test2': 'value2', 'test1': 'value1'}, 'fields': {'testfield': 'fieldvalue'}}]

Why though?

Edit: Python 3.5

nwilging

1 Answer


You just fell into not one, but two well-known Python "traps" for newcomers.

This behavior is expected, and to fix it, you should change the beginning of your class declaration to:

from typing import Optional 


class Test:
    def __init__(self, tags: Optional[dict]=None, fields: Optional[dict]=None):
        self.__tags = tags or {}
        self.__fields = fields or {}
        ...
    ...

Now, to understand the "why so?":
Python code, including expressions, that sits at module level, inside a class body, or in a function or method declaration is executed just once: when the module is first loaded.

This means the empty dictionaries in your class body and in the default parameters of __init__ were each created once, at that time, and reused every time the class was instantiated.

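The first trap is that attributes declared directly in the class body are class attributes, which means they are shared across all instances of that class. If you assign an attribute with self.attribute = XXX inside a method, you create an instance attribute instead. In your code __init__ does exactly that, so the class-level dictionaries end up shadowed and unused, but the trap bites as soon as a method mutates a class-level dictionary without reassigning it first.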
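A minimal sketch of this first trap, using a hypothetical Registry class that is not part of the question's code: a dictionary created in the class body is one object shared by every instance, while one assigned through self in __init__ belongs to a single instance.

```python
class Registry:
    shared = {}  # class attribute: created once, when the class body runs

    def __init__(self):
        self.own = {}  # instance attribute: a new dict for every instance


a = Registry()
b = Registry()

a.shared["k"] = "set via a"  # mutates the one dict stored on the class
a.own["k"] = "set via a"     # mutates only a's dict

print(b.shared)              # {'k': 'set via a'}
print(b.own)                 # {}
print(a.shared is b.shared)  # True
print(a.own is b.own)        # False
```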

The second trap is that default values for function and method parameters are stored along with the function object when it is defined, so the empty dictionaries you declared there are the same objects on every call, and are therefore shared across all instances of your class.
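This second trap can be reproduced without any class at all. The sketch below uses a hypothetical add_tag function (not from the question) and inspects the function's __defaults__ attribute, which is where CPython keeps the default values.

```python
def add_tag(key, value, tags={}):  # the default dict is built once, at def time
    tags[key] = value
    return tags


first = add_tag("test1", "value1")
second = add_tag("test2", "value2")

print(first is second)       # True: both calls returned the same default dict
print(second)                # {'test1': 'value1', 'test2': 'value2'}
print(add_tag.__defaults__)  # ({'test1': 'value1', 'test2': 'value2'},)
```

This is the same merged dictionary that shows up in the question's output: both Test() calls received the one default object.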

The usual pattern to avoid this is to set default parameters to None, or another sentinel value of your choice, and to test for it within the function body: if no value was passed for a parameter, create a fresh dictionary (or other mutable object). That object is created when the function actually executes and is unique to that call. (And if you assign it to an instance attribute, as in self.attr = {}, it is unique to that instance.)
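As a sketch of that pattern applied to a trimmed-down version of the question's class (only setTag and getAll are kept, and the single-underscore attribute names are my choice, not the asker's):

```python
class Test:
    def __init__(self, tags=None, fields=None):
        # None is the sentinel: build a new dict per call when nothing is passed
        self._tags = {} if tags is None else tags
        self._fields = {} if fields is None else fields

    def setTag(self, key, value):
        self._tags[key] = value

    def getAll(self):
        return [{'tags': self._tags, 'fields': self._fields}]


t1 = Test()
t1.setTag('test1', 'value1')
t2 = Test()
t2.setTag('test2', 'value2')

print(t1.getAll())  # [{'tags': {'test1': 'value1'}, 'fields': {}}]
print(t2.getAll())  # [{'tags': {'test2': 'value2'}, 'fields': {}}]
```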

As for the or keyword I proposed in my answer, self.__tags = tags or {}: it comes from a pattern common in old Python (before we had an inline if) that is still useful. The or operator short-circuits: in an expression like obj1 or obj2, it returns the first operand if that evaluates to a "truthy" value, and otherwise returns the second operand, whatever its truth value. The same expression using an inline if would be: self.__tags = tags if tags else {}.
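A short sketch of how or picks its result, and of the caveat raised in the comments on this answer: an empty dict is falsy, so or silently swaps a deliberately passed empty dict for a new one, while an explicit is not None test keeps it. The variable names are illustrative only.

```python
print(None or {})      # {}        None is falsy, so the right operand is returned
print({'a': 1} or {})  # {'a': 1}  a non-empty dict is truthy and is returned as-is

caller_dict = {}            # empty, but passed in on purpose
via_or = caller_dict or {}  # an empty dict is falsy: a new dict replaces it
via_is_none = caller_dict if caller_dict is not None else {}

print(via_or is caller_dict)       # False
print(via_is_none is caller_dict)  # True
```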

It is also worth mentioning that prefixing attribute names with two underscores (__) to get what old tutorials call "private" attributes is not a good programming pattern and should be avoided. Python does not actually implement private or protected attribute access. What we use instead is a convention: if an attribute, method, or function name starts with _ (a single underscore), it is meant for the private use of whoever wrote it, and changing or calling it may cause unexpected behavior in future versions of the code that controls it. Nothing in the language actually prevents you from doing so.

A double-underscore prefix, however, has an actual side effect: at compile time, names of the form __xxx inside a class body are renamed to _<classname>__xxx. All occurrences within the class body are renamed in the same fashion, and code outside the class body can still access the attribute normally by writing the full mangled name. This feature is meant to let base classes hold attributes and methods that subclasses will not override, whether by mistake or by innocently reusing an attribute name; it is not meant for "security" purposes.
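A small sketch of the renaming, with hypothetical Base and Child classes: each class's __token is stored under its own mangled name, so the subclass does not clobber the base class's attribute, and both remain reachable from outside.

```python
class Base:
    def __init__(self):
        self.__token = "base"   # stored as _Base__token

    def base_token(self):
        return self.__token     # compiled as self._Base__token


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = "child"  # stored as _Child__token


c = Child()
print(c.base_token())         # base
print(c._Base__token)         # base
print(c._Child__token)        # child
print(hasattr(c, "__token"))  # False: no attribute exists under the unmangled name
```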

Old language tutorials and texts usually explain this feature as a way to get "private attributes" in Python; that is incorrect.

jsbueno
  • Do you mind me asking how this changes the output, I've not seen the or keyword used like this before? – George Willcox May 03 '17 at 19:06
  • This clears up a few things for me actually. First, I was encountering issues with "'NoneType' object does not support item assignment" (setting `tags: dict=None`). Second, I was encountering the issue described in my question. This clears up both, and the `or` makes a lot of sense! Thank you! Will accept this as the answer in about 8 minutes. – nwilging May 03 '17 at 19:08
  • Three, if you include "things must be private, I'll put double underscores everywhere"! – jonrsharpe May 03 '17 at 19:08
  • @jonrsharpe http://stackoverflow.com/questions/1641219/does-python-have-private-variables-in-classes i have experienced a change of heart now :) – nwilging May 03 '17 at 19:13
  • 1
    The pattern with `or` is *not* common to C-like languages, those will always produce a boolean integer 0 or 1 rather than choosing one of the argument values. It was very handy in Python before the `if/else` expression was introduced, and can still be clearer as this example shows. – Mark Ransom May 03 '17 at 19:33
  • 2
    I realize this is a 5 year old question, but please don't use `or` like this (use `if`/`else`, e.g. `arg if arg is not None else default`). `or` _will_ work fine when you're sure that you're always passing truthy args, but if you're (intentionally) passing an empty dict (e.g. to maintain a reference to it to update outside of the class later (esp. because you're dunder mangling your members here), etc.) -- or a dict that may or may not be empty -- this `or` approach will silently and stealthily ignore that and assign a new one. – jedwards Jul 18 '22 at 04:11
  • excuse me - "or like this" is perfectly valid in the examples you mention - and is indeed incorrectly used in this example, just as you note. I am fixing it. – jsbueno Jul 18 '22 at 14:29