5

I would like to make string comparison case insensitive. For that, I would like to create an immutable class with just one string field. In the constructor, I would like to call lower() before assigning the value to the field.

I would like to rely as much as possible on standard classes like namedtuple or dataclass. Using the __post_init__ function (see e.g. How to use the __post_init__ method in Dataclasses in Python) feels like a hack. It also makes me wonder whether the field is still frozen after I change it in the __post_init__ function.

However, I can't find a __pre_init__ function. Is there a better way?

Elazar
  • " I would like to call lower() before assigning the value to the field" -- So, is something stopping you from doing that? If so, pls elaborate. If not, what else is the problem? – fountainhead Nov 04 '20 at 14:07
  • Why wouldn't a .lower() in your regular init be sufficient? Just a hobbyist here, so perhaps I lack some SE knowledge... – Kraay89 Nov 04 '20 at 14:08
  • A `pre_init`, i.e. a method that is called before `__init__`, is `__new__`. Look here for more: https://spyhce.com/blog/understanding-new-and-init – balderman Nov 04 '20 at 14:11
  • __new__ is the initialization of the class, __init__ the initialization of the instance. The conversion to lower should happen for all instances. – Pierre van de Laar Nov 04 '20 at 14:35
  • For example, overwriting __init__ of a NamedTuple as
    ```
    class Name(NamedTuple):
        name: str

        def __init__(self, name: str) -> None:
            def canonical_representation() -> str:
                return name.lower()
            self.name: str = canonical_representation()
    ```
    is not allowed:
    ```
    File "C:\Users\laarpjljvd\AppData\Local\Programs\Python\Python39\lib\typing.py", line 1775, in __new__
        raise AttributeError("Cannot overwrite NamedTuple attribute " + key)
    ```
    – Pierre van de Laar Nov 04 '20 at 14:39
  • I am looking for readable and minimal declarations, something like
    ```
    @dataclass(frozen=True)
    class Name:
        name: str = field.lower()
    ```
    such that I don't have to write all the usual functions for repr, hash, equality, etc. – Pierre van de Laar Nov 04 '20 at 14:51
  • @PierrevandeLaar, I'm not 100% sure what you're asking, but I took a guess at it with the answer I've left. If that is not what you're after, can you please update your question to clarify? Thanks! – Mack Jan 29 '21 at 17:53

5 Answers

5

It turns out that dataclasses doesn't provide the functionality I was looking for. attrs, however, does, through a converter that is applied to the value before it is stored:

from attr import attrs, attrib


@attrs(frozen=True)
class Name:
    name: str = attrib(converter=str.lower)
3

Clarification: If I'm interpreting the question correctly, you want to work with a class decorated with the stdlib dataclasses.dataclass(frozen=True). For your case, you might have something like this:

from dataclasses import dataclass


@dataclass(frozen=True)
class HasStringField:
    some_string: str


instance = HasStringField("SOME_STRING")
instance.some_string  # value: "SOME_STRING"

My understanding is that you want to implement something such that the value in instance.some_string above is actually "some_string".


Answer: There is no "built-in" way to do what you want, but there are two options that come to mind: using object.__setattr__ in __post_init__ or else using an alternative decorator that provides you with a way to create a custom constructor that still calls through to the constructor created by dataclass.

Option 1: using object.__setattr__ in __post_init__

import dataclasses


@dataclasses.dataclass(frozen=True)
class HasStringField:
    some_string: str

    def __post_init__(self):
        object.__setattr__(self, "some_string", self.some_string.lower())


inst = HasStringField("SOME_STRING")
inst.some_string  # value: "some_string"
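On the question of whether the field is still frozen after __post_init__ has changed it: as far as I can tell, yes. object.__setattr__ only sidesteps the check for that one call, and ordinary assignment keeps raising afterwards. A small sketch to check this (the still_frozen variable is just for illustration):

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class HasStringField:
    some_string: str

    def __post_init__(self):
        # Bypass the frozen __setattr__ once, during construction only.
        object.__setattr__(self, "some_string", self.some_string.lower())


inst = HasStringField("SOME_STRING")

try:
    inst.some_string = "OTHER"  # ordinary assignment is still rejected
    still_frozen = False
except FrozenInstanceError:
    still_frozen = True

print(inst.some_string, still_frozen)

# The generated __eq__ and __hash__ see the normalized value,
# so differently cased inputs compare and hash the same.
print(inst == HasStringField("some_string"))
print(hash(inst) == hash(HasStringField("Some_String")))
```

The normalization happens before anything else can observe the instance, so equality and hashing only ever see the lowercase value.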

Option 2: custom decorator

An answer to a similar question provides one such decorator, dataclass_with_default_init.

Using the dataclass_with_default_init decorator, you'd do:

# Import / implement the decorator

@dataclass_with_default_init(frozen=True)
class HasStringField:
    some_string: str

    def __init__(self, some_string: str):
        self.__default_init__(some_string=some_string.lower())


instance = HasStringField("SOME_STRING")
instance.some_string  # value: "some_string"

...and everything else about this class would remain as if you had used the dataclass decorator instead.
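The decorator itself is not reproduced here. A minimal sketch of how such a decorator might work is below; it is my own assumption of the idea and may differ from the version in the linked answer. It keeps the hand-written __init__, lets dataclass generate its usual one, and exposes the generated one as __default_init__:

```python
from dataclasses import dataclass, FrozenInstanceError


def dataclass_with_default_init(**kwargs):
    """Hypothetical sketch: keep the user's __init__, expose the generated one."""

    def wrap(cls):
        user_init = cls.__dict__["__init__"]  # the hand-written constructor
        del cls.__init__                      # so dataclass generates its own
        result = dataclass(cls, **kwargs)
        result.__default_init__ = result.__init__  # generated constructor
        result.__init__ = user_init                # restore the custom one
        return result

    return wrap


@dataclass_with_default_init(frozen=True)
class HasStringField:
    some_string: str

    def __init__(self, some_string: str):
        self.__default_init__(some_string=some_string.lower())


instance = HasStringField("SOME_STRING")
print(instance.some_string)

try:
    instance.some_string = "OTHER"
    rejected = False
except FrozenInstanceError:
    rejected = True
print(rejected)
```

This sketch assumes the decorated class always defines its own __init__ and only accepts keyword options such as frozen=True.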

Mack
0

This should be one of the simplest ways. The field you mention is the internal data attribute (inherited from the UserString class):

from collections import UserString

class LowerStr(UserString):
    def __init__(self, value):
        super().__init__(value)
        self.data = self.data.lower()

s1 = LowerStr("ABC")
print("s1: {}".format(s1))
print("s1 class: {}".format(s1.__class__))
print("s1 class mro: {}".format(s1.__class__.mro()))

Output:

s1: abc
s1 class: <class '__main__.LowerStr'>
s1 class mro: [<class '__main__.LowerStr'>, <class 'collections.UserString'>, <class 'collections.abc.Sequence'>, <class 'collections.abc.Reversible'>, <class 'collections.abc.Collection'>, <class 'collections.abc.Sized'>, <class 'collections.abc.Iterable'>, <class 'collections.abc.Container'>, <class 'object'>]

This gives you all the methods from str and you can customize them at will.
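For the comparison use case specifically, here is a short sketch of how this class behaves. UserString compares and hashes by its data attribute, so instances compare equal to each other and to plain lowercase strings. One caveat worth knowing: data is an ordinary attribute, so this variant is not immutable.

```python
from collections import UserString


class LowerStr(UserString):
    def __init__(self, value):
        super().__init__(value)
        self.data = self.data.lower()


a = LowerStr("ABC")
b = LowerStr("abc")

# Equality and hashing are delegated to the wrapped (lowercase) string.
equal_to_each_other = a == b
equal_to_plain_str = a == "abc"
same_hash = hash(a) == hash("abc")
print(equal_to_each_other, equal_to_plain_str, same_hash)

# Nothing prevents reassigning the internal field after construction.
a.data = "CHANGED"
print(a)
```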

If you prefer to subclass str directly, so that there is no internal data attribute:

class InsensitiveStr(str):
    def __new__(cls, value):
        if isinstance(value, str):
            return super().__new__(cls, value.lower())
        else:
            raise ValueError

s1 = InsensitiveStr("ABC")
print(s1)
print(type(s1))
print(s1.__class__)
print(s1.__class__.__mro__)

Note the difference in the MRO compared with the UserString version.

Output:

abc
<class '__main__.InsensitiveStr'>
<class '__main__.InsensitiveStr'>
(<class '__main__.InsensitiveStr'>, <class 'str'>, <class 'object'>)
progmatico
0

Note: I still think the best approach is to use the __post_init__ / object.__setattr__ combination mentioned above, or to use an external library like attrs (https://www.attrs.org/en/stable/).

One pattern I have seen while reading the Python code for dataclasses itself is to use a function with the same name, but in snake case, to do these sorts of changes, e.g.:

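from dataclasses import dataclass


@dataclass(frozen=True)
class HasStringField:
    some_string: str


def has_string_field(some_string: str) -> HasStringField:
    # Factory function: normalize the input, then build the frozen instance.
    return HasStringField(some_string.lower())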

Then you always import and use has_string_field() rather than HasStringField(). (An example of this pattern is the dataclasses field() function vs. the Field class.)

It has the drawback of being verbose and of making the class invariant bypassable: nothing stops a caller from using HasStringField("ABC") directly.

tofarr
0

Python's pre-init method is __new__:

You can't set attributes on a frozen dataclass, so I use a namedtuple here.

from collections import namedtuple

_WithString = namedtuple("WithString", ["stringattr"])


class WithString:
    def __new__(cls, input_string):
        return _WithString(input_string.lower())


string_1 = WithString("TEST")
string_2 = WithString("test")
print("instances:", string_1, string_2)
print("attributes:", string_1.stringattr, string_2.stringattr)
print("compare instances:", string_1 == string_2)
print("compare attributes:", string_1.stringattr == string_2.stringattr)

Output:

instances: WithString(stringattr='test') WithString(stringattr='test')
attributes: test test
compare instances: True
compare attributes: True
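One thing to be aware of with the version above: WithString.__new__ returns a _WithString, so isinstance(string_1, WithString) is False. A variant that keeps the type intact, sketched here as an alternative rather than as part of the original approach, subclasses the namedtuple and lowercases in __new__:

```python
from collections import namedtuple


class WithString(namedtuple("WithString", ["stringattr"])):
    __slots__ = ()  # no per-instance __dict__, so no extra attributes

    def __new__(cls, input_string):
        # Normalize before the underlying tuple is created.
        return super().__new__(cls, input_string.lower())


string_1 = WithString("TEST")
string_2 = WithString("test")

print(string_1 == string_2)
print(isinstance(string_1, WithString))

try:
    string_1.stringattr = "other"  # namedtuple fields are read-only
    read_only = False
except AttributeError:
    read_only = True
print(read_only)
```

Note that this works with collections.namedtuple; typing.NamedTuple does not allow overriding __new__ or __init__ in the class body.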

You can do this in __init__ too:

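This gives equivalent behavior, this time with a dataclass. Because the class defines its own __init__, dataclass does not generate one, and you have to bypass the frozen check with object.__setattr__: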

from dataclasses import dataclass


@dataclass(frozen=True)
class WithString:
    stringattr: str

    def __init__(self, input_string):
        object.__setattr__(self, "stringattr", input_string.lower())


string_1 = WithString("TEST")
string_2 = WithString("test")
print("instances:", string_1, string_2)
print("compare instances:", string_1 == string_2)

Output:

instances: WithString(stringattr='test') WithString(stringattr='test')
compare instances: True

Or just subclass str

The resulting object keeps its original casing, behaves a lot like a string, and can also be compared to plain strings.

class CaseInsensitiveString(str):
    def __eq__(self, other):
        try:
            return self.lower() == other.lower()
        except AttributeError:
            return NotImplemented


string_1 = CaseInsensitiveString("TEST")
string_2 = CaseInsensitiveString("test")
actual_string = "Test"
print("instances:", string_1, string_2)
print("compare instances:", string_1 == string_2)
print("compare left:", string_1 == actual_string)
print("compare right:", actual_string == string_1)

Output:

instances: TEST test
compare instances: True
compare left: True
compare right: True
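Two caveats with overriding only __eq__ on a str subclass. First, a class that defines __eq__ without __hash__ becomes unhashable, so these objects cannot be used as dict keys or set members. Second, != is still inherited from str and stays case sensitive. A sketch that addresses both, as one possible extension of the class above:

```python
class CaseInsensitiveString(str):
    def __eq__(self, other):
        try:
            return self.lower() == other.lower()
        except AttributeError:
            return NotImplemented

    def __ne__(self, other):
        # Keep != consistent with the case-insensitive ==.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects must hash equally, so hash the lowercase form.
        return hash(self.lower())


string_1 = CaseInsensitiveString("TEST")
string_2 = CaseInsensitiveString("test")

print(string_1 == string_2, string_1 != string_2)

lookup = {string_1: "found"}
print(lookup[string_2])
```

Looking up with a plain str key only works when that plain string is already lowercase, because a plain str hashes its own casing.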
Joooeey