13

I use dictionaries as data structures a lot in my code. Instead of returning several values as a tuple, as Python permits:

def do_smth():
  [...]
  return val1, val2, val3

I prefer to use a dictionary, which has the advantage of named keys. But a complex nested dictionary is hard to navigate. When I was coding in JS several years ago I liked dictionaries too, because I could access a sub-part like thing.stuff.foo and the IDE helped me with the structure.

I just discovered the new DataClass in Python and I'm not sure what it is for, other than replacing a dictionary. From what I have read, a DataClass cannot have functions inside it, and the initialization of its arguments is simplified.

I would like to hear comments about this: how do you use a DataClass, and how do you use dictionaries in Python?

salty-horse
Ragnar

3 Answers

22

Dataclasses are more of a replacement for NamedTuples than for dictionaries.

Whilst NamedTuples are designed to be immutable, dataclasses can offer that functionality by setting frozen=True in the decorator, but provide much more flexibility overall.
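As a minimal sketch of the frozen=True behaviour described above (the Point class and its values are made up for illustration, not taken from the question):

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:  # hypothetical example class
    x: int
    y: int


p = Point(1, 2)

try:
    p.x = 10  # assigning to a field of a frozen instance is rejected
except FrozenInstanceError:
    mutated = False
else:
    mutated = True

# A frozen dataclass with the default eq=True is hashable,
# so instances can be used as dictionary keys.
labels = {p: "first point"}
```

Here the assignment raises FrozenInstanceError, so the instance behaves much like a NamedTuple in that respect while still being declared with ordinary class syntax.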

If you are into type hints in your Python code, they really come into play.

The other advantage is like you said - complex nested dictionaries. You can define Dataclasses as your types, and represent them within Dataclasses in a clear and concise way.

Consider the following:

from dataclasses import dataclass
from typing import List


@dataclass
class City:
    code: str
    population: int


@dataclass
class Country:
    code: str
    currency: str
    cities: List[City]


@dataclass
class Locations:
    countries: List[Country]


You can then write functions where you annotate the function parameter with the dataclass name as a type hint and access its attributes (similar to passing in a dictionary and accessing its keys), or alternatively construct the dataclass and return it, e.g.

def get_locations(....) -> Locations:
....

It makes the code very readable as opposed to a large complicated dictionary.
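To make that concrete, here is a small self-contained sketch that builds the nested structure and reads it back through attributes. The total_population helper and all the values are assumptions added for illustration:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class City:
    code: str
    population: int


@dataclass
class Country:
    code: str
    currency: str
    cities: List[City]


@dataclass
class Locations:
    countries: List[Country]


def total_population(locations: Locations) -> int:
    """Sum the population of every city in every country (hypothetical helper)."""
    return sum(
        city.population
        for country in locations.countries
        for city in country.cities
    )


# Made-up sample data.
locations = Locations(
    countries=[
        Country(
            code="FR",
            currency="EUR",
            cities=[City("PAR", 2_100_000), City("LYS", 500_000)],
        )
    ]
)

# Attribute access replaces chained dictionary lookups,
# and an IDE can autocomplete each step.
first_city_code = locations.countries[0].cities[0].code
```

Compared with locations["countries"][0]["cities"][0]["code"], a typo in an attribute name is something a type checker or IDE can flag before the code runs.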

You can also set defaults, which was not possible with NamedTuples prior to 3.7 (edit) but is possible with dictionaries.

@dataclass
class Stock:
    quantity: int = 0

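You can also control in the decorator whether the dataclass supports ordering comparisons (order=True), just as you control whether it is frozen. Note that this is about comparing instances with < and >, not about insertion order (edit: I originally contrasted this with dictionaries being unordered, which was only true prior to 3.7). See here for more information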

You get all the benefits of object comparison if you want them i.e. __eq__() etc. They also by default come with __init__ and __repr__ so you don't have to type out those methods manually like with normal classes.
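A short sketch of those generated methods, using a made-up Version class (the class name and values are assumptions for illustration):

```python
from dataclasses import dataclass


@dataclass(order=True)
class Version:  # hypothetical example class
    major: int
    minor: int


a = Version(1, 4)
b = Version(1, 10)

# Generated __eq__: instances compare field by field.
same = a == Version(1, 4)

# Generated __repr__: shows the class name and every field.
text = repr(a)

# order=True adds __lt__, __le__, __gt__ and __ge__,
# comparing the fields as if they were a tuple.
ordered = sorted([b, a])
```

Without order=True the sorted call would raise a TypeError, because only __eq__ is generated by default.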

There is also substantially more control over fields, allowing metadata etc.
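As a rough illustration of that per-field control, the sketch below uses field() with metadata, default_factory, repr and compare. The Stock fields beyond quantity and the "unit" metadata key are made up:

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Stock:
    name: str
    # metadata is free-form and ignored by dataclasses itself; "unit" is a made-up key.
    quantity: int = field(default=0, metadata={"unit": "items"})
    # default_factory gives every instance its own fresh list.
    tags: List[str] = field(default_factory=list)
    # Excluded from the generated __repr__ and from comparisons.
    internal_id: int = field(default=0, repr=False, compare=False)


s1 = Stock("widget")
s2 = Stock("widget", internal_id=99)

# internal_id is ignored by __eq__, so these compare equal.
equal_ignoring_id = s1 == s2

# Mutating one instance's list does not affect the other.
s1.tags.append("new")

# Metadata can be read back through fields().
quantity_unit = {f.name: f.metadata for f in fields(Stock)}["quantity"]["unit"]
```

The default_factory part matters in practice: a plain mutable default such as tags: List[str] = [] is rejected by the decorator with a ValueError.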

And lastly you can convert it into a dictionary at the end by importing asdict: from dataclasses import dataclass, asdict
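A minimal sketch of that conversion, with made-up sample values. asdict recurses into nested dataclasses and into lists that contain them:

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class City:
    code: str
    population: int


@dataclass
class Country:
    code: str
    currency: str
    cities: List[City]


# Made-up sample data.
country = Country("FR", "EUR", [City("PAR", 2_100_000)])

# Nested dataclasses become nested plain dictionaries.
as_dict = asdict(country)
```

The result is an ordinary dictionary, so it can be handed to code that expects one, for example json.dumps.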

Update (Aug 2023): Thanks for the comments! Have edited to clarify those features from 3.7 that I misrepresented. Also wanted to add some further information whilst I'm here:

From what I have read, a DataClass cannot have functions inside it, and the initialization of its arguments is simplified.

Just a note: you can bind methods to a dataclass. By default __init__ is constructed for you, but I believe this can be disabled using @dataclass(init=False), which gives you the ability to construct the object and then modify the attributes (my_var = MyClass(); my_var.my_field = 42). However, I have found the __post_init__ method very handy, and you can exclude a specific attribute from the generated __init__ to get more control, e.g. from the docs:

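from dataclasses import dataclass, field

@dataclass
class C:
    a: float
    b: float
    c: float = field(init=False)  # not a parameter of the generated __init__

    def __post_init__(self):
        # Runs right after the generated __init__ has set a and b.
        self.c = self.a + self.b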
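For the @dataclass(init=False) route mentioned earlier, a minimal sketch might look like this. The Settings class is hypothetical, and I give every field a default so the attributes exist before they are assigned:

```python
from dataclasses import dataclass


@dataclass(init=False)
class Settings:  # hypothetical example class
    retries: int = 3
    verbose: bool = False


# No __init__ is generated, so the class is constructed without arguments...
settings = Settings()

# ...and the attributes are filled in afterwards.
settings.retries = 5

# Passing field values to the constructor is not supported in this mode.
try:
    Settings(retries=1)
except TypeError:
    accepts_arguments = False
else:
    accepts_arguments = True
```

If a field had no default here, reading it (or calling repr on the instance) before assigning it would raise an AttributeError, which is one reason __post_init__ is usually the more comfortable option.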

Another useful aspect of __post_init__ is making assertions about the values. Dataclasses do not check types at runtime; the annotations are only inspected to determine whether a name is a class variable (ClassVar). Class variables are excluded from the fields but can still be used by internal methods, e.g.

from dataclasses import dataclass
from typing import ClassVar

@dataclass
class Lamp:
    valid_sockets: ClassVar[set] = { 'edison_screw', 'bayonet' }
    valid_min_wattage: ClassVar[int] = 40
    valid_max_wattage: ClassVar[int] = 200
    height_cm: int
    socket: str
    wattage: int
    
    def __post_init__(self) -> None:
        assert self._is_valid_wattage(), f'Lamp requires {self.valid_min_wattage}-{self.valid_max_wattage}W bulb'
        assert self._is_valid_socket(), f'Bulb must be one of {self.valid_sockets}'
        
    def _is_valid_socket(self) -> bool:
        return self.socket.lower() in self.valid_sockets

    def _is_valid_wattage(self) -> bool:
        # Inclusive bounds, matching the "40-200W" wording of the error message.
        return self.valid_min_wattage <= self.wattage <= self.valid_max_wattage

In [27]: l = Lamp(50, 'bayonet', 80)
In [28]: print(repr(l))
Lamp(height_cm=50, socket='bayonet', wattage=80)
In [29]: l = Lamp(50, 'bayonet', 300)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In [29], line 1
----> 1 l = Lamp(50, 'bayonet', 300)

File <string>:6, in __init__(self, height_cm, socket, wattage)

Cell In [25], line 11, in Lamp.__post_init__(self)
     10 def __post_init__(self) -> None:
---> 11     assert self._is_valid_wattage(), f'Lamp requires {self.valid_min_wattage}-{self.valid_max_wattage}W bulb'
     12     assert self._is_valid_socket(), f'Bulb must be one of {self.valid_sockets}'

AssertionError: Lamp requires 40-200W bulb
264nm
  • "whereas normal dictionaries are not ordered" - slight correction: since Python 3.7 dictionaries are indeed ordered, 2 years before your posting: https://gandenberger.org/2018/03/10/ordered-dicts-vs-ordereddict/ – Jonesn11 May 02 '22 at 21:34
  • "You can also set defaults, which is not something that is allowed in `NamedTuples` but is allowed in dictionaries." - Since Python 3.7 there is a `defaults` keyword in `namedtuple` to define default values. Check - https://stackoverflow.com/questions/11351032/named-tuple-and-default-values-for-optional-keyword-arguments – s.paszko Nov 10 '22 at 09:19
2

My take on it.

A DataClass isn't there to necessarily replace a dictionary. Rather it is used as an object to hold some data where it makes sense in the modeling of an application.

Let's say we are building a simple address book. Assuming it is just storing some data, the Person class can be a dataclass with fields like name, phone_number, etc. We can then use a dictionary to create a lookup of name to Person such that we can retrieve this data class by name.

from dataclasses import dataclass

@dataclass
class Person:
    # Annotated fields; @dataclass generates __init__, __repr__ and __eq__ from them.
    name: str
    address: str
    phone_number: str

then elsewhere in the app:

persons = <LIST OF PERSONS>
address_book = {person.name: person for person in persons}

It is a rudimentary example but I hope it gets the idea across.
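Put together as something runnable, the idea might look like the sketch below. The two people and their details are made up for illustration:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Person:
    name: str
    address: str
    phone_number: str


# Made-up sample data standing in for the real list of persons.
persons: List[Person] = [
    Person("Alice", "12 Example Street", "555-0100"),
    Person("Bob", "34 Sample Road", "555-0101"),
]

# The dictionary is the lookup structure; the dataclass is the record it holds.
address_book: Dict[str, Person] = {person.name: person for person in persons}

bob_phone = address_book["Bob"].phone_number
```

So the two are complementary here: the dictionary answers "which person?" and the dataclass answers "what do we know about them?".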

Of course, one could ask why use a dataclass when a namedtuple would suffice.

Others have written on that topic:

k88
1

Go for it. In pure OO it is fine to have pure data classes, especially if you are dealing with multi-threading. Still, my advice is to keep this information only where it is needed and used (mixing the data class with functionality).

BioShock
  • I'm working with data, so I'm more into functional programming. I barely use OO (classes, inheritance...), but can you tell me more about the multi-threading point you made? – Ragnar Feb 04 '20 at 10:50
  • @Ragnar how do you work with data? Are you using pandas? How do you write column names? Is it like df['column']? – cikatomo May 15 '21 at 21:28
  • Yes pandas all the way or PySpark RDD. – Ragnar May 16 '21 at 14:00