232

Long story short

PEP 557 introduced data classes into the Python standard library. They can fill essentially the same role as collections.namedtuple and typing.NamedTuple, and now I'm wondering how to identify the use cases in which namedtuple is still the better solution.

Advantages of data classes over NamedTuple

Of course, all the credit goes to dataclass if we need:

  • mutable objects
  • inheritance support
  • property decorators, manageable attributes
  • generated method definitions out of the box or customizable method definitions

The advantages of data classes are briefly explained in the same PEP, in the section "Why not just use namedtuple?".

Q: In which cases is namedtuple still a better choice?

But what about the opposite question for namedtuples: why not just use dataclass? My guess is that namedtuple is better from a performance standpoint, but I have found no confirmation of that yet.

Example

Let's consider the following situation:

We are going to store page dimensions in a small container with statically defined fields, type hinting and named access. No hashing, comparison and so on are needed.

NamedTuple approach:

from typing import NamedTuple

PageDimensions = NamedTuple("PageDimensions", [('width', int), ('height', int)])

DataClass approach:

from dataclasses import dataclass

@dataclass
class PageDimensions:
    width: int
    height: int

Which solution is preferable and why?

P.S. the question isn't a duplicate of that one in any way, because here I'm asking about the cases in which namedtuple is better, not about the difference (I've checked docs and sources before asking)

Oleh Rybalchenko
  • 2
    I've seen that question, but it has no answer on the main point: in which cases are namedtuples still better to use? – Oleh Rybalchenko Aug 03 '18 at 12:45
  • See also https://stackoverflow.com/questions/3357581/using-python-class-as-a-data-container – pylang Dec 22 '18 at 02:36
  • 1
    Note that using a list of `NamedTuple`s as an input for `np.array` will "just work" because (as mentioned in the accepted answer) `NamedTuple` inherits from `tuple`. Numpy does not handle dataclasses as smoothly (treating them as having dtype `object`). – Jasha Jul 27 '21 at 22:56
  • TLDR for beginners: choose data classes. – OrenIshShalom Jul 11 '22 at 11:43
  • 1
    It's worth noting that [NamedTuples have issues with subclassing](https://github.com/python/typing/issues/427). – Stevoisiak Feb 01 '23 at 15:39

7 Answers

171

It depends on your needs. Each of them has its own benefits.

Here is a good explanation of dataclasses from PyCon 2018: Raymond Hettinger - Dataclasses: The code generator to end all code generators

In a dataclass, all of the generated behavior is implemented in Python, whereas a NamedTuple gets these behaviors for free because it inherits from tuple. And because tuple is written in C, the standard methods (hashing, comparison, etc.) are faster on a NamedTuple.

Note also that a dataclass instance stores its attributes in a dict (by default), whereas a NamedTuple is based on tuple. Each therefore inherits the advantages and disadvantages of the underlying structure. For example, a NamedTuple uses less space, but attribute access is faster on a dataclass.

Here is my experiment:

In [33]: a = PageDimensionsDC(width=10, height=10)

In [34]: sys.getsizeof(a) + sys.getsizeof(vars(a))
Out[34]: 168

In [35]: %timeit a.width
43.2 ns ± 1.05 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [36]: a = PageDimensionsNT(width=10, height=10)

In [37]: sys.getsizeof(a)
Out[37]: 64

In [38]: %timeit a.width
63.6 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
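
To repeat the measurement outside IPython, a minimal sketch could look like the following. The session above does not show how PageDimensionsDC and PageDimensionsNT were defined, so the definitions here are an assumption. The absolute numbers (and possibly which one is faster) depend on the Python version and the machine, so treat the output as indicative only.

```python
import sys
import timeit
from dataclasses import dataclass
from typing import NamedTuple


# Assumed definitions; the original session does not show them.
@dataclass
class PageDimensionsDC:
    width: int
    height: int


class PageDimensionsNT(NamedTuple):
    width: int
    height: int


dc = PageDimensionsDC(width=10, height=10)
nt = PageDimensionsNT(width=10, height=10)

# A regular dataclass instance keeps its fields in a per-instance __dict__,
# so count both the object and that dict. The tuple stores its items inline.
dc_size = sys.getsizeof(dc) + sys.getsizeof(vars(dc))
nt_size = sys.getsizeof(nt)
print(f"dataclass: {dc_size} bytes, NamedTuple: {nt_size} bytes")

# Time plain attribute access on each object.
n = 200_000
dc_time = timeit.timeit("obj.width", globals={"obj": dc}, number=n)
nt_time = timeit.timeit("obj.width", globals={"obj": nt}, number=n)
print(f"dataclass:  {dc_time / n * 1e9:.1f} ns per access")
print(f"NamedTuple: {nt_time / n * 1e9:.1f} ns per access")
```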

But the access time of a NamedTuple stays equally small as the number of attributes grows, because for each attribute it creates a property with the name of that attribute. For example, in our case part of the namespace of the new class will look like:

from operator import itemgetter

class_namespace = {
...
    'width': property(itemgetter(0), doc="Alias for field number 0"),
    'height': property(itemgetter(1), doc="Alias for field number 1")
}
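
To make the mechanism concrete, here is a hand-written tuple subclass that uses the same property-plus-itemgetter idea. It is a simplified illustration, not the code namedtuple actually generates, and newer CPython releases use an equivalent descriptor implemented in C instead of property(itemgetter(...)).

```python
from operator import itemgetter


class PageDimensions(tuple):
    """Simplified, hand-written stand-in for a generated namedtuple class."""

    __slots__ = ()  # no per-instance __dict__, the data lives in the tuple

    def __new__(cls, width, height):
        return super().__new__(cls, (width, height))

    # Each field is a read-only property that fetches one tuple slot.
    width = property(itemgetter(0), doc="Alias for field number 0")
    height = property(itemgetter(1), doc="Alias for field number 1")


page = PageDimensions(210, 297)
print(page.width, page.height)   # named access
print(page[0], page[1])          # the same values by index
```

Every named access goes through the class-level property first and then indexes into the tuple, regardless of how many fields the class has.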

In which cases is namedtuple still a better choice?

When your data structure needs to be (or can be) immutable, hashable, iterable, unpackable and comparable, you can use NamedTuple. If you need something more complicated, for example inheritance for your data structure, then use Dataclass.
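
As a quick illustration of those properties, here is a small sketch using the PageDimensions example from the question (the instance names and values are made up for the demonstration):

```python
from typing import NamedTuple


class PageDimensions(NamedTuple):
    width: int
    height: int


a4 = PageDimensions(width=210, height=297)
a5 = PageDimensions(width=148, height=210)

width, height = a4                 # unpackable
as_list = list(a4)                 # iterable
sizes = {a4: "A4", a5: "A5"}       # hashable, so usable as a dict key
is_smaller = a5 < a4               # comparable, element by element like tuples

try:
    a4.width = 100                 # immutable: assignment is rejected
except AttributeError as exc:
    error = exc
```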

huyz
Oleksandr Yarushevskyi
  • 8
    I agree with this answer. In my case I use a NamedTuple via typing if possible because it can be unpacked and spread. However there are many cases when I need a dataclass, commonly because of inheritance or custom init. – Howard Lovatt Jun 02 '20 at 00:36
  • 1
    I find it interesting that `dataclasses.dataclass` and `collections.namedtuple` are both just code generators. Fascinating. In the case of `collections.namedtuple`, it has a huge template string literal that gets passed to `exec`. I thought they were going to create all of this programmatically somehow, but code generation followed by exec makes sense. – Trevor Boyd Smith Nov 03 '20 at 20:50
  • 39
    FWIW, dataclasses can also be immutable, hashable, iterable and comparable. The `dataclass()` decorator accepts a kwarg `frozen=True`. – ffledgling Apr 18 '21 at 00:49
  • In dataclasses you can specify the types of the attributes. – VaNa Sep 07 '21 at 14:46
  • 3
    Why is a dict faster than a tuple at accessing attributes? – WeiChing 林煒清 May 22 '22 at 04:51
  • 1
    @VaNa in `typing.NamedTuple` you can (and must) specify the types of the attributes. – Jacktose Oct 05 '22 at 16:43
  • The NamedTuple has to do a dict fetch on the class to find the property then a tuple lookup. – DylanYoung Oct 14 '22 at 02:12
40

In programming in general, anything that CAN be immutable SHOULD be immutable. We gain two things:

  1. The program is easier to read: we don't need to worry about values changing, because once an object is instantiated it never changes (namedtuple).
  2. There is less chance of weird bugs.

That's why, if the data is immutable, you should use a named tuple instead of a dataclass.

I wrote it in the comments, but I'll mention it here: you're definitely right that there is an overlap, especially with frozen=True in dataclasses. Still, namedtuples have features of their own, such as unpacking and always being immutable, so I doubt they'll be removed.
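
A short sketch of where the overlap ends: a frozen dataclass rejects assignment just like a namedtuple, but it cannot be unpacked directly. The Point class is a made-up example.

```python
from dataclasses import FrozenInstanceError, astuple, dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)

try:
    p.x = 10                       # frozen=True blocks assignment
except FrozenInstanceError:
    frozen = True

try:
    x, y = p                       # a dataclass instance is not iterable
except TypeError:
    unpackable = False

x, y = astuple(p)                  # an explicit conversion is needed instead
```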

maor10
  • 52
    Why not a dataclass with `@dataclass(frozen=True)`? –  Aug 03 '18 at 12:09
  • 12
    Another advantage is unpacking in namedtuples- e.g. if I have a Point(x, y), I can unpack it `x, y = point`- – maor10 Aug 03 '18 at 12:09
  • 2
    I want to make it clear though that you're right in a sense- namedtuples were created before python3, and there's obviously a bit of an overlap here. But because it's not an exact replacement (unpacking, namedtuples always being immutable), they probably won't remove namedtuples – maor10 Aug 03 '18 at 12:12
  • 4
    @maor10 thanks for the answer, unpacking is really the only advantage I see yet. As mentioned above, the dataclass can be immutable. – Oleh Rybalchenko Aug 03 '18 at 12:59
  • 1
    I think you may rewrite the answer a little in order to make it clear for the others and accept it. It seems that immutability itself is NOT the thing here, mainly it's about unpacking. – Oleh Rybalchenko Aug 03 '18 at 13:06
36

I had this same question, so I ran a few tests and documented them here: https://shayallenhill.com/python-struct-options/

Summary:

  • NamedTuple is better for unpacking, exploding, and size.
  • DataClass is faster and more flexible.
  • The differences aren't tremendous, and I wouldn't refactor stable code to move from one to another.
  • NamedTuple is also great for soft typing when you'd like to be able to pass a tuple instead.

To do this, define a type inheriting from it...

from typing import NamedTuple

class CircleArg(NamedTuple):
    x: float
    y: float
    radius: float

...then unpack it inside your functions. Don't use the .attributes, and you'll have a nice "type hint" without any PITA for the caller.

*focus, radius = circle_arg_instance  # or tuple
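
Put together, the pattern might look like this. The function name circle_area is hypothetical and only serves to show that the same call works with a CircleArg or a plain 3-tuple.

```python
import math
from typing import NamedTuple


class CircleArg(NamedTuple):
    x: float
    y: float
    radius: float


def circle_area(circle: CircleArg) -> float:
    # Unpack positionally instead of using .radius, so a plain
    # 3-tuple works just as well as a CircleArg instance.
    *center, radius = circle
    return math.pi * radius ** 2


area_from_namedtuple = circle_area(CircleArg(x=0.0, y=0.0, radius=1.0))
area_from_tuple = circle_area((0.0, 0.0, 1.0))
```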
Shay
  • 28
    "I wouldn't refactor stable code to move from one to another." - A Wise Developer – rodrigo-silveira Jul 02 '21 at 07:22
  • what is the purpose of the `or tuple` parameter syntax ? – WestCoastProjects Oct 29 '21 at 12:33
  • @WestCoastProjects, just poor formatting on my part. Updated now. The line was just trying to get across that you can enter either a) an instance of the CircleArg class or b) a plain 3-tuple ... on the right-hand side of the =. – Shay Oct 31 '21 at 15:01
29

I didn't see any of the other answers mention it, but in my opinion one of the most important differences is to do with how equality and comparison work. When you compare named tuples, the names are ignored: two named tuples are equal if they contain the same values in the same order, even if they have different class names or field names:

>>> from collections import namedtuple
>>> A = namedtuple('A', ())
>>> B = namedtuple('B', ())
>>> a = A()
>>> b = B()
>>> a == b
True

Dataclass instances, on the other hand, will only be considered equal if they are of the same type. I pretty much always want the latter behaviour: I expect things of different types to be distinct.
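
For a side-by-side comparison, here is a small sketch with made-up class names that runs the same check on both kinds of object:

```python
from collections import namedtuple
from dataclasses import dataclass

PointNT = namedtuple("PointNT", ["x", "y"])
SizeNT = namedtuple("SizeNT", ["width", "height"])


@dataclass
class PointDC:
    x: int
    y: int


@dataclass
class SizeDC:
    width: int
    height: int


# Named tuples compare as plain tuples, so only the values matter.
nt_equal = PointNT(1, 2) == SizeNT(1, 2)
tuple_equal = PointNT(1, 2) == (1, 2)

# The generated dataclass __eq__ only compares instances of the same class.
dc_equal = PointDC(1, 2) == SizeDC(1, 2)
same_class_equal = PointDC(1, 2) == PointDC(1, 2)
```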

Andrew Foote
12

Another important limitation of NamedTuple is that it cannot be generic:

import typing as t

T = t.TypeVar('T')

class C(t.Generic[T], t.NamedTuple): ...

TypeError: Multiple inheritance with NamedTuple is not supported
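
On a Python version where the above raises, one option is a generic dataclass, which combines with typing.Generic without complaint. A minimal sketch (Box is a made-up name):

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class Box(Generic[T]):
    value: T


int_box = Box[int](1)             # explicit type parameter
str_box: Box[str] = Box("hello")  # or annotate the variable instead
```

The type parameter only matters to static type checkers; nothing is enforced at runtime.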
KFL
  • 2
    This is actually a bug. It should be [fixed in Python 3.11](https://github.com/python/cpython/pull/92027). – rmorshea Oct 31 '22 at 22:57
8

One use case for me is frameworks that do not support dataclasses, in particular TensorFlow. There, a tf.function can work with a typing.NamedTuple but not with a dataclass.

import typing

import tensorflow as tf

class MyFancyData(typing.NamedTuple):
    some_tensor: tf.Tensor
    some_other_stuf: ...

@tf.function
def train_step(self, my_fancy_data: MyFancyData):
    ...
fabian789
0

There is another small difference between them that has not been mentioned so far. The attributes of named tuples can be accessed by name and by index, while the attributes of data classes can be accessed only by name. I ran into this difference when sorting a list of objects.

For named tuples, we can use both the itemgetter and attrgetter helper functions. For data classes, we can use only the attrgetter function.

#!/usr/bin/python

from typing import NamedTuple
from operator import itemgetter, attrgetter
# from dataclasses import dataclass

# @dataclass(frozen=True)
# class City:
#     cid: int
#     name: str
#     population: int

class City(NamedTuple):
    cid: int
    name: str
    population: int

c1 = City(1, 'Bratislava', 432000)
c2 = City(2, 'Budapest', 1759000)
c3 = City(3, 'Prague', 1280000)
c4 = City(4, 'Warsaw', 1748000)
c5 = City(5, 'Los Angeles', 3971000)
c6 = City(6, 'Edinburgh', 464000)
c7 = City(7, 'Berlin', 3671000)

cities = [c1, c2, c3, c4, c5, c6, c7]

sorted_cities = sorted(cities, key=attrgetter('name'))

for city in sorted_cities:
    print(city)

print('---------------------')

sorted_cities = sorted(cities, key=itemgetter(2))

for city in sorted_cities:
    print(city)
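
For comparison, this is roughly what happens with the dataclass variant of City (the commented-out definition above). The list is shortened, and the astuple-based workaround at the end is just one possible approach.

```python
from dataclasses import astuple, dataclass
from operator import attrgetter, itemgetter


@dataclass(frozen=True)
class City:
    cid: int
    name: str
    population: int


cities = [City(3, 'Prague', 1280000), City(1, 'Bratislava', 432000),
          City(6, 'Edinburgh', 464000)]

# Access by attribute name works exactly as with the named tuple.
by_name = sorted(cities, key=attrgetter('name'))

try:
    sorted(cities, key=itemgetter(2))
except TypeError:
    # City has no __getitem__, so index-based access is not available.
    index_access_works = False

# Converting each instance to a tuple first makes itemgetter usable again.
by_population = sorted(cities, key=lambda c: itemgetter(2)(astuple(c)))
```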
Jan Bodnar