Why does Python forbid the use of sum with strings?

Question

Technically, in Python, one would expect that anything that implements the __add__ method should work with the sum function.

Strings have __add__ implemented as concatenation:

"abc "+"def"
> 'abc def'

sum takes a start value as its second argument, which defaults to the int 0, as can be seen from:

sum([])

0

sum([], 7)

7

That is why sum(["abc", "def"]) doesn't work: it tries to add 0 to "abc":

sum(["abc", "def"])

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Passing a string as the start value should therefore make it work. It doesn't, because the implementation of sum contains an ad hoc check that raises a TypeError if the start value is a string (or another unwanted type):

sum(["sf", "34", "342"], "")

TypeError: sum() can't sum strings [use ''.join(seq) instead]
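To make the behaviour described above concrete, here is a rough pure-Python sketch of what sum appears to do. The name `sum_sketch` is hypothetical and the error message is simplified; the real built-in is written in C, and the set of rejected start types (str, bytes, bytearray) reflects my understanding of CPython rather than anything stated in this question.

```python
def sum_sketch(iterable, start=0):
    """Hypothetical stand-in for the built-in sum, for illustration only."""
    # Assumed ad hoc check: only the type of the start value is inspected.
    if isinstance(start, (str, bytes, bytearray)):
        raise TypeError("sum_sketch() can't sum strings, bytes or bytearray")
    result = start
    for item in iterable:
        # Each step creates a new object via __add__ (or __radd__).
        result = result + item
    return result
```

With this sketch, `sum_sketch([], 7)` returns the start value, `sum_sketch(["abc", "def"])` fails on `0 + "abc"`, and a string start value is rejected before any addition happens.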

Why does Python go to the trouble of implementing hand-crafted exceptions for certain types in sum? The simpler, more Pythonic way would be for sum to work with anything that implements __add__, no?

One can see that without this check sum would work with strings by defining a "zero" class which when added to anything returns the other thing. If we use that as the starting element, then it bypasses the check (because it is not a string) and sum can be used to add strings:

sum(["a", "b", "c"], type("_", (), {"__add__": lambda _, o: o})())

'abc'
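The one-liner above is easier to follow when the anonymous class is written out. This is only a sketch; the class name `Zero` and the variable names are made up for illustration.

```python
class Zero:
    """Hypothetical identity element: Zero() + x returns x unchanged."""

    def __add__(self, other):
        return other


parts = ["a", "b", "c"]

# The start value is not a str, so the check in sum does not fire.
joined_with_sum = sum(parts, Zero())

# The way the error message recommends.
joined_with_join = "".join(parts)
```

Both variables end up holding the same string here. One difference: with an empty list, `sum([], Zero())` returns the `Zero` instance itself, whereas `"".join([])` returns an empty string.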

If there is to be only one obvious way to do this, then I'm happy with `''.join(["sf", "34", "342"])`, since string concatenation is better described as _joining_ than _summing_. — xnx, Apr 03 '18 at 12:54
*"There should be one and preferably only one obvious way to do it."* - If `sum` was interchangeable with `str.join` there'd be *two* ways to do it… — deceze, Apr 03 '18 at 12:54
One can ask why this is allowed for `list`, though, since it clearly has the same performance problem. — Jean-François Fabre, Apr 03 '18 at 12:56
the main issue is the quadratic effect. `sum` has to do `result = result + x` for each `x`, which takes longer and longer when `result` increases (same issue with lists...). `sum` cannot use the starting argument to accumulate, even for lists — Jean-François Fabre, Apr 03 '18 at 12:57
@Jean-FrançoisFabre thank you for this information. And yes your point about the lists is a good one. — patapouf_ai, Apr 03 '18 at 12:58
related (but about lists): https://stackoverflow.com/questions/42593904/could-sum-be-faster-on-lists?s=1|54.8272 I've been there :) — Jean-François Fabre, Apr 03 '18 at 12:59
@Jean-FrançoisFabre I wish lists offered the syntax `[].join([lst1, lst2, ...])` instead of `list(itertools.chain(lst1, lst2, ...))` (and why not even tuples), I end up summing them for simplicity in many cases... — jdehesa, Apr 03 '18 at 13:03
@jdehesa that's very true. a pity to have to use `itertools` to get an acceptable speed. There's also the double flat listcomp for this: `[z for y in x for z in y]` — Jean-François Fabre, Apr 03 '18 at 13:04
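The list case raised in the comments can be sketched as follows. The sample data and the helper `copies_made_by_sum` are hypothetical; the helper is a simple cost model that assumes each `result = result + x` copies every element of the new result, not a measurement.

```python
import itertools

lists = [[1, 2], [3], [4, 5, 6]]

# Allowed by sum, but every step builds a brand-new list.
flat_sum = sum(lists, [])

# Alternatives mentioned in the comments, which append elements once.
flat_chain = list(itertools.chain.from_iterable(lists))
flat_comp = [z for y in lists for z in y]


def copies_made_by_sum(lengths):
    """Model the elements copied when summing lists of the given lengths."""
    copied = 0
    size = 0
    for n in lengths:
        size += n        # length of the new result list
        copied += size   # assumed: the whole new list is written out
    return copied
```

All three approaches produce the same flat list. Under this model, summing 1000 one-element lists copies 500500 elements in total, which is the quadratic growth the comments describe.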
