This answer and its comments provide some insight into the inner workings of CPython's str.join():
- If the argument is not already a list or a tuple, a new list is created with the same contents.
- The argument is iterated over once, to sum the lengths of the strings it holds.
- Memory is allocated for a new string.
- Finally, the argument is iterated over a second time, and the strings are copied into the memory for the new string.
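As I understand it, the steps above amount to roughly the following. This is only a pure-Python sketch of the described behavior, not CPython's actual C implementation; the name join_sketch is hypothetical, and a list of characters stands in for the preallocated string memory.

```python
def join_sketch(sep, iterable):
    """Pure-Python sketch of the steps listed above (hypothetical helper)."""
    # Step 1: copy anything that is not already a list or tuple into a new list.
    items = iterable if isinstance(iterable, (list, tuple)) else list(iterable)

    # Step 2: first pass, summing the lengths of the strings (plus separators).
    total = len(sep) * max(len(items) - 1, 0)
    for i, s in enumerate(items):
        if not isinstance(s, str):
            raise TypeError(f"sequence item {i}: expected str instance")
        total += len(s)

    # Step 3: allocate the memory for the result once, at its final size.
    buf = [""] * total  # stand-in for a preallocated character buffer

    # Step 4: second pass, copying each string into its place in the buffer.
    pos = 0
    for i, s in enumerate(items):
        if i:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(s)] = s
        pos += len(s)

    # Only converts the stand-in buffer back to a str; the real code has no such step.
    return "".join(buf)
```

For example, join_sketch("-", (c for c in "xyz")) takes the copying path in step 1, while join_sketch(", ", ["a", "b", "c"]) uses the list as it is.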
This seems questionable to me. For starters, why reject all sequence types but two? Wouldn't just iterating over any sequence twice instead of copying it be much faster? And why make a list, particularly if you can't know the length of the iterable you're making it from? You don't need random access, just repeated iteration, and using a list means you might have to reallocate and copy several times during its generation. Wouldn't it make more sense to use a linked list or a deque?
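To make the reallocation concern concrete, here is a rough way to observe it. It assumes CPython, where sys.getsizeof on a list reflects its currently allocated capacity; count_resizes is a hypothetical helper, and a change in size does not necessarily mean the elements were physically copied each time.

```python
import sys

def count_resizes(n):
    """Count how often a list's allocated size changes over n appends."""
    items = []
    last_size = sys.getsizeof(items)
    resizes = 0
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:
            # The list grew its backing storage to make room.
            resizes += 1
            last_size = size
    return resizes

print(count_resizes(10_000))
```

The count comes out well below the number of appends, since the list over-allocates as it grows, but it is not zero either.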
Can anyone provide some insights into these design decisions?