
Consider the following code:

class A(object):
    def __init__(self):
        self.a = '123'

    def __len__(self):
        print('len')
        return 2

    def __getitem__(self, pos):
        print('get pos', pos)
        return self.a[pos]

a = A()
print(''.join(a))

My expected output:

> len
> get pos 0
> get pos 1
> 12

The real output:

> len
> get pos 0
> get pos 1
> get pos 2
> get pos 3
> 123

Try it yourself. I cannot believe what happens here.

If I understand the behavior correctly, str.join() calls __len__ but ignores the returned value, and keeps calling __getitem__ until an IndexError is raised.

I must be overlooking something, because the implementation of join looks different:

https://github.com/python/cpython/blob/3.6/Objects/stringlib/join.h

My current workaround is:

def __getitem__(self, pos):
    if pos >= len(self):
        raise IndexError()
    return self.a[pos]

This is ridiculous.

I tested this with Python 3.6 and 3.7 (CPython).

Viatorus

1 Answer


How str.join works (from analysing the source code)

First, it checks that the object is iterable and, if needed, creates a sequence from it:

seq = PySequence_Fast(iterable, "can only join an iterable");

If the object is a list or tuple, PySequence_Fast just returns the object itself; there is no need to iterate.

If it's not, it iterates over the object to build a list. That is where the object is fully iterated.

From there, only that list is used. If iterable wasn't a list or tuple, it has already been consumed and plays no further part.
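The steps above can be approximated in pure Python. This is only a rough sketch of what I understand PySequence_Fast to do, not the actual C code; the name sequence_fast and the OnlyGetitem class are made up for illustration.

```python
def sequence_fast(iterable, message):
    """Rough Python approximation of PySequence_Fast (a sketch, not the real thing)."""
    # An exact list or tuple is handed back untouched; nothing is iterated.
    if type(iterable) in (list, tuple):
        return iterable
    try:
        iterator = iter(iterable)
    except TypeError:
        # Not iterable at all: report the caller-supplied message.
        raise TypeError(message) from None
    # Anything else is drained into a new list until the iterator is exhausted.
    return list(iterator)


class OnlyGetitem:
    """Hypothetical class with no __iter__; iter() falls back to __getitem__."""

    def __getitem__(self, pos):
        return '123'[pos]


items = ['a', 'b']
print(sequence_fast(items, "can only join an iterable") is items)  # True
print(sequence_fast(OnlyGetitem(), "can only join an iterable"))   # ['1', '2', '3']
```

Note that the second call never consults a length: the fallback iterator keeps asking for the next index until __getitem__ raises IndexError.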

(I couldn't track down the call to len; it would take a debugging session to find it inside the PySequence_Fast call, and that seems pointless. Your iterable has a __len__ method, but since it's not a list or tuple, the returned value isn't used.)
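To see the effect without reading C, here is a small experiment. It assumes the goal is for iteration to stop at len(self); the class names Legacy and Bounded are hypothetical. Legacy mirrors the class from the question but records the requested indices instead of printing them. Bounded adds an explicit __iter__, which is one possible alternative to raising IndexError by hand in __getitem__.

```python
class Legacy:
    """Only __len__ and __getitem__: iteration uses the old sequence protocol."""

    def __init__(self):
        self.data = '123'
        self.indices = []  # every index passed to __getitem__

    def __len__(self):
        return 2

    def __getitem__(self, pos):
        self.indices.append(pos)
        return self.data[pos]  # raises IndexError once pos reaches 3


class Bounded(Legacy):
    """Hypothetical variant: an explicit __iter__ that stops at len(self)."""

    def __iter__(self):
        return (self[i] for i in range(len(self)))


legacy = Legacy()
print(''.join(legacy))   # 123
print(legacy.indices)    # [0, 1, 2, 3]

bounded = Bounded()
print(''.join(bounded))  # 12
print(bounded.indices)   # [0, 1]
```

Without __iter__, the only thing that ends the iteration is the IndexError from index 3. With __iter__ defined, join iterates through it and the __getitem__ fallback is never used.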

Jean-François Fabre