Let's say I'm building an itertools.chain instance as follows:

from itertools import chain

list_1 = list(range(5, 15))
list_2 = list(range(20, 30))
chained = chain(list_1, list_2)

Now, since I already know the lengths of the lists contained in chained, I can easily compute the length of chained. How can I add a __len__ method to chained?

I tried this:

full_len = len(list_1) + len(list_2)
setattr(chained, '__len__', lambda: full_len)

but it fails with the error

AttributeError: 'itertools.chain' object has no attribute '__len__'

Edit: I need this in order to display the progress of a long process with tqdm, which relies on the __len__ method to show the progress bar.

DSantiagoBC
  • Nope. A generator doesn't know its own length until it is exhausted. You could, I suppose, derive a class from `chain` to add this. – Tim Roberts Apr 17 '23 at 22:48
  • *"I need this to be able to display the progress of a long process with tqdm"* - this is a classic [XY problem](https://xyproblem.info/). `tqdm` offers ways to set the number of iterations even if the object you iterate over has no `len()`. See: https://stackoverflow.com/questions/41985993/tqdm-show-progress-for-a-generator-i-know-the-length-of – Marco Bonelli Apr 17 '23 at 22:52
  • Does this answer your question? [tqdm show progress for a generator I know the length of](https://stackoverflow.com/questions/41985993/tqdm-show-progress-for-a-generator-i-know-the-length-of) – Marco Bonelli Apr 17 '23 at 22:52
  • @TimRoberts Yeah kind of, but since I already know the length it seems reasonable I should be able to tell the chain so – DSantiagoBC Apr 17 '23 at 22:55
  • @MarcoBonelli "Others try to help the user with Y, but are confused because Y seems like a strange problem to want to solve". I think this is the problem. I hate when I ask Y in my quest to get to X and people try to answer X for me. It's pretentious and annoying and seems to be perpetrated by too many people on stack overflow. Edit: Even worse are those who assume they know the X I am working toward, which is very often not the case. Just answer the question that was asked. If "It can't be done" is the answer, then let that be it! – rocksNwaves Apr 17 '23 at 22:58
  • @rocksNwaves Did you miss that the question mentions their X? Marco even quoted it. – Kelly Bundy Apr 17 '23 at 23:11
  • @KellyBundy That has no bearing on the point I am making. But thank you. – rocksNwaves Apr 17 '23 at 23:33
  • @rocksNwaves But what's "pretentious and annoying" about helping with an actual solution when the imagined approach can't be done? (That last bit is important. I do frequently downvote answers that don't answer the question asked, when it can be answered properly. Like when people ask why their code doesn't work and answerers just ignore it and dump a completely different solution.) – Kelly Bundy Apr 17 '23 at 23:43
  • To address the question directly, then; the reason you are getting the error is because you cannot set a new attribute on `itertools.chain`. See [Can't set attributes on instance of "object" class](https://stackoverflow.com/questions/1529002/cant-set-attributes-on-instance-of-object-class) and [Attribute assignment to built-in object](https://stackoverflow.com/questions/5741699/attribute-assignment-to-built-in-object) for some discussion on why. – Amadan Apr 18 '23 at 00:22
  • Just curious: why are you using `chain` instead of for example `list_1 + list_2`? – Kelly Bundy Apr 18 '23 at 00:28
  • @KellyBundy AFAIK it helps performance, since it doesn't need to iterate over the lists to create the new list and iterate again to do the other stuff I need – DSantiagoBC Apr 18 '23 at 00:52
  • Hmm, but it does create an extra iteration layer (the chain iterator), and every element passes through it. Via the general iteration protocol. Whereas list concatenation might very well use a faster more direct copying, since lists know about the internals of lists. – Kelly Bundy Apr 18 '23 at 01:04
  • @KellyBundy I can see that. I guess some proper benchmarking would help here. One thing I didn't mention in the question though is I have a lot of lists to concatenate. If I'm not wrong, the overhead chain adds doesn't really change with more or less iterables, but list addition probably does – DSantiagoBC Apr 18 '23 at 01:23
  • Depends on how you add them. There are good and bad ways for both. And depends on the total length. With your example and also with 1000 times longer lists, `list_1 + list_2` is faster than `chain` in my tests. But at 100000 times longer, it's slower, presumably because of cache misses. `chain` is cache-friendly, since its two iterations are parallel (the same element immediately goes through both the list iterator and the chain iterator). But both are ***much*** faster than when I involve tqdm. It's like 10 vs 20 vs 240 ns per element. – Kelly Bundy Apr 18 '23 at 08:50
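
As a small illustration of Amadan's comment above: instances of itertools.chain reject new attributes, and even on objects that do accept them, len() looks up __len__ on the type rather than on the instance. The following is only a minimal sketch; the `Plain` class and the `set_ok` / `len_ok` flags are hypothetical names introduced for demonstration.

```python
from itertools import chain

chained = chain([1, 2], [3])

# Setting a new attribute on a chain instance raises AttributeError,
# matching the error shown in the question.
try:
    setattr(chained, "__len__", lambda: 3)
    set_ok = True
except AttributeError:
    set_ok = False


class Plain:
    """Hypothetical ordinary class whose instances accept new attributes."""


p = Plain()
p.__len__ = lambda: 3  # allowed, but len() does not consult instance attributes

try:
    len(p)
    len_ok = True
except TypeError:
    len_ok = False

print(set_ok, len_ok)
```

So even if the setattr call had succeeded, len(chained) would still have failed; the method has to live on the class.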

2 Answers

You could extend the class by overriding __new__ (see here for why). Taking your example, we could write:

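import itertools

class Chain(itertools.chain):
    def __new__(cls, *args):
        obj = super().__new__(cls, *args)
        # Keep a reference to the inputs so their lengths can be summed later.
        obj.args = args
        return obj

    def __len__(self) -> int:
        # Works only if every input supports len().
        return sum(map(len, self.args))

>>> chained = Chain([1], [2, 3])
>>> len(chained)
3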

Note, however, that reporting a length for this iterator is somewhat awkward: the length is computed from the original arguments, so it stays the same even after the iterator has been partially or fully consumed (you can only loop over an iterator once; it does not store its items).

What you probably want is a simple helper that allows easy chaining but returns a list, which supports repeated iteration and len.

import itertools

def chain_list(*args):
    return list(itertools.chain(*args))

That might become pretty expensive depending on the iterables provided (say, range(1, 1000000000)). In that case you should probably define your own class that implements methods such as __iter__ and __len__, potentially using itertools.chain under the hood but not subclassing it directly.
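
One way such a class might look is sketched below. The name `SizedChain` is hypothetical, and the sketch assumes every input supports len() (lists, tuples, ranges and so on); it is an illustration of the idea, not a definitive implementation.

```python
from itertools import chain


class SizedChain:
    """Hypothetical wrapper that chains sized iterables and reports their total length."""

    def __init__(self, *iterables):
        # Assumption: every argument supports len().
        self._iterables = iterables

    def __iter__(self):
        # Build a fresh chain on every call, so the object can be iterated repeatedly.
        return chain(*self._iterables)

    def __len__(self):
        return sum(len(it) for it in self._iterables)


list_1 = list(range(5, 15))
list_2 = list(range(20, 30))
chained = SizedChain(list_1, list_2)

print(len(chained))
print(list(chained) == list_1 + list_2)
```

Because the wrapper defines both __iter__ and __len__, it can be handed to code that calls len() on its input, and the items are never copied into a new list.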

flakes

Create a new class, define the method (here, __len__) on it, and use that class instead of the original.

  • How would you do that? – Kelly Bundy Apr 17 '23 at 23:13