How to get a zip of all characters in a string? zip misses out on final characters and itertools.zip_longest adds None

Question

I am passing the result of itertools.zip_longest to itertools.product, however I get errors when it gets to the end and finds None.

The error I get is: Error: (, TypeError('sequence item 0: expected str instance, NoneType found',), )

If I use zip instead of itertools.zip_longest then I don't get all the items.

Here is the code I am using to generate the zip:

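import itertools

def grouper(iterable, n, fillvalue=None):
    # n references to the same iterator, so each tuple takes n consecutive items
    args = [iter(iterable)] * n
    print(args)
    #return zip(*args)
    return itertools.zip_longest(*args, fillvalue=fillvalue)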

sCharacters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~`!@#$%^&*()_-+={[}]|\"""':;?/>.<,"

for x in grouper(sCharacters, 4):
    print(x)

Here is the output. The first one is itertools.zip_longest and the second is just zip. You can see the first with the None items and the second is missing the final item, the comma: ','

How can I get a zip of all characters in a string without the None at the end? Or how can I avoid this error?

Thanks for your time.

Some (but not all) of the answers to [What is the most “pythonic” way to iterate over a list in chunks?](https://stackoverflow.com/q/434287/364696) are answers to this question. — ShadowRanger, Aug 18 '21 at 15:31

DomTomCat · Answer 1 · 2016-06-09T13:57:27.730

1

The length of sCharacters is 93 (note that 92 % 4 == 0). Since zip stops as soon as its shortest input is exhausted, it yields only the 23 complete groups and misses the last element.

Beware: the Nones added by itertools.zip_longest are artificial values, which may not be the desired behaviour for everyone. That's why zip simply ignores the unnecessary extra values.
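
As a small illustration of the difference (a sketch using a made-up five-character string, not the data from the question), compare the two functions on an input whose length is not a multiple of n:

```python
from itertools import zip_longest

s = "abcde"  # hypothetical sample: 5 % 2 == 1, so one character is left over
n = 2

# n references to one iterator, the same trick the question's grouper uses
zipped = list(zip(*[iter(s)] * n))
padded = list(zip_longest(*[iter(s)] * n))

print(zipped)  # [('a', 'b'), ('c', 'd')]              'e' is dropped
print(padded)  # [('a', 'b'), ('c', 'd'), ('e', None)] padded with None
```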

EDIT: to be able to use zip you could append some whitespace to your string:

n=4
sCharacters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~`!@#$%^&*()_-+={[}]|\"""':;?/>.<,"
if len(sCharacters) % n > 0:
    sCharacters = sCharacters + (" "*(n-len(sCharacters) % n))

EDIT2: to obtain the missing tail when using zip use code like this:

tail = '' if len(sCharacters)%n == 0 else sCharacters[-(len(sCharacters)%n):]
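
Putting the two pieces together might look like the sketch below. The helper name `groups_with_tail` is made up for illustration, and it assumes the input is a string (or another sliceable sequence), because the tail is taken by slicing:

```python
def groups_with_tail(s, n):
    """Yield the full n-sized tuples from zip, then the leftover tail, if any."""
    # zip over n references to one iterator yields only complete groups
    for group in zip(*[iter(s)] * n):
        yield group
    rest = len(s) % n
    if rest:
        yield tuple(s[-rest:])  # shorter final group, no padding

chunks = list(groups_with_tail("abcdefg", 3))
print(chunks)  # [('a', 'b', 'c'), ('d', 'e', 'f'), ('g',)]
```

No padding is added, so the final group is simply shorter than the others.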

edited Jun 09 '16 at 13:57

answered Jun 09 '16 at 12:18

DomTomCat


But the problem is with zip it is leaving out one of the values I do want... the final character which is a comma. You can see it is present in the zip_longest but not in the zip result – user2109254 Jun 09 '16 at 13:37
Yes, but that's the defined behaviour of `zip`. You can fill up the string beforehand, see my updated answer – DomTomCat Jun 09 '16 at 13:45
thanks for the response. The problem with that is I am going over a large list with the combinations... so adding extra combinations with no value will result in wasted time... what are the other options for getting a chunk size and just making the last one whatever is left...? – user2109254 Jun 09 '16 at 13:48
I've again added some code to retrieve what's going to be left over when using `zip` - I'm not exactly sure if this answers your question – DomTomCat Jun 09 '16 at 13:58

great work mate!! I could make it work using that to clean up the leftovers. I switched grouper to use zip and made one extra call using the length of the tail. Thanks heaps for taking the time to help!! – user2109254 Jun 09 '16 at 14:17

ShadowRanger · Accepted Answer · 2021-08-18T15:37:46.290

I've had to solve this in a performance critical case before, so here is the fastest code I've found for doing this (works no matter the values in iterable):

from itertools import zip_longest

def grouper(n, iterable):
    fillvalue = object()  # Guaranteed unique sentinel, cannot exist in iterable
    for tup in zip_longest(*(iter(iterable),) * n, fillvalue=fillvalue):
        if tup[-1] is fillvalue:
            yield tuple(v for v in tup if v is not fillvalue)
        else:
            yield tup
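
To see why the unique `object()` sentinel matters, here is a small sketch. The function is repeated so the snippet runs on its own, and the input list is a made-up example that legitimately contains `None`:

```python
from itertools import zip_longest

def grouper(n, iterable):
    # Same function as above, repeated so this snippet is self-contained
    fillvalue = object()
    for tup in zip_longest(*(iter(iterable),) * n, fillvalue=fillvalue):
        if tup[-1] is fillvalue:
            yield tuple(v for v in tup if v is not fillvalue)
        else:
            yield tup

data = [1, None, 2, None, 3]  # hypothetical input with real None values
result = list(grouper(2, data))
print(result)  # [(1, None), (2, None), (3,)]
```

The `None` values that belong to the data survive, and only the padding in the last group is stripped.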

The above is, as far as I can tell, unbeatable when the input is long enough and the chunk sizes are small enough. For cases where the chunk size is fairly large, it can lose out to this even uglier case, but usually not by much:

try:
    from future_builtins import map  # Only on Py2, and required there
except ImportError:
    pass  # Py3: the built-in map is already lazy
from itertools import islice, repeat, starmap, takewhile
from operator import truth  # Faster than bool when guaranteed non-empty call

def grouper(n, iterable):
    '''Returns a generator yielding n sized groups from iterable
    
    For iterables not evenly divisible by n, the final group will be undersized.
    '''
    # Can add tests to special case other types if you like, or just
    # use tuple unconditionally to match `zip`
    rettype = ''.join if type(iterable) is str else tuple

    # Keep islicing n items and converting to groups until we hit an empty slice
    return takewhile(truth, map(rettype, starmap(islice, repeat((iter(iterable), n)))))

Either approach seamlessly leaves the final group incomplete if there aren't sufficient items to complete it. The second approach runs extremely fast because literally all of the work is pushed to the C layer in CPython after "set up", so however long the iterable is, the Python level work is the same; only the C level work increases. That said, it does a lot of C work, which is why the zip_longest solution (which does much less C work, and only trivial Python level work for all but the final chunk) usually beats it.

The slower, but more readable equivalent code to option #2 (but skipping the dynamic return type in favor of just tuple) is:

from itertools import islice

def grouper(n, iterable):
    iterable = iter(iterable)
    while True:
        x = tuple(islice(iterable, n))
        if not x:
            return
        yield x

Or more succinctly with Python 3.8+'s walrus operator:

from itertools import islice

def grouper(n, iterable):
    iterable = iter(iterable)
    while x := tuple(islice(iterable, n)):
        yield x
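
As a usage sketch, applying the readable `islice` version to the 93-character string from the question with a chunk size of 4 should give 23 full groups plus one final group holding only the comma. The `while True` form is used here so the snippet also runs on Python versions older than 3.8:

```python
from itertools import islice

def grouper(n, iterable):
    iterable = iter(iterable)
    while True:
        x = tuple(islice(iterable, n))
        if not x:
            return
        yield x

sCharacters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~`!@#$%^&*()_-+={[}]|\"""':;?/>.<,"

groups = list(grouper(4, sCharacters))
print(len(sCharacters))  # 93
print(len(groups))       # 24
print(groups[-1])        # (',',)
```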

nice work thanks for that. It doesn't work in this application for some reason... no threads get started when I use it. I can see with a print that the list is generated... but it is in a different format to zip and is not working... — user2109254, Jun 10 '16 at 07:34
@user2109254: "Different format"? You're going to have to be more specific. Aside from the `rettype` thing (which can be changed to just using `tuple`, always), and the differing length for the final group, it's identical to Python 3's `zip` (producing a generator). You'd wrap a call in `list()` if you need the values in a `list`, and can't just iterate them once. — ShadowRanger, Jun 10 '16 at 11:19
See in my screenshot in the question how x prints out... with your solution x prints out like a string of letters, e.g. abcd... whereas the zip function returns it like ('a', 'b', 'c', 'd')... and I pass that in like this: pool.apply_async(find_match, (x,) + (iKeyLength,), callback=callback) and this works. When I use your solution it doesn't work as x is in a different format. I do like your code though, very compact. — user2109254, Jun 10 '16 at 12:17
@user2109254: That's where I said you can drop the `rettype` bit. Just remove `rettype` from the code, and replace the use in the `return` statement with `tuple` and it will return `tuple`s. I only used the dynamic `rettype` to reduce memory overhead a titch in the case where a `str` was being grouped. As I said in the comment "just use `tuple` unconditionally to match `zip`". You need to read the code to understand it, not just copy it blindly, or you won't learn anything. — ShadowRanger, Jun 10 '16 at 12:52
@ShaddowRanger Perfect!! Thanks for taking the time to explain this, much appreciated!! — user2109254, Jun 10 '16 at 23:33
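
For reference, the change described in the comments above (dropping `rettype` and using `tuple` unconditionally) might look like the following sketch. It assumes Python 3, where the built-in `map` is lazy, and the sample string is made up:

```python
from itertools import islice, repeat, starmap, takewhile
from operator import truth

def grouper(n, iterable):
    # Option #2 with tuple used unconditionally, so groups match zip's output
    return takewhile(truth, map(tuple, starmap(islice, repeat((iter(iterable), n)))))

groups = list(grouper(4, "abcdefghij"))  # hypothetical sample input
print(groups)
# [('a', 'b', 'c', 'd'), ('e', 'f', 'g', 'h'), ('i', 'j')]
```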
