
I would like to write a function my_func(n,l) that, for some positive integer n, efficiently enumerates all ordered non-negative integer compositions* of n of length l (where l is greater than n). For example, I want my_func(2,3) to return [[0,0,2],[0,2,0],[2,0,0],[1,1,0],[1,0,1],[0,1,1]].

My initial idea was to use existing code for positive integer partitions (e.g. accel_asc() from this post), pad each positive integer partition with zeros up to length l, and return all permutations.

import itertools
import numpy

def my_func(n, l):
    for ip in accel_asc(n):  # positive integer partitions of n
        # pad the partition with zeros to length l
        nic = numpy.zeros(l, dtype=int)
        nic[:len(ip)] = ip
        # yield every permutation of the padded, length-l partition
        for p in itertools.permutations(nic):
            yield p

The output of this function is wrong, because every non-negative integer composition in which some value appears more than once is yielded multiple times by my_func. For example, list(my_func(2,3)) returns [(1, 1, 0), (1, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 1, 1), (2, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2), (0, 2, 0), (0, 0, 2)].

I could correct this by generating a list of all non-negative integer compositions, removing repeated entries, and then returning the remaining list (instead of a generator). But this seems incredibly inefficient and will likely run into memory issues. What is a better way to fix this?
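For reference, the deduplication approach I am describing would look roughly like this (my_func_dedup is just a throwaway name for illustration):

def my_func_dedup(n, l):
    # materialize every permutation, deduplicate, and only then return;
    # memory grows with the total number of (repeated) permutations
    return list(set(my_func(n, l)))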

EDIT

I did a quick comparison of the solutions offered in answers to this post and to another post that cglacet has pointed out in the comments.

[Benchmark plot: runtimes of the proposed solutions for l=2*n (left) and l=n+1 (right)]

On the left, we have l=2*n and on the right we have l=n+1. In these two cases, user2357112's second solution is faster than the others when n<=5. For n>5, the solutions proposed by user2357112, Nathan Verzemnieks, and AndyP are more or less tied. But the conclusions could differ for other relationships between l and n.

..........

*I originally asked for non-negative integer partitions. Joseph Wood correctly pointed out that I am in fact looking for integer compositions, because the order of numbers in a sequence matters to me.

Alice Schwarze
  • One of the easiest ways to remove duplicates from a list in python is to cast to `set` and then cast back to `list`. In this case, you could even just iterate through the `set` without casting it back. Because it's standard library typecasting, and the standard library is optimized to hell and back, that should move pretty fast. Would that work? – Green Cloak Guy Apr 16 '19 at 21:37
  • Possible duplicate of [Elegant Python code for Integer Partitioning](https://stackoverflow.com/questions/10035752/elegant-python-code-for-integer-partitioning) – BadZen Apr 16 '19 at 21:38
  • @Green Cloak Guy It would still require that I generate all non-negative integer partitions before yielding the first one. The number of non-negative integer partitions grows extremely fast with `n` and `l`, so I would really like to avoid that. – Alice Schwarze Apr 16 '19 at 21:39
  • @BadZen That thread is on *positive* integer partitions. I want *non-negative* integer partitions. – Alice Schwarze Apr 16 '19 at 21:40
  • @AliceSchwarze: That just means you partition `n-l` and add 1 to the partition elements. The real problem is that that question is about partitions in the standard sense of integer partitions, which are multisets, not fixed-length tuples. – user2357112 Apr 16 '19 at 21:52
  • @user2357112 `n-l` is a negative number, so I can't quite follow your idea. Could you explain your approach in a bit more detail? – Alice Schwarze Apr 16 '19 at 21:58
  • @AliceSchwarze: Other way around for applying a positive-element solution to a non-negative-element solution - partition `n+l` into positive integers and subtract 1 from the partition elements. Of course, that doesn't solve the deeper multiset/tuple problem. – user2357112 Apr 16 '19 at 22:00
  • Possible duplicate of [Generate all possible outcomes of k balls in n bins (sum of multinomial / categorical outcomes)](https://stackoverflow.com/questions/37711817/generate-all-possible-outcomes-of-k-balls-in-n-bins-sum-of-multinomial-catego) – cglacet Apr 16 '19 at 22:19
  • For what it's worth, I just tested all the solutions at the question @cglacet mentions and none of them are as fast as the two faster ones here, although [this one](https://stackoverflow.com/a/38691110/9200529) is close. – Nathan Vērzemnieks Apr 17 '19 at 01:17
  • IMO it's worth something, I don't know how Stackoverflow is supposed to work in this case, but maybe the best solution would be to merge the two questions (as these two problems are the exact same, unless I'm missing something). – cglacet Apr 17 '19 at 01:25
  • I haven't seen it mentioned here (apologies if I missed it), but these are called [integer compositions](https://en.wikipedia.org/wiki/Composition_(combinatorics)). Integer partitions are analogous to combinations where order doesn't matter. – Joseph Wood Apr 17 '19 at 02:12
  • @cglacet - I agree, they seem like exactly the same problem. I was thinking of leaving an answer on that post with details of the two faster solutions here and timings for all of them. Then they could be merged and cleaned up or this one could be marked as dupe. *shrug* – Nathan Vērzemnieks Apr 17 '19 at 03:55

2 Answers


Use the stars and bars concept: pick positions to place l-1 bars between n stars, and count how many stars end up in each section:

import itertools

def diff(seq):
    return [seq[i+1] - seq[i] for i in range(len(seq)-1)]

def generator(n, l):
    for combination in itertools.combinations_with_replacement(range(n+1), l-1):
        yield [combination[0]] + diff(combination) + [n-combination[-1]]

I've used combinations_with_replacement instead of combinations here, so the index handling is a bit different from what you'd need with combinations. The code with combinations would more closely match a standard treatment of stars and bars.
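For reference, a minimal sketch of that combinations-based variant (generator_bars is a hypothetical name, and it reuses the itertools import above): choose l-1 distinct bar positions among n+l-1 slots, then read off the gap sizes between consecutive bars:

def generator_bars(n, l):
    # l-1 bars among n+l-1 slots; the gaps between consecutive bars
    # (and the two ends) are the l part sizes
    for bars in itertools.combinations(range(n + l - 1), l - 1):
        extended = (-1,) + bars + (n + l - 1,)
        yield [extended[i + 1] - extended[i] - 1 for i in range(l)]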


Alternatively, here's a different way to use combinations_with_replacement: start with a list of l zeros, pick n positions with replacement from the l possible positions, and add 1 at each chosen position (counting multiplicity) to produce an output:

def generator2(n, l):
    for combination in itertools.combinations_with_replacement(range(l), n):
        output = [0]*l
        for i in combination:
            output[i] += 1
        yield output
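
As a quick sanity check, generator2(2, 3) should reproduce the six compositions from the question:

>>> list(generator2(2, 3))
[[2, 0, 0], [1, 1, 0], [1, 0, 1], [0, 2, 0], [0, 1, 1], [0, 0, 2]]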
user2357112

Starting from a simple recursive solution, which has the same problem as yours:

def nn_partitions(n, l):
    if n == 0:
        yield [0] * l
    else:
        for part in nn_partitions(n - 1, l):
            for i in range(l):
                new = list(part)
                new[i] += 1
                yield new

That is, for each partition for the next lower number, and for each place in that partition, add 1 to the element in that place. It yields the same duplicates as yours does. I remembered a trick for a similar problem, though: when you alter a partition p for n into one for n+1, fix all the elements of p to the left of the element you increase. That is, keep track of where p was modified, and never modify any of p's "descendants" to the left of that. Here's the code for that:

def _nn_partitions(n, l):
    if n == 0:
        yield [0] * l, 0
    else:
        for part, start in _nn_partitions(n - 1, l):
            for i in range(start, l):
                new = list(part)
                new[i] += 1
                yield new, i

def nn_partitions(n, l):
    for part, _ in _nn_partitions(n, l):
        yield part

It's very similar - there's just the extra parameter passed along at each step, so I added a wrapper to remove it for the caller.

I haven't tested it extensively, but this appears to be reasonably fast - about 35 microseconds for nn_partitions(3, 5) and about 18s for nn_partitions(10, 20) (which yields just over 20 million partitions). (The very elegant solution from user2357112 takes about twice as long for the smaller case and about four times as long for the larger one. Edit: this refers to the first solution from that answer; the second one is faster than mine under some circumstances and slower under others.)
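One rough way to reproduce timings like these, assuming the definitions above are in scope (exact numbers will of course depend on the machine):

import timeit
from collections import deque

def exhaust(gen):
    # consume a generator without storing its output
    deque(gen, maxlen=0)

# small case, averaged over many runs
print(timeit.timeit(lambda: exhaust(nn_partitions(3, 5)), number=10000) / 10000)

# large case (just over 20 million compositions), run once
print(timeit.timeit(lambda: exhaust(nn_partitions(10, 20)), number=1))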

Nathan Vērzemnieks
  • This requires recursion depth proportional to the value of `n`, though, so it doesn't work for something like `nn_partitions(1000, 3)`. – user2357112 Apr 16 '19 at 22:50
  • The poster specified that `l > n`, so we'll run out of time _long_ before we run out of recursion depth :-D – Nathan Vērzemnieks Apr 16 '19 at 22:52
  • Yeah, with that constraint, recursion depth proportional to `n` isn't a problem. Recursion depth proportional to `l` would be, but not `n`. – user2357112 Apr 16 '19 at 22:57
  • The new non-copying approach may save time, but it also means callers can't store or modify the yielded list object without copying it manually. This is inevitably going to be a major source of bugs, even if explicitly documented. – user2357112 Apr 16 '19 at 23:42
  • Ah, of course! I had another bug in my testing that masked this. And correcting it actually makes it slower. I'll revert. Thanks! – Nathan Vērzemnieks Apr 16 '19 at 23:51
  • Nathan, you reverted your edit in your first block of code but not in the second. Shouldn't it be both? – Alice Schwarze Apr 17 '19 at 00:34
  • Whoops! Yes indeed, apologies. I'm a bit scattered this afternoon. Will fix. – Nathan Vērzemnieks Apr 17 '19 at 00:39