
I want to generate a lexicographic series of numbers such that for each number the sum of digits is a given constant. It is somewhat similar to the 'subset sum problem'. For example, if I wish to generate 4-digit numbers with sum = 3, then I have a series like:

[3 0 0 0]

[2 1 0 0]

[2 0 1 0]

[2 0 0 1]

[1 2 0 0] ... and so on.

I was able to do it successfully in Python with the following code:

import numpy as np

M = 4  # number of digits
N = 3  # target sum

a = np.zeros((1, M), int)
b = np.zeros((1, M), int)

a[0][0] = N
jj = 0

while a[jj][M-1] != N:
    ii = M - 2
    while a[jj][ii] == 0:
        ii = ii - 1
    kk = ii
    if kk > 0:
        b[0][0:kk-1] = a[jj][0:kk-1]
    b[0][kk] = a[jj][kk] - 1
    b[0][kk+1] = N - sum(b[0][0:kk+1])
    b[0][kk+2:] = 0
    a = np.concatenate((a, b), axis=0)
    jj += 1

for ii in range(len(a)):
    print(a[ii])

print(len(a))

I don't think it is a very efficient way (as I am a Python newbie). It works fine for small values of M and N (< 10) but is really slow beyond that. I wish to use it for M ~ 100 and N ~ 6. How can I make my code more efficient, or is there a better way to code it?

Aman_X

4 Answers


A very efficient algorithm, adapted from Jörg Arndt's book "Matters Computational"
(Chapter 7.2, "Co-lexicographic order for compositions into exactly k parts"):

n = 4  # number of parts
k = 3  # target sum

x = [0] * n
x[0] = k

while True:
    print(x)
    v = x[-1]
    if v == k:
        break
    x[-1] = 0
    j = -2
    while x[j] == 0:
        j -= 1
    x[j] -= 1
    x[j+1] = 1 + v

[3, 0, 0, 0]
[2, 1, 0, 0]
[2, 0, 1, 0]
[2, 0, 0, 1]
[1, 2, 0, 0]
[1, 1, 1, 0]
[1, 1, 0, 1]
[1, 0, 2, 0]
[1, 0, 1, 1]
[1, 0, 0, 2]
[0, 3, 0, 0]
[0, 2, 1, 0]
[0, 2, 0, 1]
[0, 1, 2, 0]
[0, 1, 1, 1]
[0, 1, 0, 2]
[0, 0, 3, 0]
[0, 0, 2, 1]
[0, 0, 1, 2]
[0, 0, 0, 3]

Number of compositions and time in seconds for plain Python (perhaps numpy arrays are faster) for n = 100 and k = 2, 3, 4, 5 (2.8 GHz Cel-1840):

2  5050 0.040000200271606445
3  171700 0.9900014400482178
4  4421275 20.02204465866089
5  91962520 372.03577995300293
I expect about 2 hours for the 100/6 generation.
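As a side note, the counts in the table can be predicted without generating anything: by stars and bars, the number of compositions of k into n non-negative parts is C(n+k-1, k). A short check reproduces the counts above and gives the 100/6 total:

```python
from math import comb

n = 100
# stars and bars: compositions of k into n non-negative parts = C(n+k-1, k)
for k in (2, 3, 4, 5, 6):
    print(k, comb(n + k - 1, k))
```

This prints 5050, 171700, 4421275, 91962520 as measured above, and 1609344100 for k = 6.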

The same with numpy arrays (x = np.zeros((n,), dtype=int)) gives worse results - but perhaps because I don't know how to use them properly:

2  5050 0.07999992370605469
3  171700 2.390003204345703
4  4421275 54.74532389640808

Native code (this is Delphi; C/C++ compilers might optimize better) generates 100/6 in 21 seconds:

3  171700  0.012
4  4421275  0.125
5  91962520  1.544
6  1609344100 20.748

Can't go to sleep until all the measurements are done :)

MSVS VC++: 18 seconds! (O2 optimization)

5  91962520 1.466
6  1609344100 18.283

So 100 million variants per second. A lot of time is wasted checking empty cells (because the fill ratio is small). The speed described by Arndt is reached at higher k/n ratios and is about 300-500 million variants per second:

n=25, k=15: 25140840660 compositions in 60.981 s - about 400 million per second
MBo
  • Ran your code for n=100 and k=6; it takes time in this algorithm as well, maybe days – Shoyeb Sheikh Mar 21 '19 at 13:17
  • @MBo I find this algorithm to be really fast and efficient. Perfect! – Aman_X Mar 21 '19 at 14:16
  • @Shoyeb Sheikh Do you know how many compositions with such parameters exist? Any algorithm generating trillions of variants requires a lot of time. The described approach is very optimized - implemented in C, it generates 300 million variants per second (not counting output or writing - the most time-consuming part) – MBo Mar 21 '19 at 15:46
  • @MBo Well, that was a part of the question, lol - I didn't add anything to it. 300 million per sec is very efficient though. Great! – Shoyeb Sheikh Mar 21 '19 at 15:49
  • @Shoyeb Sheikh Yes, I noticed that the author expects to get a lot of variants - it is possible to calculate them, but it is rather hard to store and use such a huge amount of data – MBo Mar 21 '19 at 16:01
  • @MBo That's a really interesting comparison. I tried it out too for n=100 and k=5, and it took approx. 97 sec. BTW, do you know if the computation can be parallelized? – Aman_X Mar 22 '19 at 11:18
  • No, this algorithm does not lend itself to parallelization. How are you going to use the result data? – MBo Mar 22 '19 at 12:14
  • I use it for generating a basis to represent the wavefunction and for further calculations. – Aman_X Mar 22 '19 at 15:11
  • But... 1.6 trillion variants? – MBo Mar 22 '19 at 15:23
  • Well... that would be an extreme case; usually n is limited to 64. There are ways to reduce the number of variants further by using symmetries of the system, etc. – Aman_X Mar 22 '19 at 17:50

I have a better solution using itertools, as follows:

from itertools import product

n = 4  # number of elements
s = 3  # sum of elements
r = list(range(n))
result = [p for p in product(r, repeat=n) if sum(p) == s]
print(len(result))
print(result)

I am saying this is better because it took 0.1 secs on my system, while your code with numpy took 0.2 secs.


But for n=100 and s=6, this code has to go through all the combinations; I think it will take days to compute the results.

Shoyeb Sheikh
  • Actually, I want to improve the way the algorithm is coded (or code a more efficient algorithm) to compute _just the required_ combinations. As far as I understand, your code checks _all possible combinations_ against the given sum. Can you time both codes for M=10, N=3 as well? – Aman_X Mar 21 '19 at 14:12
  • It's been an hour or so for M=10 and N=3 and it's still running; I think MBo has a better answer there. – Shoyeb Sheikh Mar 21 '19 at 15:50
  • Nope, you check it and let us know – Shoyeb Sheikh Mar 21 '19 at 16:01
  • Ok, I did that. It took less than a second. – Aman_X Mar 22 '19 at 11:00

My recommendations:

  1. Rewrite it as a generator using yield, rather than a loop that concatenates to a global array on each iteration.
  2. Keep a running sum instead of recalculating the sum of a slice of the array representation of the number.
  3. Operate on a single instance of your working number representation instead of slicing a copy of it into a temporary variable on each iteration.

Note that no particular order is implied.
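For illustration, here is one way recommendations 1 and 3 might look when applied to the successor rule from the accepted answer (recommendation 2 becomes moot there, since that rule never re-sums the array) - a sketch, with `compositions` as a hypothetical name:

```python
def compositions(m, n):
    """Yield every m-digit tuple with digit sum n, in the question's order."""
    x = [0] * m          # single working array, mutated in place
    x[0] = n
    while True:
        yield tuple(x)   # hand out an immutable snapshot, not the live list
        v = x[-1]
        if v == n:       # all weight has reached the last cell: finished
            return
        x[-1] = 0
        j = m - 2
        while x[j] == 0: # find the rightmost nonzero cell before the last
            j -= 1
        x[j] -= 1
        x[j + 1] = v + 1

for digits in compositions(4, 3):
    print(digits)
```

Because it is a generator, the caller can consume compositions one at a time without ever materializing the whole (potentially huge) list.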

pkfm

I found a solution using itertools as well (source: https://bugs.python.org/msg144273). The code follows:


def combinations_with_replacement(iterable, r):
    # combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
    pool = tuple(iterable)
    n = len(pool)
    if not n and r:
        return
    indices = [0] * r
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != n - 1:
                break
        else:
            return
        indices[i:] = [indices[i] + 1] * (r - i)
        yield tuple(pool[i] for i in indices)

int_part = lambda n, k: (tuple(map(c.count, range(k)))
                         for c in combinations_with_replacement(range(k), n))

for item in int_part(3, 4):
    print(item)
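For reference, the helper above is essentially the documented pure-Python recipe for the standard library's own itertools.combinations_with_replacement, so the stdlib version can be used directly. A shorter sketch of the same idea:

```python
from itertools import combinations_with_replacement

def int_part(n, k):
    # each multiset of n draws from range(k) maps to one composition:
    # position i's digit is simply how many times i was drawn
    for c in combinations_with_replacement(range(k), n):
        yield tuple(map(c.count, range(k)))

for item in int_part(3, 4):
    print(item)
```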
Aman_X