This code generates all 8-character strings, where each character may repeat up to 8 times:

from itertools import product

# Every 8-character string over the 16 hex digits (16**8 results).
for i in product('0123456789abcdef', repeat=8):
    b = ''.join(i)
    print(b)

How can I do something similar, but allow a maximum of 4 or 5 repetitions of a character per 8-symbol string? For example, 222abbccc or a3333abd.

Each symbol from the list should be allowed to repeat 1 to 4 times anywhere in the 8-symbol string, while still producing all permutations and, ideally, without losing performance.

Mark Tolonen
tseries

1 Answer

A recursive function can implement additional rules on top of what product produces. Note that 16 characters with a repeat of 8 yields ~4.3 billion results (16^8 = 4,294,967,296) when there is no repetition limit. The following function works, but the example is kept small to limit run time:

from pprint import pprint

def generate(charset, length, maxrep):
    if length == 1:
        yield from charset
    else:
        for prefix in generate(charset, length - 1, maxrep):
            for char in charset:
                # Skip characters if the prefix already ends with the max rep
                if not prefix.endswith(char * maxrep):
                    yield prefix + char

pprint(list(generate('012',4,2)),compact=True)

Output:

['0010', '0011', '0012', '0020', '0021', '0022', '0100', '0101', '0102', '0110',
 '0112', '0120', '0121', '0122', '0200', '0201', '0202', '0210', '0211', '0212',
 '0220', '0221', '1001', '1002', '1010', '1011', '1012', '1020', '1021', '1022',
 '1100', '1101', '1102', '1120', '1121', '1122', '1200', '1201', '1202', '1210',
 '1211', '1212', '1220', '1221', '2001', '2002', '2010', '2011', '2012', '2020',
 '2021', '2022', '2100', '2101', '2102', '2110', '2112', '2120', '2121', '2122',
 '2200', '2201', '2202', '2210', '2211', '2212']
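
Since `generate` is a generator, its results can also be consumed one at a time instead of being collected into a list. A minimal sketch, assuming the `generate` function above is repeated unchanged so the snippet is self-contained; `first_n` is a hypothetical helper name introduced only for this illustration:

```python
from itertools import islice


def generate(charset, length, maxrep):
    """Yield strings of `length` in which no character occurs more than
    `maxrep` times in a row (same logic as the function above)."""
    if length == 1:
        yield from charset
    else:
        for prefix in generate(charset, length - 1, maxrep):
            for char in charset:
                # Skip characters if the prefix already ends with the max rep
                if not prefix.endswith(char * maxrep):
                    yield prefix + char


def first_n(n, charset='0123456789abcdef', length=8, maxrep=4):
    """Hypothetical helper: take only the first `n` results lazily."""
    return list(islice(generate(charset, length, maxrep), n))


# Items are produced on demand; nothing beyond the requested slice is built.
for item in islice(generate('0123456789abcdef', 8, 4), 5):
    print(item)
```

In real use the `islice` wrapper would be dropped and each `item` passed straight to the consuming function inside the loop.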
Mark Tolonen
  • About the 4 billion: I don't need to keep them. They are used on the fly and passed to another function, so it is more important not to lose generation performance. – tseries Oct 01 '20 at 23:56
  • @tseries On my system generating the first 65536 entries for `generate('0123456789abcdef',8,4)` took 22ms, so it should be less than 65536 * 22ms or ~24 minutes to generate 4 billion entries. It should be somewhat faster given this algorithm skips entries with >4 symbols in a row. – Mark Tolonen Oct 02 '20 at 00:17
  • There is one problem: as I understand it, this is not on the fly and it needs a list. – tseries Oct 02 '20 at 00:28
  • @tseries I didn't quite follow your question, but I think you want `for item in generate('012',4,2):` and don't store in list like my small example. I called `list` to show the results in one big list. You won't want that for billions of permutations. – Mark Tolonen Oct 02 '20 at 00:31
  • `for item in generate('0123456789',8,4): a = ''.join(item); print(a)`. Maybe add this to the answer, so that anyone who needs a solution that doesn't store results in a list doesn't have to search the comments :) – tseries Oct 02 '20 at 00:33
  • @tseries Just `print(item)`. The function returns strings already. – Mark Tolonen Oct 02 '20 at 00:34
  • With 16, 8, 4 and no list: Time: 1627.4064108, versus product with 16 characters and repeat=8: Time: 1210.542135. It is slower even than generating everything from 00000000 to FFFFFFFF, so I need to think about another way :( – tseries Oct 02 '20 at 01:10
  • @tseries itertools is a built-in, so `product` is written in C using the Python C API, not straight python. – Mark Tolonen Oct 02 '20 at 01:53
  • So as I understand it, the only way is to use product and simply reject everything that isn't needed in the next function. – tseries Oct 02 '20 at 01:59
  • In your timing, did you just product+join, or just product? generate doesn't need join. – Mark Tolonen Oct 02 '20 at 02:01
  • for i in product(['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f'],repeat = 8): b = (''.join(i)) – tseries Oct 02 '20 at 02:15
  • @tseries I got product is 2x as fast, but anything I try to meet your criteria makes it slower than what I came up with so far. Adding that comparison code slows it down again. – Mark Tolonen Oct 02 '20 at 02:19
  • `combinations_with_replacement` is even faster than product, but it doesn't allow mirrored pairs :) `for i in combinations_with_replacement(['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f'], 8): b = (''.join(i))` Time: 0.09523279999999995 – tseries Oct 02 '20 at 02:28
  • Also, if I use a repeat of 2 in your script as an example, `generate('0123456789abcdef',8,2)`, I get 00100e70, so it doesn't really filter it :) – tseries Oct 02 '20 at 02:43
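
Two ideas raised in the comments above (rejecting unwanted results from `product`, and limiting how often a symbol occurs in total rather than in a row) can be sketched as follows. This is only an illustration under assumptions: `max_run`, `max_count`, and `filtered_product` are hypothetical names, not part of itertools, and the "total occurrences" rule is just one possible reading of the question. Since the filter does extra work for every candidate, it should not be expected to beat plain `product` on speed.

```python
from collections import Counter
from itertools import groupby, product


def max_run(s):
    """Longest run of identical consecutive characters (s must be non-empty)."""
    return max(len(list(group)) for _, group in groupby(s))


def max_count(s):
    """Highest total number of occurrences of any single character in s."""
    return max(Counter(s).values())


def filtered_product(charset, length, limit, consecutive=True):
    """Hypothetical filter over product(): yield only strings within `limit`.

    consecutive=True  limits runs of the same character in a row.
    consecutive=False limits total occurrences of each character.
    """
    measure = max_run if consecutive else max_count
    for chars in product(charset, repeat=length):
        s = ''.join(chars)
        if measure(s) <= limit:
            yield s


# '00100e70' passes the "in a row" rule with limit 2, but contains five '0's.
print(max_run('00100e70'), max_count('00100e70'))
```

With `consecutive=True` the filter accepts the same strings as `generate` for the same limit; with `consecutive=False` a string such as 00100e70 is rejected for any limit below 5.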