0

Given a list of words and chars, I need to generate words variations (lowercase, uppercase, etc.) and all possible permutations avoiding duplicates (case-insensitive).

Example

words = ["one", "two"]
chars = ["!"]

Words variations:

one, One, ONE, oNE, two, Two, TWO, tWO

...and possible (incremental) permutations:

one, one!, !one, One, One!, !One, ONE, ONE!, !ONE, two, two! ...,
...onetwo, onetwo!, !onetwo, Onetwo, Onetwo!, !Onetwo, ...
...OneTwo, OneTwo!, !OneTwo, ...
...twoone, twoone!, !twoone, ...etc.

but not:

oneOne, oneONE, oneoNE, ...twoTwo, twoTWO, twotWO...

This is my Python code:


words = ["one", "two"]
chars = ["2022", "!", "_"]

file_permuted = "permuted_words.txt"

transformed_words = []
words_to_permute = []
permuted_words = []
counter = 0
total_counter = 0

for word in words:

    # Apply the case transformations: word, WORD, Word, wORD
    lowercase_all = word.lower()
    uppercase_all = word.upper()
    capitalize_first = word.capitalize()
    toggle_case =  capitalize_first.swapcase()

    # Add the transformed words to the list
    transformed_words.append(lowercase_all)
    transformed_words.append(uppercase_all)
    transformed_words.append(capitalize_first)
    transformed_words.append(toggle_case)

words_to_permute = transformed_words + chars

print("Generating permutations...")
with open(file_permuted, "w") as f:
    for i in range(1, len(words_to_permute) + 1):
        for permutation in itertools.permutations(words_to_permute, i):
            len_set_permutation = len(set(list(map(lambda x: x.lower(), permutation))))
            if (len_set_permutation == len(permutation)):
                f.write("".join(permutation) + "\n")
                if (counter == 100):
                    total_counter += counter
                    print('Processed {0} items'.format(str(total_counter)))
                    counter = 0

                counter += 1

Please, suggest me a better/more elegant and efficient way.

  • What have you tried so far? – Erich Jan 23 '23 at 22:38
  • 2
    for each binary number of the same length of the string count 0 as lowercase and 1 as uppercase – Caridorc Jan 23 '23 at 22:39
  • The Python code I wrote is working but is not so efficient: it removes duplicates (_wordWord, WordWord, etc.._) after permutations have been generated `len_set_permutation = len(set(list(map(lambda x: x.lower(), permutation))))` - unneded computational elaborations. – CyberLuke365 Jan 23 '23 at 23:06
  • [case permutations](https://stackoverflow.com/questions/11144389/find-all-upper-lower-and-mixed-case-combinations-of-a-string) – Chris Charley Jan 29 '23 at 19:41

2 Answers2

1

Like this:

def word_casing (word):
    if 0 == len(word):
        yield ""
    else:
        char = word[0]
        for next_word in word_casing(word[1:]):
            yield char + next_word
            yield char.upper() + next_word

def word_casing_fn (word):
    def inner ():
        yield from word_casing(word)
    return inner

def char_listing_fn (char):
    def inner ():
        yield char
    return inner

def subsets (items):
    if 0 == len(items):
        yield []
    else:
        for s in subsets(items[1:]):
            yield s
            yield [items[0]] + s

def nonempty_subsets (items):
    for s in subsets(items):
        if 0 < len(s):
            yield s

def permutations (items):
    if 0 == len(items):
        yield []
    else:
        for i in range(len(items)):
            if 0 == i:
                for p in permutations(items[1:]):
                    yield [items[0]] + p
            else:
                (items[0], items[i]) = (items[i], items[0])
                for p in permutations(items[1:]):
                    yield [items[0]] + p
                (items[0], items[i]) = (items[i], items[0])

def list_fn_combinations (list_fns):
    if 1 == len(list_fns):
        for item in list_fns[0]():
            yield item
    elif 1 < len(list_fns):
        for item in list_fns[0]():
            for comb in list_fn_combinations(list_fns[1:]):
                yield item + comb

def all_combs (words, chars):
    list_fns = []
    for word in words:
        list_fns.append(word_casing_fn(word))
    for char in chars:
        list_fns.append(char_listing_fn(char))
    for s in nonempty_subsets(list_fns):
        for p in permutations(s):
            for c in list_fn_combinations(p):
                yield c


for w in all_combs(["one", "two"], ["!"]):
    print(w)
btilly
  • 43,296
  • 3
  • 59
  • 88
1

You can do this by steps, but please note the data size, which can very quickly become huge:

  • prepare the 4 variations of each of the W words
  • compute the cartesian product of the variation lists; this will create T = 4**W tuples
  • compute the permutations of each tuple; this will create P = T * W! tuples
  • finally generate the versions of each tuple with one of the C chars before or after, and output the original tuple plus all its versions; the total number of items will be V = P * (2*C+1)

So with your minimal example of 2 words and 1 char we'll get 96 versions; but should we have 4 words and 3 chars, we'd produce some 43k versions: 4**4 * 4! * (2*3+1). For 10 words we'd have some 2.66E13 versions, 4**10 * 10! * (2*3+1), or, assuming all words have length 5, more than 1 petabyte of memory occupation. We really need to go for generators!

My code:

def get_variants(w):
    w_cap = w.capitalize()
    return (w.upper(), w.lower(), w_cap, w_cap.swapcase())

def get_product(words):
    variants = [get_variants(w) for w in words]
    for p in product(*variants):
        yield p

def get_perms(words):
    for prod in get_product(words):
        for perm in permutations(prod):
            yield perm

def get_versions(words, chars):
    for p in get_perms(words):
        s = ''.join(p)
        yield s
        for ch in chars:
            yield ch+s
            yield s+ch
gimix
  • 3,431
  • 2
  • 5
  • 21
  • It's very elegant and performing code. Thank you !!! Is there a way to include permutations of each word and char ? Example: _one, One, ONE, oNE, !one, one!, !One, One!, !ONE, ONE!, !oNE, oNE!, two, Two, TWO, tWO, !two, two!, !Two, Two!, !TWO, TWO!, !tWO, tWO!_ ? – CyberLuke365 Jan 27 '23 at 21:35
  • Yes, just skip the product and permutations steps: define a new `get_versions` that calls `get_variants(word)` for each word, instead of calling `get_perms(words)` once – gimix Jan 29 '23 at 12:22