Generate the candidate values for each possible position - even if there is only one candidate for most positions - then create a Cartesian product of those values.
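To see the core idea in isolation, here is what the Cartesian product of per-position candidate lists looks like for a short input such as `'1x1'` (a small illustrative example of my own, using `itertools.product` directly):

```python
>>> from itertools import product
>>> list(product(['1'], ['x', '5'], ['1']))
[('1', 'x', '1'), ('1', '5', '1')]
```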
In the OP's example, the candidates are `['x', '5']` for any position where an `'x'` appears in the input; for every other position, the candidates are a list with a single possibility (the original letter). Thus:
```python
def candidates(letter):
    return ['x', '5'] if letter == 'x' else [letter]
```
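A quick sanity check of its behavior:

```python
>>> candidates('x')
['x', '5']
>>> candidates('1')
['1']
```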
Then we can produce the patterns by building a list of candidates for each position, taking their product with `itertools.product`, and combining each result:
```python
from itertools import product

def combine(candidate_list):
    return ''.join(candidate_list)

def patterns(data):
    all_candidates = [candidates(element) for element in data]
    for result in product(*all_candidates):
        yield combine(result)
```
Let's test it; three `'x'` positions with two candidates each should give 2³ = 8 results:
```python
>>> list(patterns('1xxx1'))
['1xxx1', '1xx51', '1x5x1', '1x551', '15xx1', '15x51', '155x1', '15551']
```
Notice that the algorithm in the generator is fully general; all that varies is the detail of how to generate candidates and how to process results. For example, suppose we want to replace "placeholders" within a string. Then we need to split the string into placeholders and non-placeholders, and have a `candidates` function that generates all the possible replacements for placeholders, and the literal string itself for non-placeholders.
For example, with this setup:
```python
keywords = {'wouldyou': ["can you", "would you", "please"], 'please': ["please", "ASAP"]}
template = '((wouldyou)) give me something ((please))'
```
First we would split the template, for example with a regular expression:
```python
import re

def tokenize(t):
    return re.split(r'(\(\(.*?\)\))', t)
```
This tokenizer will give empty strings where a placeholder appears at the start or end of the template, but this doesn't cause a problem: an empty string simply contributes nothing when the parts are joined back together.
```python
>>> tokenize(template)
['', '((wouldyou))', ' give me something ', '((please))', '']
```
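For comparison, a template of my own with the placeholder in the middle produces no empty tokens at all:

```python
>>> tokenize('give me ((please)) now')
['give me ', '((please))', ' now']
```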
To generate replacements, we can use something like:
```python
def candidates(part):
    if part.startswith('((') and part.endswith('))'):
        return keywords.get(part[2:-2], [part[2:-2]])
    else:
        return [part]
```
That is: placeholder parts are identified by the parentheses, stripped of those parentheses, and looked up in the dictionary (falling back to the bare keyword name if there is no entry for it).
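For example, checking individual tokens against the `keywords` defined above:

```python
>>> candidates('((wouldyou))')
['can you', 'would you', 'please']
>>> candidates(' give me something ')
[' give me something ']
```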
Trying it with the other existing definitions:
```python
>>> list(patterns(tokenize(template)))
['can you give me something please', 'can you give me something ASAP', 'would you give me something please', 'would you give me something ASAP', 'please give me something please', 'please give me something ASAP']
```
To generalize `patterns` properly, rather than having it depend on the global functions `combine` and `candidates`, we should use dependency injection: simply pass those functions in as parameters, making `patterns` a higher-order function. Thus:
```python
from itertools import product

def patterns(data, candidates, combine):
    all_candidates = [candidates(element) for element in data]
    for result in product(*all_candidates):
        yield combine(result)
```
Now the same core code solves any such problem; our two examples become:
```python
def euler_51(s):
    for pattern in patterns(
        s,
        lambda letter: ['x', '5'] if letter == 'x' else [letter],
        ''.join
    ):
        print(pattern)

euler_51('1xxx1')
```
or
```python
def replace_in_template(template, replacement_lookup):
    tokens = re.split(r'(\(\(.*?\)\))', template)
    return list(patterns(
        tokens,
        lambda part: (
            # Look up the keyword in the injected mapping, not a global.
            replacement_lookup.get(part[2:-2], [part[2:-2]])
            if part.startswith('((') and part.endswith('))')
            else [part]
        ),
        ''.join
    ))

replace_in_template(
    '((wouldyou)) give me something ((please))',
    {
        'wouldyou': ["can you", "would you", "please"],
        'please': ["please", "ASAP"]
    }
)
```
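As a final note: because `patterns` is a generator, a caller can also stop early without computing the whole Cartesian product. A minimal sketch, assuming the generalized `patterns` above is in scope (the helper name `first_n_replacements` is hypothetical, not part of the code above):

```python
from itertools import islice
import re

def first_n_replacements(template, replacement_lookup, n):
    # islice pulls only the first n results from the generator,
    # so the remaining combinations are never computed.
    tokens = re.split(r'(\(\(.*?\)\))', template)
    return list(islice(
        patterns(
            tokens,
            lambda part: (
                replacement_lookup.get(part[2:-2], [part[2:-2]])
                if part.startswith('((') and part.endswith('))')
                else [part]
            ),
            ''.join
        ),
        n
    ))
```

For example, `first_n_replacements(template, keywords, 2)` returns just the first two filled-in templates.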