
I want to generate combinations of substrings, as in the example below:

  • original string: “This is my pen”
  • expected output list: [“This is my pen”, “This is mypen”, “This ismy pen”, “Thisis my pen”, “This ismypen”, “Thisis mypen”, “Thisismy pen”, “Thisismypen”]

As the example shows, I’d like to get every variant of the string that results from removing any subset of its whitespace characters while keeping the words in their original order.

I’ve tried the `strip(idx)` function and `from itertools import combinations`, but it was hard to keep the order of the original sentence and get all possible cases at the same time.

Any basic ideas are welcome! Thank you very much.

  • I’m a newbie to programming so please let me know if I need to write more details. Thanks a lot.
Hari Lee
  • Welcome to Stack Overflow. The reason you're having difficulty figuring out a solution with `from itertools import combinations` is that the output values you want are **not combinations** in the combinatoric sense. You are looking for a cartesian product: each word besides the first either has a space before it or doesn't, so there are two options for each of those values, and you want the cartesian product of all those option-groups. – Karl Knechtel Oct 15 '22 at 06:42
  • The canonical duplicate (I am out of close votes today) is [Get the cartesian product of a series of lists?](https://stackoverflow.com/questions/533905). It should be clear what 'lists' are needed in order to get the needed results; they'll be tuples, which can then easily be stitched together. – Karl Knechtel Oct 15 '22 at 06:44
  • As for "hard to keep the order of the original sentence" - I can't fathom a reason why there would be any difficulty. They'll only get out of order if you deliberately reorder them, shuffle them, or put them in a container that doesn't have an order (such as a `set`). – Karl Knechtel Oct 15 '22 at 06:45
  • If the question is simply about how to get individual words (without whitespace) from the original sentence, that is `.split`, not `.strip`; see [How do I split a string into a list of words?](https://stackoverflow.com/questions/743806). – Karl Knechtel Oct 15 '22 at 06:46

1 Answer


Try this:

import itertools

s = "This is my pen"
s_list = s.split()
s_len = len(s_list)

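Then choose a separator (either " " or "") for each of the s_len-1 gaps between words, and interleave the words with each choice of separators: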

r = ["".join(itertools.chain(*zip(s_list, v+("",))))
     for v in itertools.product([" ", ""], repeat=s_len-1)]

This results in r having the following value:

['This is my pen',
 'This is mypen',
 'This ismy pen',
 'This ismypen',
 'Thisis my pen',
 'Thisis mypen',
 'Thisismy pen',
 'Thisismypen']
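
To make it easier to see why the one-liner works, here is a sketch that unrolls it into explicit steps. It should be equivalent to the expression above; the function name `space_variants` is made up for illustration, and the empty-input guard is an addition (without it, `itertools.product` would be called with a negative `repeat`).

```python
import itertools

def space_variants(sentence):
    """Return every variant of `sentence` with each gap between words kept or removed."""
    words = sentence.split()
    if not words:
        # Added guard (not in the original answer): nothing to combine.
        return []
    results = []
    # One separator choice (" " or "") per gap between adjacent words.
    for seps in itertools.product([" ", ""], repeat=len(words) - 1):
        # Pad with a trailing "" so zip pairs every word with a separator.
        pairs = zip(words, seps + ("",))
        # Flatten (word, sep), (word, sep), ... and join into one string.
        results.append("".join(itertools.chain.from_iterable(pairs)))
    return results

print(space_variants("This is my pen"))
```

With n words there are n-1 gaps and two choices per gap, so the result has 2**(n-1) entries, which is 8 for the four-word example.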

If you just want to iterate over the values, you can avoid creating the top-level list as follows:

for u in ("".join(itertools.chain(*zip(s_list, v+("",))))
          for v in itertools.product([" ", ""], repeat=s_len-1)):
    print(u)

which produces:

This is my pen
This is mypen
This ismy pen
This ismypen
Thisis my pen
Thisis mypen
Thisismy pen
Thisismypen
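
If the lazy version is needed in more than one place, it could be wrapped in a generator function. The following is only a sketch: the name `iter_space_variants` is hypothetical, and it builds each string by prefixing every word after the first with its chosen separator, which is a slightly different construction from the `zip`/`chain` approach above.

```python
import itertools

def iter_space_variants(sentence):
    """Lazily yield each variant of `sentence` with gaps kept or removed."""
    words = sentence.split()
    if not words:
        # Assumption: an empty or all-whitespace input yields nothing.
        return
    first, rest = words[0], words[1:]
    # Each remaining word is preceded by either " " or "".
    for seps in itertools.product([" ", ""], repeat=len(rest)):
        yield first + "".join(sep + word for sep, word in zip(seps, rest))

for variant in iter_space_variants("This is my pen"):
    print(variant)
```

Because it is a generator, only one variant is held in memory at a time, which matters because the number of variants doubles with every extra word.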
Tom Karzes