7

I need all combinations of subsets of a string. In addition, a subset of length 1 can only be followed by a subset with length > 1. E.g. for string 4824 the result should be:

 [ [4, 824], [4, 82, 4], [48, 24], [482, 4], [4824] ]

So far I managed to retrieve all possible subsets with:

    length = len(number)
    ss = []
    for i in xrange(length):
        for j in xrange(i,length):
            ss.append(number[i:j + 1])

which gives me:

  ['4', '48', '482', '4824', '8', '82', '824', '2', '24', '4']

But I don't know how to combine those now.

wasp256
  • 5,943
  • 12
  • 72
  • 119
  • Maybe it can help you [List Comprehension](https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions) – emartinelli May 22 '15 at 13:21
  • By subset you mean a substring? – geckon May 22 '15 at 13:22
  • I think you are interested in [power sets](https://stackoverflow.com/questions/1482308/whats-a-good-way-to-combinate-through-a-set) – Cory Kramer May 22 '15 at 13:23
  • If you were successful in combine the elements then you would have some thing like : `[['4', '48', '482', '4824'], ['8', '82', '824'], ['2', '24'], ['4']]` – ZdaR May 22 '15 at 13:25
  • And what do I do with that then? The ouput should be `[ [4, 824], [4, 82, 4], [48, 24], [482, 4], [4824] ]` – wasp256 May 22 '15 at 13:46
  • Is this some kind of weird variance on a power set? :) – James Mills May 22 '15 at 13:48
  • "a subset of length 1 can only be followed by a subset with length > 1" How does `[4, 82, 4]` fit that rule? Or did you mean "immediately followed"? – tobias_k May 22 '15 at 13:55
  • 1
    This isn't about power sets. I don't know the proper terminology, but [this question](http://stackoverflow.com/questions/26791259/read-all-possible-sequential-substrings-in-python) is definitely related, and some of the answers given there could be refined for the additional requirement of *"subset of length 1 can only be followed by a subset with length > 1"*. – John Y May 22 '15 at 13:55
  • @wasp256 what would be expected output for 5-char string, eg. `abcde`? – Łukasz Rogalski May 22 '15 at 13:58
  • Also related: http://stackoverflow.com/questions/4904430/find-all-list-permutations-of-a-string-in-python – John Y May 22 '15 at 13:59
  • Yeah sadly I've lost the plot with this question; it isn't clear what the OP is trying to do or the expected output and really what it all means :) – James Mills May 22 '15 at 13:59
  • `[4, 8, 24]` why no ? – Jose Ricardo Bustos M. May 22 '15 at 14:37
  • @JoseRicardoBustosM. "a subset of length 1 can only be followed by a subset with length > 1" – skolsuper May 22 '15 at 15:03
  • 1
    @skolsuper i post a answer, based it ... I hope is correct – Jose Ricardo Bustos M. May 22 '15 at 15:07
  • @JoseRicardoBustosM. Upvoted already. I started something similar but didn't bother to tidy up the edge cases when I saw tobias' answer posted already. I agree with yours, I think it's better as a generator, and without resorting to sets. – skolsuper May 22 '15 at 15:23
  • @skolsuper thanks, i fixing my code .... "a subset of length 1 can only be followed by a subset with length > 1" ..... XD .... I had not read – Jose Ricardo Bustos M. May 22 '15 at 15:27

4 Answers4

9

First, write a function for generating all the partitions of the string:

def partitions(s):
    if s:
        for i in range(1, len(s) + 1):
            for p in partitions(s[i:]):
                yield [s[:i]] + p
    else:
        yield []

This iterates all the possible first segments (one character, two characters, etc.) and combines those with all the partitions for the respective remainder of the string.

>>> list(partitions("4824"))
[['4', '8', '2', '4'], ['4', '8', '24'], ['4', '82', '4'], ['4', '824'], ['48', '2', '4'], ['48', '24'], ['482', '4'], ['4824']]

Now, you can just filter those that match your condition, i.e. those that have no two consecutive substrings of length one.

>>> [p for p in partitions("4824") if not any(len(x) == len(y) == 1 for x, y in zip(p, p[1:]))]
[['4', '82', '4'], ['4', '824'], ['48', '24'], ['482', '4'], ['4824']]

Here, zip(p, p[1:]) is a common recipe for iterating over all pairs of consecutive items.


Update: Actually, incorporating your constraint directly into the partition function is not that hard, either. Just keep track of the last segment and set the minimum length accordingly.

def partitions(s, minLength=1):
    if len(s) >= minLength:
        for i in range(minLength, len(s) + 1):
            for p in partitions(s[i:], 1 if i > 1 else 2):
                yield [s[:i]] + p
    elif not s:
        yield []

Demo:

>>> print list(partitions("4824"))
[['4', '82', '4'], ['4', '824'], ['48', '24'], ['482', '4'], ['4824']]
tobias_k
  • 81,265
  • 12
  • 120
  • 179
  • This is inefficient for longer strings, since you are generating a lot of partitions that will be filtered. – chepner May 22 '15 at 14:45
  • @chepner Agreed. Added another algorithm doing the filtering while generating solutions. – tobias_k May 22 '15 at 15:11
2

would be interesting to see more test cases, the following algorithm does what you say:

s="4824"

def partitions(s):
  yield [s]
  if(len(s)>2):
    for i in range(len(s)-1, 0, -1):
      for g in partitions(s[i:]):
        out = [s[:i]] + g
        if not any([len(out[i]) == len(out[i+1]) and len(out[i])==1 for i in range(len(out)-1)]):
          yield out

list(partitions(s))

you get:

[['4824'], ['482', '4'], ['48', '24'], ['4', '824'], ['4', '82', '4']]

explanation

I based on the following algorithm:

s="4824"

def partitions_original(s):
  #yield original string
  yield [s]
  if(len(s)>2):
    for i in range(len(s)-1, 0, -1):
      #divide string in two parts
      #iteration 1: a="482", b="4"
      #iteration 2: a="48", b="24"
      #iteration 3: a="4", b="824"
      a = s[:i]
      b = s[i:]
      #recursive call of b
      for g in partitions_original(b):
        #iteration 1: b="4", g=[['4']]
        #iteration 2: b="24", g=[['24']]
        #iteration 3: b="824", g=[['824'], ['82', '4'], ['8', '24']]
        yield [a] + g

list(partitions_original(s))

you get:

[['4824'], ['482', '4'], ['48', '24'], ['4', '824'], 
['4', '82', '4'], ['4', '8', '24']]

the problem is ['4', '8', '24'] ..... then I must add if to code, because "a subset of length 1 can only be followed by a subset with length > 1"

[len(out[i]) == len(out[i+1]) and len(out[i])==1 for i in range(len(out)-1)] return for ['4', '8', '24'] -> [True, False] .... any Return True if any element of the iterable is true

NOTE

also it can be used:

if all([len(out[i]) != len(out[i+1]) or len(out[i])!=1 for i in range(len(out)-1)]):

Jose Ricardo Bustos M.
  • 8,016
  • 6
  • 40
  • 62
0

What I am doing here is to get all the possible split position of the string and eliminate the last one.

for example, in some string with 5 numbers "12345" for ex., there is 4 possible position to split the string, call it possibility = (0,0,0,0),(1,0,1,0)... with (0,0,1,0) mean (don't separate 1 and 2345,don't separate 12 and 345,separate 123 and 45,don't separate 1234 and 5) so you can get all possibilities while your condition is verified since we eliminate the (1,1,1,1) case.

import itertools
from math import factorial
from itertools import product

def get_comb(string):
    L = len(string_)
    combinisation = []

    for possibility in product([0,1], repeat=len(string_)-1):
        s = []
        indexes = [i for i in range(len(string_)-1) if list(possibility)[i]!=0]
        if sum(indexes) != 0:
            if sum(indexes) != len(string_)-1:
                for index in indexes:
                    s.append(string_[:index+1])
                s.append(string_[indexes[-1:][0]+1:])
                combinisation.append(s)
            else:
                combinisation.append(string_)
    return combinisation



string_ = '4824'
print "%s combinations:"%string_
print get_comb(string_)



string_ = '478952'
print "%s combinations:"%string_
print get_comb(string_)



string_ = '1234'
print "%s combinations:"%string_
print get_comb(string_)


>> 
4824 combinations:
[['482', '4'], ['48', '24'], '4824', ['4', '482', '4'], ['4', '48', '24'], '4824
']
478952 combinations:

[['47895', '2'], ['4789', '52'], ['4789', '47895', '2'], ['478', '952'], ['478',
 '47895', '2'], '478952', ['478', '4789', '47895', '2'], ['47', '8952'], '478952
', ['47', '4789', '52'], ['47', '4789', '47895', '2'], ['47', '478', '952'], ['4
7', '478', '47895', '2'], ['47', '478', '4789', '52'], ['47', '478', '4789', '47
895', '2'], ['4', '47895', '2'], ['4', '4789', '52'], ['4', '4789', '47895', '2'
], ['4', '478', '952'], ['4', '478', '47895', '2'], '478952', ['4', '478', '4789
', '47895', '2'], ['4', '47', '8952'], '478952', ['4', '47', '4789', '52'], ['4'
, '47', '4789', '47895', '2'], ['4', '47', '478', '952'], ['4', '47', '478', '47
895', '2'], ['4', '47', '478', '4789', '52'], ['4', '47', '478', '4789', '47895'
, '2']]

1234 combinations:

[['123', '4'], ['12', '34'], '1234', ['1', '123', '4'], ['1', '12', '34'], '1234
']
farhawa
  • 10,120
  • 16
  • 49
  • 91
  • There seem to be some duplicate characters in your combinations, e.g. `['4', '482', '4']`. Also, I'd suggest being more consistent with the data types in the list, i.e. use `['4824']` instead of `'4824'` – tobias_k May 22 '15 at 15:55
0

A normal code can be written like:

s=raw_input('enter the string:')
word=[]
for i in range(len(s)):
    for j in range(i,len(s)):
        word.append(s[i:j+1])

print word
print 'no of possible combinations:',len(word)

And output: enter the string: 4824 ['4', '48', '482', '4824', '8', '82', '824', '2', '24', '4'] no of possible combinations:10