Read all possible sequential substrings in Python

Question

If I have a list of letters, such as:
word = ['W','I','N','E']
and need to get every possible sequence of substrings, of length 3 or less, e.g.:
W I N E, WI N E, WI NE, W IN E, WIN E etc.
What is the most efficient way to go about this?

Right now, I have:

word = ['W','I','N','E']
for idx,phon in enumerate(word):
    phon_seq = ""
    for p_len in range(3):
        if idx-p_len >= 0:
            phon_seq = " ".join(word[idx-(p_len):idx+1])
            print(phon_seq)

This just gives me the below, rather than the sub-sequences:

W
I
W I
N
I N
W I N
E
N E
I N E

I just can't figure out how to create every possible sequence.

Isn't what you are looking for just "WINE" with every possible placement of spaces inside it? — Stuart, Nov 06 '14 at 23:43
@Reut - the output would be long. There are some examples in the question — Adam_G, Nov 06 '14 at 23:48
IIUC, you want something like [this](http://stackoverflow.com/questions/20754857/separating-a-string/20754984#20754984) but with at least one split, so you don't get `"WINE"` as an output. Is that correct? — DSM, Nov 06 '14 at 23:54
@DSM - I didn't see that post in my searching, but it looks exactly right, also. — Adam_G, Nov 06 '14 at 23:58

irrelephant · Accepted Answer · 2014-11-07T00:13:52.200

3

Try this recursive algorithm:

def segment(word):
  def sub(w):
    if len(w) == 0:
      yield []
    for i in xrange(1, min(4, len(w) + 1)):
      for s in sub(w[i:]):
        yield [''.join(w[:i])] + s
  return list(sub(word))

# And if you want a list of strings:
def str_segment(word):
  return [' '.join(w) for w in segment(word)]

Output:

>>> segment(word)
[['W', 'I', 'N', 'E'], ['W', 'I', 'NE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'N', 'E'], ['WI', 'NE'], ['WIN', 'E']]

>>> str_segment(word)
['W I N E', 'W I NE', 'W IN E', 'W INE', 'WI N E', 'WI NE', 'WIN E']

edited Nov 07 '14 at 00:13

answered Nov 06 '14 at 23:48

irrelephant

4,091
2
25
41

1

@Adam_G I improved the answer -- you should use this version instead! – irrelephant Nov 07 '14 at 00:16
The previous version made unnecessary recursive calls to `segment` instead of `sub` -- pretty much a typo. – irrelephant Nov 07 '14 at 02:04

Stuart · Answer 2 · 2014-11-07T11:09:08.160

2

As there can either be a space or not in each of three positions (after W, after I and after N), you can think of this as similar to bits being 1 or 0 in a binary representation of a number ranging from 1 to 2^3 - 1.

input_word = "WINE"
for variation_number in xrange(1, 2 ** (len(input_word) - 1)):  
    output = ''
    for position, letter in enumerate(input_word):
        output += letter
        if variation_number >> position & 1:
            output += ' '
    print output

Edit: To include only variations with sequences of 3 characters or less (in the general case where input_word may be longer than 4 characters), we can exclude cases where the binary representation contains 3 zeroes in a row. (We also start the range from a higher number in order to exclude the cases which would have 000 at the beginning.)

for variation_number in xrange(2 ** (len(input_word) - 4), 2 ** (len(input_word) - 1)):  
    if not '000' in bin(variation_number):
        output = ''
        for position, letter in enumerate(input_word):
            output += letter
            if variation_number >> position & 1:
                output += ' '
        print output

edited Nov 07 '14 at 11:09

answered Nov 07 '14 at 00:13

Stuart

9,597
1
21
30

Unfortunately, this algorithm doesn't generalize to longer input words as it prints substrings of length 4 and above. Try it with `input_word = "SWINE"` to see what I mean. – irrelephant Nov 07 '14 at 00:31
@irrelephant it works for me with SWINE. Not sure what you mean about substring length. – Stuart Nov 07 '14 at 00:37
When I try it, I get the line `SWIN E` -- the SWIN isn't length 3 or less. Part of the OP's problem restricts the substrings to a max length of 3. – irrelephant Nov 07 '14 at 00:38
The comment to my post makes me think he didn't really mean the code should limit the sub-string's length. This is my favourite solution as it abstracts the problem nicely. – Reut Sharabani Nov 07 '14 at 00:40
@irrelephant ok I see what you mean. It generalises to list all substrings of input_word, regardless of length, as per my exchange with the OP in the comments. – Stuart Nov 07 '14 at 00:42
I very intentionally limited the sub-string length – Adam_G Nov 07 '14 at 01:13

Reut Sharabani · Answer 3 · 2014-11-07T00:23:11.430

My implementation for this problem.

#!/usr/bin/env python

# this is a problem of fitting partitions in the word
# we'll use itertools to generate these partitions
import itertools

word = 'WINE'

# this loop generates all possible partitions COUNTS (up to word length)
for partitions_count in range(1, len(word)+1):
    # this loop generates all possible combinations based on count
    for partitions in itertools.combinations(range(1, len(word)), r=partitions_count):

        # because of the way python splits words, we only care about the
        # difference *between* partitions, and not their distance from the
        # word's beginning
        diffs = list(partitions)
        for i in xrange(len(partitions)-1):
            diffs[i+1] -= partitions[i]

        # first, the whole word is up for taking by partitions
        splits = [word]

        # partition the word's remainder (what was not already "taken")
        # with each partition
        for p in diffs:
            remainder = splits.pop()
            splits.append(remainder[:p])
            splits.append(remainder[p:])

        # print the result
        print splits

I'm not sure this is correct. This gives permutations, but I'm looking for all substrings. In other words, every possible way to chop up the word. — Adam_G, Nov 06 '14 at 23:39
This does give all the substrings and returns the same results as my answer (except as a list instead of a string). — Stuart, Nov 07 '14 at 01:20

Mazdak · Answer 4 · 2014-11-07T00:32:14.003

As an alternative answer , you can do it with itertools module and use groupby function for grouping your list and also i use combination to create a list of pair index for grouping key : (i<=word.index(x)<=j) and at last use set for get a unique list .

Also note that you can got a unique combination of pair index at first by this method that when you have pairs like (i1,j1) and (i2,j2) if i1==0 and j2==3 and j1==i2 like (0,2) and (2,3) it mean that those slices result are same you need to remove one of them.

All in one list comprehension :

subs=[[''.join(i) for i in j] for j in [[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in list(combinations(range(len(word)),2))]]
set([' '.join(j) for j in subs]) # set(['WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE'])

Demo in details :

>>> cl=list(combinations(range(len(word)),2))
>>> cl
[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

>>> new_l=[[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in cl]
>>> new_l
[[['W', 'I'], ['N', 'E']], [['W', 'I', 'N'], ['E']], [['W', 'I', 'N', 'E']], [['W'], ['I', 'N'], ['E']], [['W'], ['I', 'N', 'E']], [['W', 'I'], ['N', 'E']]]
>>> last=[[''.join(i) for i in j] for j in new_l]
>>> last
[['WI', 'NE'], ['WIN', 'E'], ['WINE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'NE']]
>>> set([' '.join(j) for j in last])
set(['WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE'])
>>> for i in set([' '.join(j) for j in last]):
...  print i
... 
WIN E
W IN E
W INE
WI NE
WINE
>>>

This is not working - there should be 7 possible output combinations. — Stuart, Nov 07 '14 at 00:46

score 0 · Answer 5 · answered Sep 09 '17 at 17:03

0

i think it can be like this: word = "ABCDE" myList = []

for i in range(1, len(word)+1,1):
    myList.append(word[:i])

    for j in range(len(word[len(word[1:]):]), len(word)-len(word[i:]),1):
        myList.append(word[j:i])

print(myList)
print(sorted(set(myList), key=myList.index))
return myList

answered Sep 09 '17 at 17:03

Wang Guanqingwang

1

Formatting suggestion: I'm guessing `word` and `myList` are supposed to be in the code block. Also, could you explain what this answer gives that is not found in the other responses? – stealththeninja Sep 09 '17 at 17:22

Read all possible sequential substrings in Python

5 Answers5

Linked