8

I am trying to explore the functionality of Python's built-in functions. I'm currently trying to work up something that takes a string such as:

'the fast dog'

and break the string down into all possible ordered phrases, as lists. The example above would output as the following:

[['the', 'fast dog'], ['the fast', 'dog'], ['the', 'fast', 'dog']]

The key thing is that the original ordering of the words in the string needs to be preserved when generating the possible phrases.

I've been able to get a function to work that can do this, but it is fairly cumbersome and ugly. However, I was wondering if some of the built-in functionality in Python might be of use. I was thinking that it might be possible to split the string at various white spaces, and then apply that recursively to each split. Might anyone have some suggestions?

Martin Thoma
  • 124,992
  • 159
  • 614
  • 958
  • 2
    Your best bet is splitting into a list, and then finding some function that would take the list and produce a list of lists along the lines you need. It's a list issue, not a string or split issue. – Jiminion Aug 23 '13 at 15:43
  • Also you might want to clarify what is a "phrase"; from your example it seems a phrase is any two words. – Burhan Khalid Aug 23 '13 at 15:49
  • I think what he actually tries to achieve is all possible single and multi-splits (keeping order). – drahnr Aug 23 '13 at 15:51
  • what is an ordered phrase? are you really asking "create a combination of all possible words in a phrase" – Golden Lion Jun 15 '21 at 17:35

4 Answers4

11

Using itertools.combinations:

import itertools

def break_down(text):
    words = text.split()
    ns = range(1, len(words)) # n = 1..(n-1)
    for n in ns: # split into 2, 3, 4, ..., n parts.
        for idxs in itertools.combinations(ns, n):
            yield [' '.join(words[i:j]) for i, j in zip((0,) + idxs, idxs + (None,))]

Example:

>>> for x in break_down('the fast dog'):
...     print(x)
...
['the', 'fast dog']
['the fast', 'dog']
['the', 'fast', 'dog']

>>> for x in break_down('the really fast dog'):
...     print(x)
...
['the', 'really fast dog']
['the really', 'fast dog']
['the really fast', 'dog']
['the', 'really', 'fast dog']
['the', 'really fast', 'dog']
['the really', 'fast', 'dog']
['the', 'really', 'fast', 'dog']
falsetru
  • 357,413
  • 63
  • 732
  • 636
4

Think of the set of gaps between words. Every subset of this set corresponds to a set of split points and finally to the split of the phrase:

the fast dog jumps
   ^1   ^2  ^3     - these are split points

For example, the subset {1,3} corresponds to the split {"the", "fast dog", "jumps"}

Subsets can be enumerated as binary numbers from 1 to 2^(L-1)-1 where L is number of words

001 -> the fast dog, jumps
010 -> the fast, dog jumps
011 -> the fast, dog, jumps
etc.
Maxim Razin
  • 9,114
  • 7
  • 34
  • 33
3

I'll elaborate a bit on @grep's solution, while using only built-ins as you stated in your question and avoiding recursion. You could possibly implement his answer somehow along these lines:

#! /usr/bin/python3

def partition (phrase):
    words = phrase.split () #split your phrase into words
    gaps = len (words) - 1 #one gap less than words (fencepost problem)
    for i in range (1 << gaps): #the 2^n possible partitions
        r = words [:1] #The result starts with the first word
        for word in words [1:]:
            if i & 1: r.append (word) #If "1" split at the gap
            else: r [-1] += ' ' + word #If "0", don't split at the gap
            i >>= 1 #Next 0 or 1 indicating split or don't split
        yield r #cough up r

for part in partition ('The really fast dog.'):
    print (part)
Hyperboreus
  • 31,997
  • 9
  • 47
  • 87
1

The operation you request is usually called a “partition”, and it can be done over any kind of list. So, let's implement partitioning of any list:

def partition(lst):
    for i in xrange(1, len(lst)):
        for r in partition(lst[i:]):
            yield [lst[:i]] + r
    yield [lst]

Note that there will be lots of partitions for longer lists, so it is preferable to implement it as a generator. To check if it works, try:

print list(partition([1, 2, 3]))

Now, you want to partition a string using words as elements. The easiest way to do this operation is to split text by words, run the original partitioning algorithm, and merge groups of words back into strings:

def word_partition(text):
    for p in partition(text.split()):
        yield [' '.join(group) for group in p]

Again, to test it, use:

print list(word_partition('the fast dog'))
liori
  • 40,917
  • 13
  • 78
  • 105