
Supposing I have a string with several space-separated words, like

words = "foo bar baz qux"

If I want a list of the words, I can just call words.split() and get

['foo','bar','baz','qux']

But if I want to get each word and each set of (adjacent) words, like

['foo bar baz qux', 'foo bar baz', 'bar baz qux', 
'foo bar', 'bar baz', 'baz qux', 'foo', 'bar',
'baz', 'qux']

How can I go about this? I'm sure I can write a big ugly function that takes a string like words and iterates over each set of adjacent elements to return the above, but I've a hunch there's a more elegant way to go about it. Is there?

jgysland

3 Answers


Pretty "ugly" and with itertools:

Combining "Find all consecutive sub-sequences of length n in a sequence" and "concatenating sublists python":

from itertools import chain

words = "foo bar baz qux"

w = words.split()
# n is the window length, k the offset of each shifted copy of w
print(list(map(' '.join, chain.from_iterable(zip(*(w[k:] for k in range(n))) for n in range(1, len(w) + 1)))))

Output:

['foo', 'bar', 'baz', 'qux', 'foo bar', 'bar baz', 'baz qux', 'foo bar baz', 'bar baz qux', 'foo bar baz qux']
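
The one-liner is easier to follow with the sliding window pulled out into a helper. This is only a sketch, assuming Python 3; the names windows and all_phrases are made up for illustration and do not come from itertools or any other library:

```python
from itertools import chain

def windows(seq, n):
    """Return every run of n consecutive items from seq as a tuple."""
    # zip over n copies of seq, each shifted one position further;
    # zip stops at the shortest copy, so only full windows survive
    return zip(*(seq[k:] for k in range(n)))

def all_phrases(text):
    """Join every window of every length, shortest windows first."""
    w = text.split()
    sizes = range(1, len(w) + 1)
    return [' '.join(t) for t in chain.from_iterable(windows(w, n) for n in sizes)]

print(all_phrases("foo bar baz qux"))
```

The result comes out in the same order as the output above: single words first, then pairs, and so on.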

Not so ugly and pure Python:

I found a pretty short solution - although it has two nested for-loops.

# w = words.split(), as defined above
print([' '.join(w[i:j+1]) for i in range(len(w)) for j in range(i, len(w))])

Output:

['foo', 'foo bar', 'foo bar baz', 'foo bar baz qux', 'bar', 'bar baz', 'bar baz qux', 'baz', 'baz qux', 'qux']
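
If the order in the question matters (longest phrases first), the same two loops can be swapped to iterate over window length and then start position. A minimal sketch, assuming Python 3; phrases_longest_first is a hypothetical name chosen here:

```python
def phrases_longest_first(text):
    """List every run of adjacent words, longest runs first."""
    w = text.split()
    return [
        ' '.join(w[start:start + n])
        for n in range(len(w), 0, -1)        # window length, longest first
        for start in range(len(w) - n + 1)   # leftmost position of the window
    ]

print(phrases_longest_first("foo bar baz qux"))
```

For the example string this should give the list exactly as written in the question.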
Falko
  • I actually like the pure Python route best, which I did not expect. And two for loops isn't going to be very problematic for my use case. – jgysland Mar 08 '15 at 22:46

You could use the nltk library, which is designed for natural language processing. For example:

from nltk.util import ngrams
sentence = 'foo bar baz qux'

adj = [3, 2, 1]
for n in adj:
    # ngrams yields word tuples; list() materializes them for printing
    print(list(ngrams(sentence.split(), n)))
wrdeman
  • I've been looking for a reason to dig into nltk, but this (and a couple variations I tried) doesn't produce the desired result. :-( – jgysland Mar 08 '15 at 22:44

The first principles approach (i.e., without needing to import anything) is indeed "ugly", but not too "big", really...

words = ['foo', 'bar', 'baz', 'qux']  # renamed from "list" to avoid shadowing the builtin
length = len(words)
newlist = []
for start, item in enumerate(words):
    string = item
    newlist.append(item)
    # unless this is the last element, there are more strings to add starting with this one;
    # enumerate (rather than words.index) keeps the position correct when a word repeats
    for i in range(start + 1, length):
        string = string + ' ' + words[i]
        newlist.append(string)

print(newlist)

Result

['foo', 'foo bar', 'foo bar baz', 'foo bar baz qux', 'bar', 'bar baz', 'bar baz qux', 'baz', 'baz qux', 'qux']
Mark