5

Possible Duplicate:
Rolling or sliding window iterator in Python

I'm new to programming and am learning Python. I'm looking for an efficient/pythonic way to solve a problem.

I'd like a function that returns a list of iterables containing the combinations of a parent iterable as long as the elements in the combination appear the same same consecutive order as the original parent iterable.

I'm not sure if "consecutive" if the right word to describe this concept as 'consecutive' typically means, 'the same element repeated.' e.g. [1,1,1], 'aaa', etc...

I mean that given the list [1,2,3,4,5]:

[1,2,3] is consecutive but [1,2,4] is not. (Is there a word for this?)

Here's a function consecutive_combinations() I created and the expected behavior:

def consecutive_combinations(iterable, consec):
    begin = 0
    chunks = len(iterable) + 1 - consec
    return [iterable[x + begin: x + consec] for x in xrange(chunks)]

def test():
    t = (1,2,3,4,5)
    s = "The quick brown fox jumps over the lazy dog."
    CC = consecutive_combinations
    assert CC(t, 2) == [(1, 2), (2, 3), (3, 4), (4, 5)]
    assert CC(t, 3) == [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
    assert CC(t, 4) == [(1, 2, 3, 4), (2, 3, 4, 5)]
    assert CC(t, 5) == [(1, 2, 3, 4, 5)]
    assert CC(s, 3) == ['The', 'he ', 'e q', ' qu', 'qui', 'uic', 'ick', 'ck ', 'k b', ' br', 'bro', 'row', 'own', 'wn ', 'n f', ' fo', 'fox', 'ox ', 'x j', '  ju', 'jum', 'ump', 'mps', 'ps ', 's o', ' ov', 'ove', 'ver', 'er ', 'r t', '    th', 'the', 'he ', 'e l', ' la', 'laz', 'azy', 'zy ', 'y d', ' do', 'dog', 'og. ']
    assert CC('', 3) == []
    print "All tests passed!"

test()

Is this an efficient solution? Is there something in itertools or some other pre-built module that would do this sort of thing?

Community
  • 1
  • 1
NTUI
  • 329
  • 4
  • 12
  • 1
    Looks fine to me. I don't think that combinations is the right word. What you are doing is returning all the _subsequences_ of a given length of a given sequence. – Joel Cornett Jun 10 '12 at 19:36
  • 1
    Ruby calls it `Enumerable#each_slice` – Niklas B. Jun 10 '12 at 19:38
  • 1
    @JoelCornett: Actually, these are *substrings*, not *subsequences*; subsequences are not necessarily contiguous. Note that strings has a broader meaning in theoretical computer science than in most programming languages. – Fred Foo Jun 10 '12 at 20:07
  • @larsmans: Good point. And for OP's reference, the wikipedia entry: http://en.wikipedia.org/wiki/Substring#Substring – Joel Cornett Jun 10 '12 at 20:09
  • substring! duh, not sure where the mental block was that I couldn't think of that. This is exactly what I meant. – NTUI Jun 10 '12 at 20:17
  • I think the term you're looking for is [n-gram](http://en.wikipedia.org/wiki/N-gram). – georg Jun 10 '12 at 21:42

2 Answers2

8

I like the pragmatic zip approach:

n = 3
s = "The quick brown fox jumps over the lazy dog."
zip(*(s[i:] for i in xrange(n)))

It's not super-efficient and it only works for sequences, but often enough it does the job.

The corresponding itertools solution is a pretty straightforward transformation of the above:

from itertools import izip, islice, tee

def slices(iterable, n):
    return izip(*(islice(it, i, None) for i, it in enumerate(tee(iterable, n))))

Soo many is...

Nevertheless, this one should work for any iterable (but may be slower for plain sequences like lists or strings).

Niklas B.
  • 92,950
  • 18
  • 194
  • 224
3

Your solution is fine. You could however shorten it a little bit. For example:

def subsequences(iterable, length):
    return [iterable[i: i + length] for i in xrange(len(iterable) - length + 1)]
Joel Cornett
  • 24,192
  • 9
  • 66
  • 88