3

I'm working with python 3. The function I'm working with is as follows:

def sub_combinations(segment):
  if len(segment) == 1:
    yield (segment,)
  else:
    for j in sub_combinations(segment[1:]):
      yield ((segment[0],),)+j
      for k in range(len(j)):
         yield (((segment[0],)+j[k]),) + (j[:k]) +(j[k+1:]) 

its a version of this function:

Yielding sub combinations

the output is as follows for (1,2,3,4,5):

((1,), (2,), (3,), (4,), (5,))
((1, 2), (3,), (4,), (5,))
((1, 3), (2,), (4,), (5,))
((1, 4), (2,), (3,), (5,)) *
((1, 5), (2,), (3,), (4,)) *
((1,), (2, 3), (4,), (5,))
((1, 2, 3), (4,), (5,))
((1, 4), (2, 3), (5,)) *
((1, 5), (2, 3), (4,)) *
((1,), (2, 4), (3,), (5,))
((1, 2, 4), (3,), (5,))
((1, 3), (2, 4), (5,))
((1, 5), (2, 4), (3,)) *
((1,), (2, 5), (3,), (4,)) *
((1, 2, 5), (3,), (4,)) *
((1, 3), (2, 5), (4,)) *
((1, 4), (2, 5), (3,)) *
((1,), (2,), (3, 4), (5,))
((1, 2), (3, 4), (5,))
((1, 3, 4), (2,), (5,))
((1, 5), (2,), (3, 4)) *
((1,), (2, 3, 4), (5,))
((1, 2, 3, 4), (5,))
((1, 5), (2, 3, 4)) *
((1,), (2, 5), (3, 4)) *
((1, 2, 5), (3, 4)) *
((1, 3, 4), (2, 5)) *
((1,), (2,), (3, 5), (4,))
((1, 2), (3, 5), (4,))
((1, 3, 5), (2,), (4,))
((1, 4), (2,), (3, 5)) *
((1,), (2, 3, 5), (4,))
((1, 2, 3, 5), (4,))
((1, 4), (2, 3, 5)) *
((1,), (2, 4), (3, 5))
((1, 2, 4), (3, 5))
((1, 3, 5), (2, 4))
((1,), (2,), (3,), (4, 5))
((1, 2), (3,), (4, 5))
((1, 3), (2,), (4, 5))
((1, 4, 5), (2,), (3,)) *
((1,), (2, 3), (4, 5))
((1, 2, 3), (4, 5))
((1, 4, 5), (2, 3)) *
((1,), (2, 4, 5), (3,))
((1, 2, 4, 5), (3,))
((1, 3), (2, 4, 5))
((1,), (2,), (3, 4, 5))
((1, 2), (3, 4, 5))
((1, 3, 4, 5), (2,))
((1,), (2, 3, 4, 5))
((1, 2, 3, 4, 5),)

The problem is that if I work with larger tuples, the function sub_combinations returns a huge amount of data and takes too long to compute it. To address this, I want to limit the amount of data returned by adding an extra argument. For example, sub_combinations((1,2,3,4,5), 2) should return the data above but without the tuples marked with a star. These are dropped because the offset between consequtive values in the tuple is greater than 2. For example, rows containing (1, 4), (1, 5) or (2, 5) and the likes of (1, 2, 5) etc, are dropped.

The line

for k in range(len(j))

needs to be adjusted to drop these lines, but I haven't figured out yet how. Any suggestions?

Barry

Community
  • 1
  • 1
Baz
  • 12,713
  • 38
  • 145
  • 268

1 Answers1

1

I think the following change results in the output you are looking for:

def sub_combinations(segment, max_offset=None):
   data = tuple([e] for e in segment)
   def _sub_combinations(segment):
      if len(segment) == 1:
         yield (segment,)
      else:
         for j in _sub_combinations(segment[1:]):
            yield ((segment[0],),)+j
            for k in range(len(j)):
               if max_offset and data.index(j[k][0]) - data.index(segment[0]) > max_offset:
                  break
               yield (((segment[0],)+j[k]),) + (j[:k]) +(j[k+1:])
   for combination in _sub_combinations(data):
      yield tuple(tuple(e[0] for e in t) for t in combination)

The idea here is that you break out of the k loop instead of yielding a tuple that would have an offset larger than max_offset.

Andrew Clark
  • 202,379
  • 35
  • 273
  • 306
  • (1,2,3,4,5) was just an example. ("H","E","L","P") is also a valid argument for this function, which wont work with your code. – Baz Dec 29 '11 at 09:54
  • Just treat each letter as the decimal form of its ASCII code (after normalizing all letters to be either upper or lowercase), and the code will still work. – ankit Dec 29 '11 at 18:08
  • @Baz - It should now work with arbitrary input, but it is also a bit more complex. – Andrew Clark Dec 29 '11 at 18:08