Python - most efficient way to generate combinations of large sets subject to criteria?

Question

I am trying to generate all possible combinations of financial instruments within a portfolio subject to a boundary condition.

Eg, suppose I have a collection of lists which represent allocations to a portfolio subject to a minimum and maximum percentage of the total portfolio size for each instrument:

"US Bonds" = {0.10,0.15,0.20,0.25,0.30}
"US Equities" = {0.25, 0.30, 0.35, 0.40, 0.45, 0.50}
"European Bonds" = {0.10, 0.15, 0.20}
"European Equities = {0.20,0.25,0.30,0.35,0.40,0.45,0.50}
 ...
"Cash" = {0.0, 0.05, 0.10, 0.15,...0.95}

My list, asset thus looks like:

[In]
Asset

[Out]
[[0.1, 0.15, 0.2, 0.25, 0.30],
[0.25, 0.30,0.35, 0.40, 0.45, 0.50],
[0.1, 0.15, 0.2],
[0.20, 0.25, 0.30,0.35, 0.40, 0.45, 0.50]
...
[0.0, 0.05, 0.1, 0.15, 0.2, 0.25,...0.95]]

What is the most efficient way to generate all possible portfolios subject to the criteria that the sum of every combination of instruments must be = 1?

For now, I am creating a list 'portfolios' as follows:

portfolios  = [item for item in itertools.product(*asset) if  np.isclose(sum(item),1)]

(nb, 'np.isclose' to take care of funky fp arithmetic).

I have represented the asset classes and possible allocations as a collection of lists but wonder if there is a different data representation (eg, NumPY arrays) which will be faster.

There are some questions regarding best execution for various combinations, but I have not seen any with with any boundary conditions.

Related: http://stackoverflow.com/questions/104420/how-to-generate-all-permutations-of-a-list-in-python — ivan_pozdeev, Nov 14 '15 at 03:33
Related: http://stackoverflow.com/questions/1208118/using-numpy-to-build-an-array-of-all-combinations-of-two-arrays — ivan_pozdeev, Nov 14 '15 at 03:34
@ivan_pozdeev - thanks, but this doesn't address the boundary condition that only combinations which sum to 1 are relevant and the list structure, which is a product of different minimums and maximums. — GPB, Nov 14 '15 at 03:45

ErikR · Accepted Answer · 2015-11-14T14:17:33.420

(Note: code available at: http://lpaste.net/145213)

First of all I would represent the percentages as integer values to avoid floating point roundoff errors.

Secondly, the most efficient method will use bounding to avoid looking at portfolios which cannot possibly satisfy the == 1 constraint.

The loop you want to write will operate like this:

def portfolios():
  for us_bonds in [ 10, 15, 20, 25, 30 ]:
    if us_bonds > 100: break
    for us_equaties in [ 25, 30, 35, 40, 45, 50 ]:
      if us_bonds + us_equaties > 100: break
      for euro_bonds in [ 10, 15, 20 ]:
        if us_bonds + us_equaties + euro_bonds > 100: break
        for euro_equaties in [ 20, 25, 30, 35, 40, 45, 50 ]:
          if us_bonds + us_equaties + euro_bonds + euro_equaties > 100: break
          cash = 100 - (us_bonds + us_equaties + euro_bonds + euro_equaties)
          yield [us_bonds, us_equaties, euro_bonds, euro_equaties, cash]

This defines a generator which you can use in a for loop like this:

for x in portfolios(): print x

This approach is efficient because it avoids constructing portfolios which would exceed the == 100 constraint.

Note, also, that we've taken advantage of the fact that the "Cash" percentage basically can be anything - so it just takes up the difference between 100 percent and the total of the other investment categories.

The following function generalizes this loop for an arbitrary number of investment categories:

def gen_portfolio(categories):
  n = len(categories)
  tarr = [0] * (n+1)
  parr = [0] * (n+1)
  karr = [0] * (n+1)
  marr = [ len(c) for c in categories ]
  i = 0
  while True:
    while True:
      if i < n:
        p = categories[i][ karr[i] ]
        t = tarr[i] + p
        if t <= 100:
          parr[i] = p
          tarr[i+1] = t
          i += 1
          karr[i] = 0
          continue
        else:
          break                   # backup
      else:
        parr[n] = 100 - tarr[n]   # set the Cash percentage
        yield parr[:]             # yield a copy of the array parr
        break
    # backup
    while True:
      if i > 0:
        i -= 1
        karr[i] += 1
        if karr[i] < marr[i]: break
      else:
        return  # done!

def portfolios2():
  cats = [ [ 10, 15, 20, 25, 30 ], [ 25, 30, 35, 40, 45, 50 ], [ 10, 15, 20 ], [ 20, 25, 30, 35, 40, 45, 50 ] ]
  return gen_portfolio(cats)

And here is a test to show that they generate the same results:

def compareTest():
  ports1 = [ x for x in portfolios() ]
  ports2 = [ x for x in portfolios2() ]
  print "ports1 length:", len(ports1)
  print "ports2 length:", len(ports2)
  for x in ports1:
    if x not in ports2: print "not in ports2:", x
  for x in ports2:
    if x not in ports1: print "not in ports1:", x

Update

Here is an example which demonstrates the difference between this method and itertools.product.

Suppose there are 10 investment categories and the percentages are [90,91,..,99] for each category. The nested loops with the break statements will proceed as follows:

start the loop: for p1 in [90,91,..,99]

  set p1 = 90
  p1 < 100 so continue

  start the loop: for p2 in [90,91,..,99]
    set p2 = 90
    p1 + p2 > 100, so break out of the p2 loop

  set p1 = 91

  p1 < 100 so continue
  start the loop: for p2 in [90,91,..,99]
    set p2 = 90
    p1 + p2 > 100, so break out of the p2 loop
  set p1 = 92
  ...

So the nested loops with break statements only looks at 10 cases - p1 = 90, 91, .., 99 and p2 = 90. p2 never gets any bigger than 90 and it never tries to assign anything to p3,p4, ..., p10.

On the other hand, itertools.product will generate all 100 cases and then you have to filter out those combinations whose sum is > 100.

For some inputs itertools.product may be faster since it is written in C, but it doesn't do any pruning of cases based on the sum of the current selections.

Thanks, but I don't see how this can be any faster than by using the itertools object product. Aren't you checking every combination to see if the sum is greater than 1? — GPB, Nov 14 '15 at 13:11
No, because we are short-circuiting the nested loops with the break statements. Once any partial sum is > 1 we ignore any portfolio with that initial sequence. — ErikR, Nov 14 '15 at 13:14
Thanks! Wonder if arrays would be quicker than lists.. I'm not an algo guy, but this is screaming for matrix operations. — GPB, Nov 14 '15 at 15:35
If you've found this answer helpful consider upvoting and/or accepting it. — ErikR, Nov 14 '15 at 15:51
Upvoted for sure....holding out hope for a dialogue on how to use matrices to solve.... — GPB, Nov 14 '15 at 18:58
I doubt you will find a way to use matrices to solve this particular problem. However, it appears you want the possible portfolios to maximize some function of the portfolio. And it also appears that you don't care what each investment percentage is as long as it lies within certain specified ranges. If the function you want to maximize is linear, you should look at [__linear programming__](https://en.wikipedia.org/wiki/Linear_programming). Or perhaps some [__non-linear programming__](https://en.wikipedia.org/wiki/Nonlinear_programming) techniques are applicable. — ErikR, Nov 14 '15 at 23:26

Python - most efficient way to generate combinations of large sets subject to criteria?

1 Answers1