Python, solver method or optimization of current code?

Question

I am trying to add more data to the matrices I analyze and solve, but as it stands the code brute-forces every combination, and it exceeds Python's limits if I add another column to the analysis. Is there a solver method available that would find a similar result without having to brute-force all the combinations? The sample.csv is also listed below. Thanks for any advice.

import csv
import itertools as it
import numpy as np

C = 2618.08
B = 933.15
A = 932.37
adjust = 1


D = csv.reader(open('sample.csv'))

float_ABC = []
OUT = np.zeros((3, 9)) - 100

for row in D:
        float_ABC.append([str(x) for x in row])

float_ABC = float_ABC.astype(np.float)

Alpha=float_ABC[:, [0,3,6,9,12,15]]
Beta=float_ABC[:, [2,5,8,11,14,17]]
Phi=float_ABC[:, [1,4,7,10,13,16]]

plines1 = it.product(Alpha[0],Alpha[1],Alpha[2],Alpha[3],
                     Alpha[4],Alpha[5],Alpha[6],Alpha[7],
                     Alpha[8])

plines2 = it.product(Beta[0],Beta[1],Beta[2],Beta[3],
                     Beta[4],Beta[5],Beta[6],Beta[7],
                     Beta[8])

plines3 = it.product(Phi[0],Phi[1],Phi[2],Phi[3],
                     Phi[4],Phi[5],Phi[6],Phi[7],
                     Phi[8])


for count in range(0,6**9):
    sumA = next(plines1)
    sumB = next(plines2)
    sumC = next(plines3)

    if  (sum(sumC)+B)/(sum(sumA)+C) <= (B+adjust)/(C) and \
        (sum(sumC)+B)/(sum(sumA)+C) >= (B+adjust-10)/(C) and \
        (sum(sumB)+A)/(sum(sumA)+C) > (sum(OUT[2])+A)/(sum(OUT[0])+C):
        print("#",count,"- new option found!")
        OUT = np.vstack((sumA,sumC,sumB))

and sample.csv:

13.4,-18.81,-24.75,5.82,-8.21,-10.8,0,0,0,3.3,1.56,2.05,-2.1,5.36,7.05,2.6,5.65,7.44
0,-11.01,-14.49,0,-4.87,-6.41,0,0,0,0.6,2.24,2.95,1,4,5.26,1.7,2.73,3.59
0,-40.74,-53.6,0,-17.86,-23.5,0,0,0,3.5,6.53,8.59,2.9,9.36,12.31,1.9,2.61,3.44
1000,-1000,-1000,0,0,0,20.76,21.78,15.66,18.48,23.44,16.96,27.72,26.46,19.92,32.28,29.58,23.08
1000,-1000,-1000,-2.28,-6.12,-4.16,-2.28,-2.53,-1.73,0,0,0,1.92,-1.85,-1.26,1.08,-1.27,-0.86
1000,-1000,-1000,0,0,0,6.78,7.38,5.07,6.66,8.93,6.14,8.46,8.41,5.78,9.42,10.37,7.14
1000,-1000,-1000,0,0,0,28.8,34.28,27.86,37.2,39.64,33.32,45.6,42.76,36.63,54,45.88,40.03
1000,-1000,-1000,0,-4.95,-3.36,0,0,0,1.8,0.59,0.4,1.2,1.85,1.27,3.72,0.17,0.11
1000,-1000,-1000,0,0,0,27.6,19.3,13.71,32.76,23.68,17.15,37.8,20.56,14.71,22.56,27.58,21.06

what do you mean by "it exceeds python's limits"? What kind of error are you getting? I can suggest using `for count, (sumA,sumB,sumC) in enumerate(zip(plines1,plines2,plines3)):` and caching the result of `sum(sumA)+C` etc. since you recalculate it several times every iteration. — Tadhg McDonald-Jensen, Mar 02 '17 at 19:22
I would expect the above code to raise `AttributeError: 'list' object has no attribute 'astype'` on the line `float_ABC = float_ABC.astype(np.float)` since `float_ABC` is a list... — Tadhg McDonald-Jensen, Mar 02 '17 at 19:24
thanks @TadhgMcDonald-Jensen for the input - the line `for count, (sumA,sumB,sumC) in enumerate(zip(plines1,plines2,plines3)):` for some reason didn't connect in my brain and I had used that in other portions of my analysis - that does make sense. as to the float_ABC line, it had taken some of the entries as strings, and so it converts all to float objects - seems to work fine on my end. Much appreciated for the help on that though! — Dustin Whitehead, Mar 03 '17 at 05:51
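
For anyone reproducing the question's code: a minimal sketch of the list-to-float conversion discussed in these comments, assuming the rows are first turned into a NumPy array. The two-row `sample` text is a made-up stand-in for sample.csv, and the plain `float` dtype is used because the `np.float` alias was removed in NumPy 1.24.

```python
import csv
import io

import numpy as np

# Made-up two-row stand-in for sample.csv (the real file has 18 columns).
sample = io.StringIO("13.4,-18.81,-24.75\n0,-11.01,-14.49\n")

# csv.reader yields lists of strings; collect them first ...
rows = [row for row in csv.reader(sample)]

# ... then convert once: a list has no .astype, but an ndarray does.
float_ABC = np.array(rows, dtype=float)

# Column slicing as in the question now works on the 2-D array.
first_column = float_ABC[:, [0]]
print(float_ABC.shape, first_column.ravel())
```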

score 0 · Accepted Answer · edited May 23 '17 at 11:53

This answer treats the question more like a code review than help with the algorithm.

First, you can iterate over all three plines iterators at the same time using zip:

for sumA, sumB, sumC in zip(plines1, plines2, plines3):
    pass

Then, to get a running count of the step you are on, you can wrap that in enumerate:

for count, (sumA, sumB, sumC) in enumerate(zip(plines1, plines2, plines3)):
    pass
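
A small sketch of why this stays aligned, on made-up 2x2 data (`alpha` and `beta` are hypothetical stand-ins for `Alpha` and `Beta`): `itertools.product` yields its tuples in a fixed order, so zipping products over same-shaped inputs picks the same option index in every row, and `enumerate` supplies the counter.

```python
import itertools as it

# Hypothetical stand-ins for Alpha and Beta: 2 rows, 2 options per row.
alpha = [[1, 2], [10, 20]]
beta = [[3, 4], [30, 40]]

steps = []
for count, (sum_a, sum_b) in enumerate(zip(it.product(*alpha), it.product(*beta))):
    # Both tuples were built from the same option index in each row.
    steps.append((count, sum_a, sum_b))

print(steps[0])   # (0, (1, 10), (3, 30))
print(steps[-1])  # (3, (2, 20), (4, 40))
```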

I also notice that you recalculate (B+adjust)/(C) and (B+adjust-10)/(C) every iteration, even though neither value changes inside the loop. Calculating them once before the loop will save you some execution time:

high_check = (B+adjust)/(C)
low_check = (B+adjust-10)/(C)

for count, (sumA, sumB, sumC) in enumerate(zip(plines1, plines2, plines3)):

    if ( low_check <= (sum(sumC)+B)/(sum(sumA)+C) <= high_check
          and <OTHER CHECK> ):
        ...

Calculating sum(sumA) (and likewise for sumB and sumC) over and over again is also unnecessarily costly, and somewhat confusing, since sumA is actually a tuple of values. It makes more sense to calculate the sums once and to treat the tuple (sumA, sumB, sumC) as a single value called matrix (a 2D tuple is close enough):

for count, matrix in enumerate(zip(plines1, plines2, plines3)):
    sumA, sumB, sumC = map(sum, matrix)
    if ( low_check <= (sumC+B)/(sumA+C) <= high_check
          and <OTHER CHECK> ):
        ...
        OUT = np.vstack((matrix[0], matrix[2], matrix[1]))  # rows Alpha, Phi, Beta, as in the question
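
To make the two ideas concrete, here is a tiny sketch with made-up numbers (the tuples and the bounds are hypothetical, not taken from sample.csv): `map(sum, matrix)` sums each tuple exactly once, and the chained comparison evaluates the ratio only once.

```python
# Hypothetical tuples standing in for one step of zip(plines1, plines2, plines3).
matrix = ((1.0, 2.0, 3.0), (0.5, 0.5, 1.0), (4.0, 0.0, -1.0))

# One pass over each tuple; the names now hold numbers, not tuples.
sumA, sumB, sumC = map(sum, matrix)

# Made-up bounds; in the real loop these are computed once before it starts.
low_check, high_check = 0.4, 0.6

# Chained comparison: the middle expression is evaluated a single time.
ratio = sumC / sumA
in_range = low_check <= ratio <= high_check
print(sumA, sumB, sumC, in_range)
```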

Similarly, recalculating (sum(OUT[2])+A)/(sum(OUT[0])+C) only when OUT changes avoids recomputing a value that is unchanged on most iterations:

OUT_check = (sum(OUT[2])+A)/(sum(OUT[0])+C)

for ... in ...:

    if (  ...
          and (sumB+A)/(sumA+C) > OUT_check):
        ...
        OUT_check = (sum(OUT[2])+A)/(sum(OUT[0])+C)

So the altered section of code would look like this:

plines1 = it.product(*Alpha) #star notation just unpacks all the elements into arguments
plines2 = it.product(*Beta)
plines3 = it.product(*Phi)

high_check = (B+adjust)/(C)
low_check = (B+adjust-10)/(C)
OUT_check = (sum(OUT[2])+A)/(sum(OUT[0])+C)

for count, matrix in enumerate(zip(plines1, plines2, plines3)):
    sumA, sumB, sumC = map(sum, matrix)
    if ( low_check <= (sumC+B)/(sumA+C) <= high_check
          and (sumB+A)/(sumA+C) > OUT_check):
        print("#",count,"- new option found!")
        OUT = np.vstack((matrix[0], matrix[2], matrix[1]))  # rows Alpha, Phi, Beta, as in the question
        OUT_check = (sum(OUT[2])+A)/(sum(OUT[0])+C)
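
For experimenting, the same loop can be wrapped in a function and tried on a tiny input. This is only a sketch under a few assumptions: the name `best_combination` and the toy tables are made up, it starts with no incumbent (negative infinity) instead of the -100 placeholder in `OUT`, and it returns the winning tuples rather than printing each improvement. It is still a full brute-force search, so it does not remove the 6**9 cost.

```python
import itertools as it


def best_combination(alpha, beta, phi, A, B, C, adjust=1):
    """Brute-force search using the same two checks as the loop above."""
    high_check = (B + adjust) / C
    low_check = (B + adjust - 10) / C
    best_score, best_matrix = float("-inf"), None

    # The three products advance in lockstep, one option index per row.
    steps = zip(it.product(*alpha), it.product(*beta), it.product(*phi))
    for matrix in steps:
        sumA, sumB, sumC = map(sum, matrix)
        denom = sumA + C
        score = (sumB + A) / denom
        if low_check <= (sumC + B) / denom <= high_check and score > best_score:
            best_score, best_matrix = score, matrix
    return best_score, best_matrix


# Toy data: 2 rows with 2 options each (not from sample.csv).
alpha = [[0, 1], [0, 2]]
beta = [[0, 5], [0, 1]]
phi = [[0, 0], [0, 0]]

score, matrix = best_combination(alpha, beta, phi, A=10, B=100, C=100)
print(score, matrix)
```

Returning the result instead of updating a global `OUT` makes it easier to compare a faster method against the brute-force answer on small inputs.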

I have changed the code to use enumerate and cached the portions I can, which saves some calculation time with the current algorithm. Thanks for the optimization of the current algorithm. What I'm really asking, to improve the time further, is whether there is a solver algorithm that would let me add more columns of data to analyze. My ultimate goal is to have sample.csv supply 15 items for each tuple, rather than 9. Would I be able to achieve this with the help of an algorithm from 'scipy.optimize' before iterating through all of the plines? — Dustin Whitehead, Mar 03 '17 at 19:21
I haven't really used any of those sorts of functionalities so I won't be able to help on that front, sorry. — Tadhg McDonald-Jensen, Mar 04 '17 at 02:52
