
I have a set of polynomial expressions produced by a computer algebra system (CAS). For example, here is one element of the set:

-d*d*l*l*q-b*b*l*l*q+2*d*f*j*l*q+2*b*f*h*l*q-f*f*j*j*q-b*b*j*j*q+2*b*d*h*j*q-f*f*h*h*q-d*d*h*h*q+b*b*j*j*o*o-2*b*d*h*j*o*o+d*d*h*h*o*o-2*b*b*j*l*n*o+2*b*d*h*l*n*o+2*b*f*h*j*n*o-2*d*f*h*h*n*o+2*b*d*j*l*m*o-2*d*d*h*l*m*o-2*b*f*j*j*m*o+2*d*f*h*j*m*o+b*b*l*l*n*n-2*b*f*h*l*n*n+f*f*h*h*n*n-2*b*d*l*l*m*n+2*b*f*j*l*m*n+2*d*f*h*l*m*n-2*f*f*h*j*m*n+d*d*l*l*m*m-2*d*f*j*l*m*m+f*f*j*j*m*m

I need to evaluate all of them in a C program, as fast as possible. If you look carefully at any of these formulas, it is obvious that they can be optimized for computation speed. For example, in the polynomial pasted above, I can immediately see the terms -d*d*l*l*q, 2*d*f*j*l*q, and -f*f*j*j*q, whose sum I could replace by -q*square(d*l-f*j). I believe there are many such rewrites to be found here. I doubt (though perhaps I am wrong) that any compiler would find this optimization, let alone more advanced ones. I tried to ask maxima (a CAS) to do this for me, but nothing came out (as a beginner with maxima, I may have missed a magic command). So, my first question is: what tool/algorithm can we use to optimize a polynomial expression for computation speed?
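
To make this concrete, here is what that first rewrite looks like in C (a minimal sketch under my own assumptions: all variables are doubles, and square is a little helper I would define myself, not something the CAS emits):

    /* square() is my own hypothetical helper, not CAS output */
    static inline double square(double x) { return x * x; }

    /* the three terms as produced by the CAS: 13 multiplications */
    double naive = -d*d*l*l*q + 2*d*f*j*l*q - f*f*j*j*q;

    /* the same value as -q*square(d*l-f*j): 4 multiplications */
    double fast = -q * square(d*l - f*j);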

When it comes to optimizing a set of polynomial expressions that share most of their variables, things become more complicated. Optimizing expression by expression can be suboptimal: common parts that are visible across the whole set can no longer be identified once each expression has been optimized in isolation, so the optimization has to be performed on the set as a whole. So, my second question is: what tool/algorithm can we use to optimize a set of polynomial expressions for computation speed?
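
To illustrate the kind of cross-expression sharing I mean, here is a made-up sketch (hypothetical expressions, not taken from my actual set): if two expressions of the set both contain d*l-f*j, optimizing the set as a whole lets us compute it once:

    /* hypothetical pair of expressions sharing the factor t */
    double t  = d*l - f*j;          /* computed only once */
    double p1 = -q * t * t;
    double p2 = m * m * t * t + h * j;

Optimizing p1 and p2 separately would leave d*l-f*j computed twice.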

Best regards,

P.S.: this post shares some similarities with "computer algebra soft to minimize the number of operations in a set of polynomials"; however, the answers given there point to CAS programs rather than explaining how to use them to achieve this goal.

  • This seems to be such a basic and commonly occurring problem that I would be very surprised if most CASs wouldn't be able to do at least a partial simplification for you. – biziclop Jan 30 '15 at 16:38
  • I rather agree with @biziclop and I note that OP writes *I am a beginner with maxima*. Perhaps a partial solution lies in acquiring greater familiarity with maxima. – High Performance Mark Jan 30 '15 at 17:04
  • Apart from optimizing evaluation of a single polynomial, the opportunity to share common subexpressions across several polynomials in your set can be explored. However you only provided an example of one. – hardmath Jan 30 '15 at 17:22
  • Thank you for your fast replies. I will add that I have, of course, read the list of available functionalities from http://maxima.sourceforge.net/docs/manual/maxima_14.html. When I say that “nothing came out”, I mean that I have tried “factor”, “factorsum”, “factorout” (even if this cannot be considered an automatic solution), and “combine”. My polynomials cannot be expressed as a product of simpler polynomials. But this does not mean that it is impossible to find a way to write them so that the computation becomes faster. – Sébastien Piérard Jan 30 '15 at 17:26
  • @biziclop, I do not expect to find a method “write_polynomial_in_the_best_way_for_a_computer” in a CAS, since CASs are not concerned at all with execution time (something I expect to be explored more by people working on compilers), and such a method would not help the math guys at all. I mentioned that I tried maxima to indicate that I would also be happy with a nearly optimal solution, even if I would prefer an algorithm providing the optimal one. – Sébastien Piérard Jan 30 '15 at 17:29
  • @hardmath: that is true, but would it help to have a bunch of such ugly expressions in the post? If so, I can give some others, but this is going to be only one particular case of a general question. I put one polynomial in so that a reader can copy-paste it into the software he has and try his idea before answering. – Sébastien Piérard Jan 30 '15 at 17:34
  • Is the problem always in the form of `sum(constant*a^x*b^y*c^z...)`? – biziclop Jan 30 '15 at 18:21
  • @biziclop, yes. My polynomials can be of any degree (even if I would be happy with a solution for polynomials of degree 2, as in my example) and of many variables (I initially had a problem with 17 variables, which is why I use the letters a to q as variable names). The form you are suggesting can always be obtained easily, for example using “expand” in maxima. So this assumption is without loss of generality and can of course be made. – Sébastien Piérard Jan 30 '15 at 18:51
  • I would hope that two sample polynomials will give Readers a better idea about the range of common subexpressions. I can help with their formatting. – hardmath Jan 30 '15 at 18:52
  • @hardmath: here is a second one from my set. -c*c*l*l*q-a*a*l*l*q+2*c*e*j*l*q+2*a*e*h*l*q-e*e*j*j*q-a*a*j*j*q+2*a*c*h*j*q-e*e*h*h*q-c*c*h*h*q+a*a*j*j*o*o-2*a*c*h*j*o*o+c*c*h*h*o*o-2*a*a*j*l*n*o+2*a*c*h*l*n*o+2*a*e*h*j*n*o-2*c*e*h*h*n*o+2*a*c*j*l*m*o-2*c*c*h*l*m*o-2*a*e*j*j*m*o+2*c*e*h*j*m*o+a*a*l*l*n*n-2*a*e*h*l*n*n+e*e*h*h*n*n-2*a*c*l*l*m*n+2*a*e*j*l*m*n+2*c*e*h*l*m*n-2*e*e*h*j*m*n+c*c*l*l*m*m-2*c*e*j*l*m*m+e*e*j*j*m*m – Sébastien Piérard Jan 30 '15 at 19:02
  • @hardmath: here is a third one from my set. -2*c*d*l*l*q-2*a*b*l*l*q+2*c*f*j*l*q+2*d*e*j*l*q+2*a*f*h*l*q+2*b*e*h*l*q-2*e*f*j*j*q-2*a*b*j*j*q+2*a*d*h*j*q+2*b*c*h*j*q-2*e*f*h*h*q-2*c*d*h*h*q+2*a*b*j*j*o*o-2*a*d*h*j*o*o-2*b*c*h*j*o*o+2*c*d*h*h*o*o-4*a*b*j*l*n*o+2*a*d*h*l*n*o+2*b*c*h*l*n*o+2*a*f*h*j*n*o+2*b*e*h*j*n*o-2*c*f*h*h*n*o-2*d*e*h*h*n*o+2*a*d*j*l*m*o+2*b*c*j*l*m*o-4*c*d*h*l*m*o-2*a*f*j*j*m*o-2*b*e*j*j*m*o+2*c*f*h*j*m*o+2*d*e*h*j*m*o+2*a*b*l*l*n*n-2*a*f*h*l*n*n-2*b*e*h*l*n*n+2*e*f*h*h*n*n-2*a*d*l*l*m*n-2*b*c*l*l*m*n+2*a*f*j*l*m*n+2*b*e*j*l*m*n+2*c*f*h*l*m*n+2*d*e*h*l*m*n-4*e*f*h*j*m*n+2*c*d*l*l*m*m-2*c*f*j*l*m*m-2*d*e*j*l*m*m+2*e*f*j*j*m*m – Sébastien Piérard Jan 30 '15 at 19:04
  • I think this problem might be NP-complete -- it's similar to the "minimum number of shifts-and-adds required to multiply by a constant" problem which, if I remember correctly from my college lectures, is NP-complete (but still tractable for small constants, so compilers still sometimes do an exhaustive test). So you might want to just write a greedy approach with backtracking, and if it doesn't work well then try to find additional constraints that let you terminate the test early. (I'm not sure what, but this does seem like a constraint satisfaction problem, so maybe you can use standard heuristics.) – user541686 Jan 30 '15 at 20:00
  • Depending on which one you use, the amount of optimization a compiler can do on something like this may surprise you. If you want to try your hand at writing a program that does the same kind of simplification an advanced compiler would do up front, you should google `dags` and `common subexpressions`. – 500 - Internal Server Error Jan 30 '15 at 20:02
  • @500 - Internal Server Error, I agree with you about common-subexpression detection, but I would not expect compilers to do more advanced things by default, nor much effort to be put into the type of optimizations I am looking for. The reason is that a compiler cannot modify the input program without the guarantee that it will behave in all cases like the initial one. But there can be overflows and underflows, and the precision of floating-point numbers is limited. In fact, doing so would violate the standards of languages like ISO C and C++ (see the option -fassociative-math of gcc). – Sébastien Piérard Feb 01 '15 at 00:06
  • @500 - Internal Server Error, that was already discussed, for example, in http://stackoverflow.com/questions/6430448/why-doesnt-gcc-optimize-aaaaaa-to-aaaaaa. So the conclusion is that the programmer, never the compiler, is responsible for this type of optimization. Therefore, programmers need a tool for that. – Sébastien Piérard Feb 01 '15 at 00:11

1 Answer


As a first attempt I'd probably try the greedy approach.

So using your first example we start with this:

 -1*d*d*l*l*q
 -1*b*b*l*l*q
  2*d*f*j*l*q
  2*b*f*h*l*q
 -1*f*f*j*j*q
 ...

Now try to find the most repeated pattern in the terms. This is q, which luckily is present in all of them. Let's remove it; that leaves us with

 -1*d*d*l*l
 -1*b*b*l*l
  2*d*f*j*l
  2*b*f*h*l
 -1*f*f*j*j
 ...

Now do the same thing again; this time we get l, and the problem splits into two subproblems.

 -1*d*d*l
 -1*b*b*l
  2*d*f*j
  2*b*f*h
  ---------
 -1*f*f*j*j
 ...

Repeat recursively until there is no repetition left; then, tracing your steps back, you can reconstruct a simplified version of the expression:

 q*(l*<first subproblem>+<second subproblem>)

As you can already see, the solution won't necessarily be optimal, but it's easy to implement and may be good enough. If you need a better one, you probably need to explore more combinations and rank them according to the number of multiplications you save, but the general concept is the same. A rough sketch of the greedy pass is below.
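
Here is a rough sketch in C of this greedy pass (all names and data structures are mine: a term is stored as a coefficient plus an exponent vector over the variables a..z, and ties between equally frequent variables are broken by scan order):

    #include <stdio.h>

    #define NVARS 26      /* variables 'a'..'z' */
    #define MAXTERMS 64

    /* A term is coeff * a^exp[0] * b^exp[1] * ...; e.g. -1*d*d*l*l*q is
       coeff=-1, exp['d'-'a']=2, exp['l'-'a']=2, exp['q'-'a']=1. */
    typedef struct {
        int coeff;
        int exp[NVARS];
    } Term;

    static void print_term(const Term *t)
    {
        printf("%d", t->coeff);
        for (int v = 0; v < NVARS; v++)
            for (int e = 0; e < t->exp[v]; e++)
                printf("*%c", 'a' + v);
    }

    /* Greedily factor out the variable occurring in the most terms,
       then recurse on the two resulting subproblems. */
    static void factor(Term *terms, int n)
    {
        /* find the variable shared by the largest number of terms */
        int best = -1, best_count = 1;
        for (int v = 0; v < NVARS; v++) {
            int count = 0;
            for (int i = 0; i < n; i++)
                if (terms[i].exp[v] > 0)
                    count++;
            if (count > best_count) {
                best_count = count;
                best = v;
            }
        }

        if (best < 0) {   /* no repetition left: print a plain sum */
            for (int i = 0; i < n; i++) {
                if (i > 0)
                    printf("+");   /* "x+-1*y" is ugly but valid C */
                print_term(&terms[i]);
            }
            return;
        }

        /* split: terms containing 'best' (one power removed) vs. rest */
        Term in[MAXTERMS], out[MAXTERMS];
        int nin = 0, nout = 0;
        for (int i = 0; i < n; i++) {
            if (terms[i].exp[best] > 0) {
                in[nin] = terms[i];
                in[nin++].exp[best]--;
            } else {
                out[nout++] = terms[i];
            }
        }

        printf("%c*(", 'a' + best);
        factor(in, nin);
        printf(")");
        if (nout > 0) {
            printf("+");
            factor(out, nout);
        }
    }

    int main(void)
    {
        /* the first three terms of the question's example */
        Term t[3] = {{0}};
        t[0].coeff = -1; t[0].exp['d'-'a'] = 2; t[0].exp['l'-'a'] = 2;
        t[0].exp['q'-'a'] = 1;
        t[1].coeff = -1; t[1].exp['b'-'a'] = 2; t[1].exp['l'-'a'] = 2;
        t[1].exp['q'-'a'] = 1;
        t[2].coeff = 2;  t[2].exp['d'-'a'] = 1; t[2].exp['f'-'a'] = 1;
        t[2].exp['j'-'a'] = 1; t[2].exp['l'-'a'] = 1;
        t[2].exp['q'-'a'] = 1;
        factor(t, 3);   /* prints l*(q*(d*(-1*d*l+2*f*j)+-1*b*b*l)) */
        printf("\n");
        return 0;
    }

On the sample terms this sketch happens to factor out l before q, because l and q tie and ties go to the earlier variable; both are valid greedy steps, and the idea is the same as the trace above.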

biziclop
  • I like your approach, since it will work perfectly for one polynomial of one variable x. For example, a*x*x*x+b*x*x+c*x+d will produce x*(x*(x*a+b)+c)+d, which seems to be optimal. However, it is easy to find very simple examples in which your answer is not optimal. For example, a*d+b*d+c*d+b*x+c*x will give (a+b+c)*d+(b+c)*x, but we can find a*d+(b+c)*(d+x), which takes one operation less. I expect (in fact, I guess) that the non-optimality of your solution will become much more important with more terms and variables. – Sébastien Piérard Jan 31 '15 at 23:42
  • @S.Piérard I suspect that the problem in general is NP-hard so maybe you can't expect much from a greedy algorithm. But starting from the same approach you can extend it to generate many expression trees and with some clever pruning you can improve on it. An obvious enhancement for example is to try every permutation for a sub-problem below a certain size. Or to explore the 3 most frequently repeated terms rather than just 1. – biziclop Feb 01 '15 at 12:47
  • @S.Piérard Reading through the comments I realised that you're probably looking at floating point calculations (I assumed it was fixed point), in which case the cost of a `+` operation isn't negligible. That means I have to re-think the algorithm, which only aimed to minimise multiplications. – biziclop Feb 01 '15 at 13:28
  • My question was generic. Even if this falls outside the scope of the original question, I can tell you that I tried to measure the speed of additions and multiplications on floating-point numbers with an i7 processor. The result of my experiment was that, as expected, additions are less costly than multiplications. Your approach is thus a good starting point, even for floating-point arithmetic. – Sébastien Piérard Feb 01 '15 at 17:03