6

I have a math problem that I solve by trial and error (I think this is called brute force). The program works fine when there are only a few options, but as I add more variables/data it takes longer and longer to run.

My problem is that although the prototype works, to be useful it needs to handle thousands of variables and large data sets; so I'm wondering if it is possible to scale brute force algorithms at all. How should I approach scaling it?

I was starting to learn and play around with Hadoop (and HBase); it looks promising, but I wanted to verify that what I'm trying to do isn't impossible.

If it helps, I originally wrote the program in Java (and can go back to it if that is better), but I ended up porting it to Python, because I feel more comfortable with Python.

Update: To provide more insight, I'll add a simplified version of the code to illustrate the idea. Basically, if I know the sum is 100, I am trying to find all combinations of the variables that add up to it. The example is simple; in my real version I use larger numbers and many more variables. It is a linear Diophantine equation, and I believe no algorithm exists to solve it without brute force.

int sum = 100;
int a1 = 20;
int a2 = 5;
int a3 = 10;
for (int i = 0; i * a1 <= sum; i++) {
    for (int j = 0; i * a1 + j * a2 <= sum; j++) {
        for (int k = 0; i * a1 + j * a2 + k * a3 <= sum; k++) {
            if (i * a1 + j * a2 + k * a3 == sum) {
              System.out.println(i + "," + j + "," + k);
            }
        }
    }   
}

I am new to programming, and I am sorry if I'm not framing this question correctly. This is more of a general question.

Peter Mortensen
Lostsoul
  • brute force algorithms typically scales very poorly, which problem are you brute forcing? – bjarneh Sep 01 '11 at 02:36
  • Hi bjarneh..I am trying to do it to the Diophantine equation. I added some code to the question. I don't think an algo exists to solve it. – Lostsoul Sep 01 '11 at 02:43

3 Answers

27

Typically, you can quantify how well an algorithm will scale by using big-O notation to analyze its growth rate. When you say that your algorithm works by "brute force," it's unclear to what extent it will scale. If your "brute force" solution works by listing all possible subsets or combinations of a set of data, then it almost certainly will not scale (it will have asymptotic complexity O(2^n) or O(n!), respectively). If your brute force solution works by finding all pairs of elements and checking each, it may scale reasonably well (O(n^2)). Without more information about how your algorithm works, though, it's difficult to say.

You may want to look at this excellent post about big-O as a starting point for how to reason about the long-term scalability of your program. Generally speaking, anything with growth rate O(n log n), O(n), O(log n), or O(1) scales extremely well, anything with growth rate O(n^2) or O(n^3) will scale up to a point, and anything with growth rate O(2^n) or higher will not scale at all.

Another option would be to look up the problem you're trying to solve to see how well-studied it is. Some problems are known to have great solutions, and if yours is one of them it might be worth seeing what others have come up with. Perhaps there is a very clean, non-brute-force solution that scales really well! Some other problems are conjectured to have no scalable algorithms at all (the so-called NP-hard problems). If that's the case, then you should be pretty confident that there's no way to get a scalable approach.

And finally, you can always ask a new question here at Stack Overflow describing what you're trying to do and asking for input. Maybe the community can help you solve your problem more efficiently than you initially expected!

EDIT: Given the description of the problem that you are trying to solve, right now you are doing one for loop per variable, from 0 up to the number you're trying to target. The complexity of this algorithm is O(U^k), where k is the number of variables and U is the target sum. This approach will not scale very well at all. Introducing each new variable in the above case will make the algorithm run 100 times slower, which definitely will not scale well if you want 100 variables!

However, I think that there is a fairly good algorithm whose runtime is O(U^2 · k) and which uses O(U · k) memory to solve the problem. The intuition is as follows: suppose that we want to sum up 1, 2, and 4 to get 10. There are many ways to do this:

2 * 4 +  1 * 2 +  0 * 1
2 * 4 +  0 * 2 +  2 * 1
1 * 4 +  3 * 2 +  0 * 1
1 * 4 +  2 * 2 +  2 * 1
1 * 4 +  1 * 2 +  4 * 1
1 * 4 +  0 * 2 +  6 * 1
0 * 4 +  5 * 2 +  0 * 1
0 * 4 +  4 * 2 +  2 * 1
0 * 4 +  3 * 2 +  4 * 1
0 * 4 +  2 * 2 +  6 * 1
0 * 4 +  1 * 2 +  8 * 1
0 * 4 +  0 * 2 + 10 * 1

The key observation is that we can write all of these out as sums, but more importantly, as sums where each term in the sum is no greater than the previous term:

2 * 4 +  1 * 2 +  0 * 1 = 4 + 4 + 2
2 * 4 +  0 * 2 +  2 * 1 = 4 + 4 + 1 + 1
1 * 4 +  3 * 2 +  0 * 1 = 4 + 2 + 2 + 2
1 * 4 +  2 * 2 +  2 * 1 = 4 + 2 + 2 + 1 + 1
1 * 4 +  1 * 2 +  4 * 1 = 4 + 2 + 1 + 1 + 1 + 1
1 * 4 +  0 * 2 +  6 * 1 = 4 + 1 + 1 + 1 + 1 + 1 + 1
0 * 4 +  5 * 2 +  0 * 1 = 2 + 2 + 2 + 2 + 2
0 * 4 +  4 * 2 +  2 * 1 = 2 + 2 + 2 + 2 + 1 + 1
0 * 4 +  3 * 2 +  4 * 1 = 2 + 2 + 2 + 1 + 1 + 1 + 1
0 * 4 +  2 * 2 +  6 * 1 = 2 + 2 + 1 + 1 + 1 + 1 + 1 + 1
0 * 4 +  1 * 2 +  8 * 1 = 2 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
0 * 4 +  0 * 2 + 10 * 1 = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1

So this gives an interesting idea about how to generate all possible ways to sum up to the target. The idea is to fix the first coefficient, then generate all possible ways to make the rest of the sum work out. In other words, we can think about the problem recursively. If we list the variables in order as x_1, x_2, ..., x_n, then we can try fixing some particular coefficient c_1 for x_1, then solving the problem of summing up to sum - c_1 · x_1 using just x_2, ..., x_n.

So far this doesn't seem all that fancy - in fact, it's precisely what you're doing above - but there is one trick we can use. As long as we're going to be thinking about this problem recursively, let's think about the problem in the opposite manner. Rather than starting with sum and trying to break it down, what if instead we started with 0 and tried to build up everything that we could?

Here's the idea. Suppose that we already know in advance all the numbers we can make using just multiples of x_1. Then for every number z between 0 and sum, inclusive, we can make z out of x_2 and x_1 precisely when there is some coefficient c_2 such that z - c_2 · x_2 is something that can be made out of multiples of x_1. Since we've precomputed that, we can just iterate over all legal values of c_2, compute z - c_2 · x_2, and see if we know how to make it. Assuming we store a giant (U + 1) x (k + 1) table of boolean values such that table entry [x, y] stores "can we sum up to precisely x using only the first y variables?", we can fill in the table efficiently. This is called dynamic programming and is a powerful algorithmic tool.

More concretely, here's how this might work. Given k variables, create a (U + 1) x (k + 1) table T of boolean values. Then set T[0][0] = true and T[x][0] = false for all x > 0. The rationale here is that T[0][0] means "can we get the sum zero using a linear combination of the first zero variables?" and the answer is definitely yes (the empty sum is zero!), but any other sum cannot be made from a linear combination of zero variables.

Now, for i = 1 .. k, we'll try to fill in the values of T[x][i]. Remember that T[x][i] means "can we make x as a linear combination of the first i variables?" We can do this precisely when there is some coefficient c such that x - c · x_i can be made using a linear combination of x_1, x_2, ..., x_{i - 1}. But for any given c, that's just asking whether T[x - c · x_i][i - 1] is true. Thus we can say:

for i = 1 to k:
    for z = 0 to sum:
        for c = 0 to z / x_i:
            if T[z - c * x_i][i - 1] is true:
                set T[z][i] to true

Inspecting the loops, we see that the outer loop runs k times, the middle loop runs sum times per iteration, and the innermost loop also runs at most sum times per iteration. Their product is (using our notation from above) O(U^2 · k), which is way better than the O(U^k) algorithm that you had originally.
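
To make this concrete, here is a minimal Java sketch of the table fill described above (the class name, variable names, and the choice of the question's numbers are mine, not from the original post):

public class TableFillSketch {
    public static void main(String[] args) {
        int[] x = {20, 5, 10};   // the variables a1, a2, a3 from the question
        int sum = 100;           // the target U
        int k = x.length;

        // T[z][i] is true exactly when z can be written as a non-negative
        // integer combination of the first i variables x[0..i-1].
        boolean[][] T = new boolean[sum + 1][k + 1];
        T[0][0] = true;  // the empty combination sums to 0; nothing else does

        for (int i = 1; i <= k; i++) {
            for (int z = 0; z <= sum; z++) {
                // Try every coefficient c for x[i-1], including 0.
                for (int c = 0; c * x[i - 1] <= z; c++) {
                    if (T[z - c * x[i - 1]][i - 1]) {
                        T[z][i] = true;
                        break;
                    }
                }
            }
        }
        System.out.println("Is " + sum + " reachable at all? " + T[sum][k]);
    }
}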

But how do you use this information to list off all of the possible ways to sum up to the target? The trick here is to realize that you can use the table to avoid wasting a huge amount of effort searching over every possible combination when many of them aren't going to work.

Let's see an example. Suppose that we have this table completely computed and want to list off all solutions. One idea is to list all solutions where the coefficient of the last variable is zero, then all solutions where it is one, and so on. The issue with the approach you had before is that for some coefficients there might not be any solutions at all, but with the table we have constructed above we can prune out those branches. For example, suppose that we want to see if there are any solutions in which x_k has coefficient 0. This amounts to asking whether there is any linear combination of the first k - 1 variables whose value is sum, which is possible if and only if T[sum][k - 1] is true. If it is true, then we can recursively try assigning coefficients to the rest of the variables in a way that sums up to sum. If not, then we skip this coefficient and go on to the next.

Recursively, this looks something like this:

function RecursivelyListAllThatWork(k, sum) // Using last k variables, make sum
    /* Base case: If we've assigned all the variables correctly, list this
     * solution.
     */
    if k == 0:
        print what we have so far
        return

    /* Recursive step: Try all coefficients, but only if they work. */
    for c = 0 to sum / x_k:
       if T[sum - c * x_k][k - 1] is true:
           mark the coefficient of x_k to be c
           call RecursivelyListAllThatWork(k - 1, sum - c * x_k)
           unmark the coefficient of x_k

This will recursively list all the solutions that work, using the values in the table we just constructed to skip a huge amount of wasted effort. Once you've built the table, you could divvy the work up by farming the task out to multiple computers, having each of them list a subset of the total solutions and processing them all in parallel (a rough sketch of the whole approach, plus one way to split the work, follows below).
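
To tie the two pieces together, here is a minimal, self-contained Java sketch that builds the table and then lists every solution, pruning branches the table rules out. The class and method names are mine, not the OP's, and the numbers are the small ones from the question:

import java.util.Arrays;

public class ListAllThatWorkSketch {
    static int[] x = {20, 5, 10};  // the variables from the question
    static boolean[][] T;          // T[z][i]: can z be made from the first i variables?

    public static void main(String[] args) {
        int sum = 100;
        T = buildTable(x, sum);
        listAllThatWork(x.length, sum, new int[x.length]);
    }

    // Dynamic-programming table fill, as in the pseudocode above.
    static boolean[][] buildTable(int[] x, int sum) {
        boolean[][] t = new boolean[sum + 1][x.length + 1];
        t[0][0] = true;  // the empty combination makes 0; nothing else does
        for (int i = 1; i <= x.length; i++) {
            for (int z = 0; z <= sum; z++) {
                for (int c = 0; c * x[i - 1] <= z; c++) {
                    if (t[z - c * x[i - 1]][i - 1]) {
                        t[z][i] = true;
                        break;
                    }
                }
            }
        }
        return t;
    }

    // Using the first k variables, list every way to reach 'remaining'.
    static void listAllThatWork(int k, int remaining, int[] coeff) {
        if (k == 0) {
            if (remaining == 0) {            // all of the sum has been used up
                System.out.println(Arrays.toString(coeff));
            }
            return;
        }
        for (int c = 0; c * x[k - 1] <= remaining; c++) {
            // Prune: only recurse if the rest of the sum is still achievable.
            if (T[remaining - c * x[k - 1]][k - 1]) {
                coeff[k - 1] = c;
                listAllThatWork(k - 1, remaining - c * x[k - 1], coeff);
                coeff[k - 1] = 0;            // unmark before trying the next c
            }
        }
    }
}

If you do want to spread the listing over several cores or machines (as with the Hadoop idea in the question), one simple split is by the coefficient of the last variable: each worker gets its own value, or range of values, of that coefficient and lists only the solutions in its slice. A rough multi-threaded sketch, reusing the class above (again, only an illustration of the partitioning idea, not a distributed implementation):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelListingSketch {
    public static void main(String[] args) {
        int sum = 100;
        int[] x = ListAllThatWorkSketch.x;
        ListAllThatWorkSketch.T = ListAllThatWorkSketch.buildTable(x, sum);

        ExecutorService pool = Executors.newFixedThreadPool(4);
        int last = x.length - 1;
        for (int c = 0; c * x[last] <= sum; c++) {
            final int cf = c;
            // Each task explores only the solutions whose last coefficient is cf.
            pool.submit(() -> {
                int[] coeff = new int[x.length];
                coeff[last] = cf;
                ListAllThatWorkSketch.listAllThatWork(last, sum - cf * x[last], coeff);
            });
        }
        pool.shutdown();  // let the submitted tasks finish, then the JVM can exit
    }
}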

Hope this helps!

templatetypedef
  • Thank you so much for your answer. I will read up on your link. I have started studying algo's in an attempt to see if I can solve it in a less brute-force type of way, but I'm not hopeful since I remember reading that there is no algo for the Diophantine equation. I have been thinking maybe if I split it up on different servers and run it then maybe I can have it completed within my lifetime :-) not sure if its a realistic approach though. – Lostsoul Sep 01 '11 at 02:45
  • @Lostsoul- Actually, I don't believe that this can be made scalable at all because in the worst case there might be a huge number of solutions (about U^k of them), and you need to dedicate time to generating each of them. Do you actually need all the answers, or just a count of how many there are? – templatetypedef Sep 01 '11 at 02:49
  • that would be amazing. I would cry(happy tears :-) if this process could complete in a reasonable time. The problem is it generates too much data, but I need all the possibilities(I save them to a database and analyze them after). – Lostsoul Sep 01 '11 at 02:51
  • I would need them all. As more variables and data, processing time increases, but can I use hadoop to split it up over different machines? – Lostsoul Sep 01 '11 at 02:52
  • @Lostsoul- In general, you can't generate n items explicitly without spending at least n time to generate them. This means that you probably can't get a fast algorithm for the problem. However, you might be able to build up an implicit representation of those solutions, from which you could reconstruct all the answers you need. Do you actually have to have every single answer available? Or just the ability to list them off one at a time? – templatetypedef Sep 01 '11 at 02:53
  • I think it needs to be saved. We run this today and one answer in the many results is correct. We then run it again the next day(with different variables/sums) and then analyze the results to close in the correct answer(by eliminating the false ones). Does it make a difference if its available or listed at the moment? – Lostsoul Sep 01 '11 at 02:56
  • @Lostsoul- To make sure I understand this - I think I have a solution that builds up a structure from which you can list the solutions one at a time very quickly. This would mean that if you're just looking at the solutions one at a time until you find what you're looking for, then it should work. If you needed to do something weird like sorting them, then it's probably not a good solution. Would that work for you? – templatetypedef Sep 01 '11 at 02:57
  • yes that would be awesome. The order doesn't matter, only that all results are captured. – Lostsoul Sep 01 '11 at 03:02
  • Hey templatetypedef did you forget about me? I'm drooling waiting to see if you have something to make this run more effective. – Lostsoul Sep 01 '11 at 04:06
  • @Lostsoul- Don't worry, I remember! Just had to grab dinner. I'm writing it up right now. – templatetypedef Sep 01 '11 at 04:46
  • Thanks so much! I'm super interested in seeing your solution. I have a solution I use with recusion that allows me to use unlimited variables but its so slow. FYI..i setup a hadoop cluster on my home network in the past hour and I'm going to try to start learning how to scale my brute force application tomorrow. – Lostsoul Sep 01 '11 at 04:57
  • @Lostsoul- Algorithm posted. Let me know if you have any questions! – templatetypedef Sep 01 '11 at 05:11
  • wow awesome...Thanks so much. I will read it and let it sink in. I'll message you tomorrow if I have any questions(its super late right now..going to catch some zzzz's)...Thank you so much for your help/algo..I'll def bug you tomorrow as I play around with it..thanks and have a great night! – Lostsoul Sep 01 '11 at 05:18
  • ok i couldn't sleep..I had to read this..its amazing..I sort of understand your approach(well I think I do, but won't know until I try it). I'll work on converting this to python code and let you know how it goes tomorrow. Thanks again so much templatetypedef. I'm just starting out in programming(basically just learned oop and syntax in python) so sorry but I got to ask how did you learn how to approach these kinds of algorithms? I'm reading books on algo's but i don't think most people would have had your approach to the Diophantine equation. – Lostsoul Sep 01 '11 at 05:49
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/3049/discussion-between-templatetypedef-and-lostsoul) – templatetypedef Sep 01 '11 at 05:51
  • Thanks again for this. It took me a long time but I got this working(learning alot along the way), I didn't need to use multiple systems/cores to do the work but I'm starting to get interested in doing so. Is it possible to explain how to use the RecursivelyListAllThatWork to divy the work? You mentioned in your answer each system can work on a subset but how is that subset created? is it a part of T(i.e. give it a few of T values)? or something else? – Lostsoul Mar 19 '12 at 04:10
  • I have been playing around a bit and sending a smaller K to RecursivelyListAllThatWork doesn't really work very well(or limiting the k within the program) because although the first few K's can be done by multiple cores, the last K does the entire list again. I think this approach works for segments but how can I approach the entire dataset(look at all variables together)? Sorry I know its been a long time and hope you still remember the logic of this solution(everyone was very very clear except the divvying up the work part for me). – Lostsoul Mar 19 '12 at 04:19
  • I'm wondering if you have had a chance to look at my questions (not sure if your getting notified without the @ ). @templatetypedef – Lostsoul Apr 03 '12 at 13:38
  • @Li-aungYip This answer really sparked my intrest in learning programming. Seriously. To break it down and apply it to my problem took me months of studying I'm still not able to scale it fully but it taught me so many different things that really helped me. – Lostsoul Apr 07 '12 at 18:14
  • The question did not ask for a pedantic introduction to algorithms. There is not one occurrence of "hadoop" anywhere in this essay. – T. Webster May 26 '13 at 22:50
  • @T.Webster- I'm sorry this answer wasn't useful to you. When the question was originally posted, the OP did not include any information about what problem they were trying to solve, and so I started off my answer by discussing basic terminology that might be useful in understanding algorithm scalability. Once the OP updated the question to describe what they were doing, I recognized that the problem had a known solution that was relatively efficient, and therefore added that explanation to my answer. I understand your concern, though; I hope this sheds some light on why I answered as I did. – templatetypedef May 27 '13 at 03:50
1

By definition, brute force algorithms are stupid. You'd be much better off with a more clever algorithm (if you have one). A better algorithm will reduce the work that has to be done, hopefully to a degree that you can do it without needing to "scale out" to multiple machines.

Regardless of the algorithm, there comes a point when the amount of data or computation power required is so big that you will need to use something like Hadoop. But usually, we are really talking Big Data here. You can already do a lot with a single PC these days.

Thilo
  • Some problems can only be solved by brute-force even though they scale reasonably well. Also, some brute-force algorithms actually do scale well (checking all pairs will work just fine for reasonably-sized data sets). – templatetypedef Sep 01 '11 at 02:39
  • Like I said "if you have (a better algorithm)". The usual examples for Hadoop (iterating over a huge data set to produce some metrics or index) strike me as rather brute-force (out of necessity of course). – Thilo Sep 01 '11 at 02:42
  • I updated the question with a sample of what I'm doing. but if the sum is increased to say a million and you add 10 more variables, it'll go on forever(or at least a day on my laptop). I don't think an algo exists for the problem I'm trying to solve as it simply generates too many possibilities. My problem is I need the possibilities so I thought maybe I can use hadoop to have the job run on multiple servers then save it all to hbase or something similar. I'm not sure if its possible(or i'll go broke getting severs) – Lostsoul Sep 01 '11 at 02:47
  • With 100 servers it will still be only 100 times as fast. So it just does not scale well. People use clusters to break encryption keys and stuff (which can also only be brute-forced), but this is all very expensive and can be made much more expensive by adding another variable in your case, or a longer key in theirs. That is what is called a hard computational problem. No way around it. – Thilo Sep 01 '11 at 02:54
  • this is kind of the conclusion I made, so I was trying to see first if using hadoop was possible to reduce computation time(this is my original question here), then if it is possible to reduce time then I'd study the cost versus the benefits of more variables(for example, if it takes a 24 hours to process and adding another server can make it run in 12 hours..maybe its worth the cost of another EC2 server). But if it can't be scaled(even a small amount) then I know its dead in the water and don't have to proceed. – Lostsoul Sep 01 '11 at 03:01
0

The algorithm for this problem is close to the procedure we learn for manual long division, or for converting a number from decimal to another base such as octal or hexadecimal - except that those two examples only look for a single, canonical solution.

To be sure the recursion terminates, it is important to sort the data array. To be efficient and limit the number of recursive calls, it also helps to start with the largest values.

Concretely, here is a recursive Java implementation of this approach - with a copy of the result vector coeff for each recursive call, as the theory expects.

import java.util.Arrays;

public class Solver
{
    public static void main(String[] args)
    {
        int target_sum = 100;
        // pre-requisite: sorted values !!
        int[] data = new int[] { 5, 10, 20, 25, 40, 50 };
        // result vector, init to 0
        int[] coeff = new int[data.length];
        Arrays.fill(coeff, 0);
        partialSum(data.length - 1, target_sum, coeff, data);
    }

    private static void printResult(int[] coeff, int[] data) {
        for (int i = coeff.length - 1; i >= 0; i--) {
            if (coeff[i] > 0) {
                System.out.print(data[i] + " * " + coeff[i] + "   ");
            }
        }
        System.out.println();
    }

    private static void partialSum(int k, int sum, int[] coeff, int[] data) {
        int x_k = data[k];
        for (int c = sum / x_k; c >= 0; c--) {
            coeff[k] = c;
            if (c * x_k == sum) {
                printResult(coeff, data);
                continue;
            } else if (k > 0) {
                // contextual result in parameters, local to method scope
                int[] newcoeff = Arrays.copyOf(coeff, coeff.length);
                partialSum(k - 1, sum - c * x_k, newcoeff, data);
                // for loop on "c" goes on with previous coeff content
            }
        }
    }
}

As it turns out, that code covers a special case: because each loop counts its coefficient down to 0, every coeff[k] is left at 0 when its loop finishes, so the defensive copy of coeff is not strictly necessary.

As a rough complexity estimate, the depth of recursion is bounded by data.length (one level per variable), while the branching at each level grows with sum / min({ data }), so it certainly will not scale well. For the recursive version, a limiting factor is also stack memory (the -Xss JVM option): the code may fail with a StackOverflowError for a very large data set.

To avoid these drawbacks, "derecursion" is useful: it consists of replacing the method call stack with a program-managed collection of execution contexts to be processed later (the code below actually uses a FIFO queue, so it explores the space breadth-first rather than depth-first). Here is the code for that:

import java.util.Arrays;
import java.util.ArrayDeque;
import java.util.Queue;

public class NonRecursive
{
    // pre-requisite: sorted values !!
    private static final int[] data = new int[] { 5, 10, 20, 25, 40, 50 };

    // Context to store intermediate computation or a solution
    static class Context {
        int k;
        int sum;
        int[] coeff;
        Context(int k, int sum, int[] coeff) {
            this.k = k;
            this.sum = sum;
            this.coeff = coeff;
        }
    }

    private static void printResult(int[] coeff) {
        for (int i = coeff.length - 1; i >= 0; i--) {
            if (coeff[i] > 0) {
                System.out.print(data[i] + " * " + coeff[i] + "   ");
            }
        }
        System.out.println();
    }

    public static void main(String[] args)
    {
        int target_sum = 100;
        // result vector, init to 0
        int[] coeff = new int[data.length];
        Arrays.fill(coeff, 0);

        // queue with contexts to process
        Queue<Context> contexts = new ArrayDeque<Context>();
        // initial context
        contexts.add(new Context(data.length - 1, target_sum, coeff));

        while(!contexts.isEmpty()) {
            Context current = contexts.poll();
            int x_k = data[current.k];
            for (int c = current.sum / x_k; c >= 0; c--) {
                current.coeff[current.k] = c;
                int[] newcoeff = Arrays.copyOf(current.coeff, current.coeff.length);
                if (c * x_k == current.sum) {
                    printResult(newcoeff);
                    continue;
                } else if (current.k > 0) {
                    contexts.add(new Context(current.k - 1, current.sum - c * x_k, newcoeff));
                }
            }
        }
    }
}

From my point of view, it is difficult to be much more efficient in a single-threaded execution - the explicit context mechanism now requires copies of the coeff array.

Yves Martin
  • Thank you so much Yves, This is a great answer. It def handles alot more variables(I tried 20 ints in data and finished in a few min.)..really impressed, but it seems very memory hungry. It uses 100% of my cpu which is awesome..but 30 variables exceeds my 15Gigs of memory so I suspect doing a 100 will crash. Is there a way to make it not be so memory bound? – Lostsoul Apr 07 '12 at 18:19
  • This seems to be a breadth-first search of the space rather than the recursive DFS of the space. I don't see this as being inherently more efficient than my approach; in both cases we explore the whole space. There is also a memory tradeoff; a BFS search will hold a lot more in memory at any one time, whereas the DFS search will only use memory proportional to the maximum depth of the tree to search. – templatetypedef Apr 07 '12 at 19:29
  • @Lostsoul There is always a compromise between CPU consumption, memory usage and global response time... Does the non-recursive proposal also consumes so much memory ? – Yves Martin Apr 07 '12 at 20:19
  • @YvesMartin yes, the non-recursive ones needs about 15 gigs of memory to do 30 items.. – Lostsoul Apr 07 '12 at 23:23