What algorithms have high time complexity, to help "burn" more CPU cycles?

Question

I am trying to write a demo for an embedded processor, which is a multicore architecture and is very fast in floating point calculations. The problem is that the current hardware I have is the processor connected through an evaluation board where the DRAM to chip rate is somewhat limited, and the board to PC rate is very slow and inefficient.

Thus, when demonstrating big matrix multiplication, I can do, say, 128x128 matrices in a couple of milliseconds, but the I/O takes (lots of) seconds, which kills the demo.

So, I am looking for some kind of a calculation with higher complexity than n^3, the more the better (but preferably easy to program and to explain/understand) to make the computation part more dominant in the time budget, where the dataset is preferably bound to about 16KB per thread (core).

Any suggestion?

PS: I think it is very similar to this question in its essence.

This question seems too vague to provide a concrete "best" answer. Could you just use some sort of brute-force algorithm for solving an NP-hard problem, like TSP or subset sum? — templatetypedef, Jan 23 '12 at 22:12
How about simulated annealing? That'll burn about as much CPU as you let it.. — Mike Christensen, Jan 23 '12 at 22:16

score 3 · Answer 1 · answered Jan 23 '12 at 22:28

You could generate large (256-bit) numbers and factor them; that's commonly used in "stress-test" tools. If you specifically want to exercise floating point computation, you can build a basic n-body simulator with a Runge-Kutta integrator and run that.
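As a rough illustration of the n-body suggestion, here is a minimal sketch of a direct-sum gravitational simulation advanced with the classical fourth-order Runge-Kutta scheme. The flat coordinate layout, the choice G = 1, the softening constant, and the names `accelerations`, `axpy` and `rk4_step` are all assumptions made for this example, not something from the answer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;  // flat layout: x0, y0, z0, x1, y1, z1, ...

// Pairwise gravitational accelerations, O(n^2) per call.
// G = 1 and the softening term are illustrative assumptions.
Vec accelerations(const Vec& pos, const Vec& mass) {
    const std::size_t n = mass.size();
    assert(pos.size() == 3 * n);
    const double softening2 = 1e-9;  // avoids division by zero on close encounters
    Vec acc(3 * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            double d[3];
            double r2 = softening2;
            for (int k = 0; k < 3; ++k) {
                d[k] = pos[3 * j + k] - pos[3 * i + k];
                r2 += d[k] * d[k];
            }
            const double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
            for (int k = 0; k < 3; ++k) acc[3 * i + k] += mass[j] * d[k] * inv_r3;
        }
    }
    return acc;
}

// Returns a + s * b element-wise.
Vec axpy(const Vec& a, double s, const Vec& b) {
    Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + s * b[i];
    return out;
}

// One classical RK4 step of size dt; four acceleration evaluations per step.
void rk4_step(Vec& pos, Vec& vel, const Vec& mass, double dt) {
    const Vec k1p = vel;
    const Vec k1v = accelerations(pos, mass);
    const Vec k2p = axpy(vel, dt / 2, k1v);
    const Vec k2v = accelerations(axpy(pos, dt / 2, k1p), mass);
    const Vec k3p = axpy(vel, dt / 2, k2v);
    const Vec k3v = accelerations(axpy(pos, dt / 2, k2p), mass);
    const Vec k4p = axpy(vel, dt, k3v);
    const Vec k4v = accelerations(axpy(pos, dt, k3p), mass);
    for (std::size_t i = 0; i < pos.size(); ++i) {
        pos[i] += dt / 6 * (k1p[i] + 2 * k2p[i] + 2 * k3p[i] + k4p[i]);
        vel[i] += dt / 6 * (k1v[i] + 2 * k2v[i] + 2 * k3v[i] + k4v[i]);
    }
}
```

Each step costs roughly four O(n^2) force evaluations on 7n doubles of state (positions, velocities, masses), so the work per byte grows with both the body count and the number of steps you choose to run.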

Jarred

Thanks. N-body simulation was one of the first ideas we discussed, and the advantage of this (and basically any physical time-simulation) is that you can run it for as long as you want. But one drawback we see is that the amount of data transfer between cores (the states of the bodies) is on the same order of magnitude as the calculations (as opposed to matmul, for example). I do not reject the idea, though. – ysap Jan 23 '12 at 22:52
You could always just run several in parallel instead of parallelizing a single n-body... – Jarred Jan 23 '12 at 23:07

score 1 · Answer 2 · answered Jan 23 '12 at 22:19

What you can do is

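Declare a std::vector of int.
Populate it with 0 to N-1, in ascending order. (Starting from N-1 down to 0 would not work: that is already the last permutation, so the very first call to std::next_permutation would return false.)
Now keep calling std::next_permutation until the elements are sorted again, i.e. until next_permutation returns false.

With N integers this needs O(N!) calls to next_permutation, and the run is deterministic.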
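A minimal sketch of that permutation walk follows. It assumes the vector starts in ascending order, because `std::next_permutation` returns false as soon as it steps past the descending arrangement. The function name `count_permutations` is made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Visits every permutation of {0, ..., n-1} and returns how many were seen.
std::uint64_t count_permutations(int n) {
    assert(n >= 1);
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);  // 0, 1, ..., n-1: the first permutation
    std::uint64_t visited = 0;
    do {
        ++visited;  // any per-permutation work would go here
    } while (std::next_permutation(v.begin(), v.end()));
    return visited;
}
```

The return value is n!, so `count_permutations(12)` walks 479,001,600 permutations while touching only 12 integers of data.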

parapura rajkumar

This does not advertise the fact that the processor is very fast on floating point computation. – hugomg Jan 23 '12 at 22:38
I think that the poster wants to exercise the FPU, is there a floating-point version of this? – RBarryYoung Jan 23 '12 at 22:39

score 1 · Answer 3 · answered Jan 23 '12 at 22:55

PageRank may be a good fit. Articulated as a linear algebra problem, one repeatedly squares a certain floating-point matrix of controllable size until convergence. In the graphical metaphor, one "ripples" change coming into each node onto the other edges. Both treatments can be made parallel.
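For a sense of the arithmetic involved, here is a small sketch of PageRank on a dense matrix. Note that it uses the more common power iteration (repeated matrix-vector products) rather than literally squaring the matrix. The damping factor 0.85, the tolerance, the dense column-stochastic layout, and the name `pagerank` are assumptions for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// link[i][j] is the probability of following a link from page j to page i,
// so every column is assumed to sum to 1.
std::vector<double> pagerank(const Matrix& link, double damping = 0.85,
                             double tol = 1e-12, int max_iter = 1000) {
    const std::size_t n = link.size();
    assert(n > 0);
    std::vector<double> rank(n, 1.0 / n);  // start from the uniform distribution
    for (int it = 0; it < max_iter; ++it) {
        // Teleport term plus damped matrix-vector product: O(n^2) per iteration.
        std::vector<double> next(n, (1.0 - damping) / n);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                next[i] += damping * link[i][j] * rank[j];
        double change = 0.0;  // L1 distance between successive iterates
        for (std::size_t i = 0; i < n; ++i) change += std::fabs(next[i] - rank[i]);
        rank.swap(next);
        if (change < tol) break;
    }
    return rank;
}
```

The matrix size and the tolerance together control how long this runs, and the inner loops are almost entirely floating-point multiply-adds.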

score 1 · Answer 4 · answered Jan 24 '12 at 12:49

You could do a least trimmed squares fit. One use of this is to identify outliers in a data set. For example you could generate samples from some smooth function (a polynomial say) and add (large) noise to some of the samples, and then the problem is to find a subset H of the samples of a given size that minimises the sum of the squares of the residuals (for the polynomial fitted to the samples in H). Since there are a large number of such subsets, you have a lot of fits to do! There are approximate algorithms for this, for example here.
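A brute-force version of this can be sketched as follows. To keep it short, the sketch fits a straight line rather than a general polynomial, and it tries every subset of size h exhaustively. The names `LineFit`, `fit_line` and `least_trimmed_squares_line` are hypothetical, and the x values are assumed to be distinct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct LineFit {
    double slope;
    double intercept;
    double rss;  // residual sum of squares over the chosen subset
};

// Ordinary least-squares line through the samples whose `use` flag is set.
LineFit fit_line(const std::vector<double>& x, const std::vector<double>& y,
                 const std::vector<char>& use) {
    double n = 0.0, sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (!use[i]) continue;
        n += 1.0;
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    const double denom = n * sxx - sx * sx;  // nonzero when the chosen x values differ
    LineFit fit;
    fit.slope = (n * sxy - sx * sy) / denom;
    fit.intercept = (sy - fit.slope * sx) / n;
    fit.rss = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (!use[i]) continue;
        const double r = y[i] - (fit.slope * x[i] + fit.intercept);
        fit.rss += r * r;
    }
    return fit;
}

// Tries every subset of size h and keeps the fit with the smallest residual sum.
LineFit least_trimmed_squares_line(const std::vector<double>& x,
                                   const std::vector<double>& y, std::size_t h) {
    assert(x.size() == y.size() && h >= 2 && h <= x.size());
    std::vector<char> use(x.size(), 0);
    std::fill(use.begin(), use.begin() + h, 1);  // 1...10...0 is the last mask in order
    LineFit best{0.0, 0.0, std::numeric_limits<double>::infinity()};
    do {
        const LineFit fit = fit_line(x, y, use);
        if (fit.rss < best.rss) best = fit;
    } while (std::prev_permutation(use.begin(), use.end()));
    return best;
}
```

With n samples the loop performs C(n, h) fits, so a few dozen samples with h near n/2 already give a very long floating-point run on a tiny data set.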

RBarryYoung · Answer 5 · 2012-01-23T23:10:43.997

Well, one way to go would be to implement a brute-force solver for the Traveling Salesman problem in some M-dimensional space (with M > 1).

The brute-force solution is to just try every possible permutation and then calculate the total distance for each permutation, without any optimizations (including no dynamic programming tricks like memoization).

For N points, there are (N!) permutations (with a redundancy factor of at least (N-1), but remember, no optimizations). Each pair of points requires (M) subtractions, (M) multiplications and one square root operation to determine their pythagorean distance apart. Each permutation has (N-1) pairs of points to calculate and add to the total distance.

So the order of computation is O(M·(N-1)·N!), which is bounded above by O(M·(N+1)!), whereas storage space is only O(N·M) for the point coordinates.

Also, this should be neither too hard nor too intensive to parallelize across the cores, though it does take some overhead. (I can demonstrate, if needed.)
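A single-threaded sketch of the brute-force search described above might look like this. It measures the open path through the points (N-1 legs, as in the description) and applies no pruning. The names `Point`, `euclidean_distance` and `shortest_path_brute_force` are assumptions for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

using Point = std::vector<double>;  // one coordinate per dimension (M entries)

// M subtractions, M multiplications and one square root per pair.
double euclidean_distance(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double d = a[k] - b[k];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Tries all N! visiting orders and returns the shortest total path length.
double shortest_path_brute_force(const std::vector<Point>& pts) {
    assert(!pts.empty());
    std::vector<int> order(pts.size());
    std::iota(order.begin(), order.end(), 0);  // ascending start: first permutation
    double best = std::numeric_limits<double>::infinity();
    do {
        double total = 0.0;
        for (std::size_t i = 0; i + 1 < order.size(); ++i)
            total += euclidean_distance(pts[order[i]], pts[order[i + 1]]);
        best = std::min(best, total);
    } while (std::next_permutation(order.begin(), order.end()));
    return best;
}
```

One simple way to split this across cores, offered only as a suggestion, is to fix the first point differently on each core and permute the remaining N-1 points, so that the cores need to exchange only their final minimum.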

score 0 · Answer 6 · answered Jan 24 '12 at 14:15

Another idea might be to compute a fractal map. Basically, choose a grid of whatever dimensionality you want. Then, for each grid point, do the fractal iteration to get the value. Some points might require only a few iterations; I believe some will iterate forever (chaos; of course, this can't really happen when you have a finite number of floating-point numbers, but still). The ones that don't stop you'll have to "cut off" after a certain number of iterations... just make this preposterously high, and you should be able to demonstrate a high-quality fractal map.

Another benefit of this is that grid cells are processed completely independently, so you will never need to do communication (not even at boundaries, as in stencil computations, and definitely not O(pairwise) as in direct N-body simulations). You can usefully use O(gridcells) number of processors to parallelize this, although in practice you can probably get better utilization by using gridcells/factor processors and dynamically scheduling grid points to processors on an as-ready basis. The computation is basically all floating-point math.

Mandelbrot/Julia and Lyapunov fractals come to mind as potential candidates, but any should do.
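As an illustration, here is a minimal escape-time sketch for the Mandelbrot set. The viewing window, the escape radius of 2, and the names `mandelbrot_iterations` and `mandelbrot_grid_total` are assumptions for this example. It returns only the summed iteration count rather than an image, which is one possible way to keep the output small.

```cpp
#include <cassert>
#include <cstdint>

// Iterates z -> z^2 + c from z = 0 and returns how many steps were taken
// before |z| exceeded 2, or max_iter if that never happened.
int mandelbrot_iterations(double cr, double ci, int max_iter) {
    double zr = 0.0, zi = 0.0;
    int i = 0;
    while (i < max_iter && zr * zr + zi * zi <= 4.0) {
        const double next_zr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = next_zr;
        ++i;
    }
    return i;
}

// Sums the iteration counts over a width x height grid covering
// [-2, 1] x [-1.5, 1.5]. Every grid point is independent of the others.
std::uint64_t mandelbrot_grid_total(int width, int height, int max_iter) {
    assert(width > 1 && height > 1 && max_iter > 0);
    std::uint64_t total = 0;
    for (int y = 0; y < height; ++y) {
        const double ci = -1.5 + 3.0 * y / (height - 1);
        for (int x = 0; x < width; ++x) {
            const double cr = -2.0 + 3.0 * x / (width - 1);
            total += mandelbrot_iterations(cr, ci, max_iter);
        }
    }
    return total;
}
```

Raising `max_iter` increases the work for points inside the set without increasing the amount of data, since the grid coordinates are computed on the fly.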

Thanks - fractals was one of the early ideas as well, but then it was overruled by the fact that you need to load and display the result image which would be painfully slow with the USB connection at hand. — ysap, Jan 24 '12 at 22:28
