
I have a randomized recursive backtracking algorithm to generate Sudoku puzzles (see here). It works fast enough on average, but the worst-case runtime is unacceptably slow. Here is a histogram of the runtime in milliseconds for 100 trials (the "More" is about 200,000 ms!):

[Histogram of generation runtimes (ms) over 100 trials, showing a long tail]

I would like to improve the algorithm by simply timing it out after t ms, and restarting with a new random seed. To prevent this from repeating infinitely, I would either stop after n tries or increase t after each failed try. If t is much larger than the median, there's a good chance of getting a much faster run on a subsequent try.
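For illustration, here is a minimal sketch of that timeout-and-restart loop in Java. `PuzzleSource`, `generate`, and the doubling back-off are placeholders of mine, not the actual generator; also note the generator would have to poll `Thread.interrupted()` for the cancellation to take effect:

```java
import java.util.Random;
import java.util.concurrent.*;

public class RestartingGenerator {

    /** Placeholder for the actual puzzle generator. */
    interface PuzzleSource {
        int[][] generate(long seed) throws InterruptedException;
    }

    /**
     * Runs the generator with timeout t; on timeout, restarts with a fresh
     * seed and a larger t, giving up after maxTries attempts.
     */
    static int[][] generateWithRestarts(PuzzleSource source,
                                        long initialTimeoutMs,
                                        int maxTries) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Random rng = new Random();
        long t = initialTimeoutMs;
        try {
            for (int attempt = 0; attempt < maxTries; attempt++) {
                final long seed = rng.nextLong();
                Future<int[][]> run = pool.submit(() -> source.generate(seed));
                try {
                    return run.get(t, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    run.cancel(true); // generator must check Thread.interrupted()
                    t *= 2;           // one possible policy: double t per failure
                }
            }
            throw new IllegalStateException("no puzzle after " + maxTries + " tries");
        } finally {
            pool.shutdownNow();
        }
    }
}
```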

Questions:

  1. How can I adjust the timeout period t for different processors? Is there a fast, reliable way to benchmark processor performance prior to each run? Alternatively, should I adapt to the processor over several runs, for instance using the mean runtime of all previous runs? I am running this on Android, if that's relevant.
  2. Is there a better strategy entirely to avoid the long tail in the distribution of the runtimes?

2 Answers


Since your algorithm is recursive, why not establish a maximum recursion depth? If a particular random seed drives the recursion past a depth you have empirically established as a sign you are in the long tail, abort at that point.

By visual approximation, it looks like runs past 4500 ms yield no significant return on investment for the given seed. Rerun that benchmark, this time also tracking recursion depth, and see what that number is. I would run more than 100 samples, though.

That solution is CPU speed independent.
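A minimal sketch of such a cutoff in Java (my code, not part of the original answer). As the comments below point out, the depth of this kind of solver is fixed at 81, so the budget here is on total recursive calls (breadth) rather than depth; the fixed 1..9 candidate order stands in for whatever randomized order the real generator uses:

```java
/** Backtracking filler that gives up once a node budget is exhausted. */
class BoundedBacktracker {
    private final long maxSteps; // empirically chosen long-tail cutoff
    private long steps;

    BoundedBacktracker(long maxSteps) { this.maxSteps = maxSteps; }

    /** True if the last call failed because the budget ran out. */
    boolean exhausted() { return steps > maxSteps; }

    /** Fills cells left to right; returns false on dead end or exhausted budget. */
    boolean solve(int[][] grid, int cell) {
        if (++steps > maxSteps) return false; // long tail detected: unwind
        if (cell == 81) return true;          // all cells filled
        int row = cell / 9, col = cell % 9;
        for (int v = 1; v <= 9; v++) {        // real generator would shuffle these
            if (isLegal(grid, row, col, v)) {
                grid[row][col] = v;
                if (solve(grid, cell + 1)) return true;
                grid[row][col] = 0;           // backtrack
            }
        }
        return false;
    }

    private boolean isLegal(int[][] g, int r, int c, int v) {
        for (int i = 0; i < 9; i++)
            if (g[r][i] == v || g[i][c] == v) return false;
        int br = r / 3 * 3, bc = c / 3 * 3;
        for (int i = br; i < br + 3; i++)
            for (int j = bc; j < bc + 3; j++)
                if (g[i][j] == v) return false;
        return true;
    }
}
```

Once the budget is exhausted, every recursive call returns false immediately, so the whole search unwinds quickly; the caller checks `exhausted()` and restarts with a new seed.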

Eric J.
  • For my backtracking algorithm, it's not a question of recursion depth (always 81, the number of cells in a puzzle) but rather recursion *breadth*: how much of the search tree of possible solutions has to be covered before a decision is made. I've linked to a description of my algorithm in the top post. – 1'' Jan 10 '13 at 21:52
  • Surely you can track the current breadth (or similar step-counting metric) and apply similar logic? – Eric J. Jan 10 '13 at 21:59
  • That's an interesting idea, but wouldn't it be hard to do efficiently? I might be going through hundreds of thousands of levels, and incrementing a counter at each step could be quite slow. – 1'' Jan 10 '13 at 22:12
  • Try benchmarking incrementing a counter 100,000 times. It will be a *very* small value compared to even your best runtimes of 500ms. – Eric J. Jan 10 '13 at 22:33
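For what it's worth, a rough sketch of that micro-benchmark (an order-of-magnitude check only: JIT warm-up and loop optimizations make naive timing like this imprecise):

```java
public class CounterCost {
    public static void main(String[] args) {
        long counter = 0;
        long start = System.nanoTime();
        for (int i = 0; i < 100000; i++) {
            counter++;
        }
        long elapsedUs = (System.nanoTime() - start) / 1000;
        // Print the counter so the loop cannot be eliminated outright.
        System.out.println(counter + " increments took ~" + elapsedUs + " µs");
    }
}
```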
  1. Yes there is: it's called a confidence interval. By running the algorithm several times in pre-processing (or on the fly), you can determine, with x% confidence (where x is a parameter), the interval in which the median running time lies. (A minimal sketch of this appears after the list.)
    You can narrow the interval by decreasing x or by increasing the number of times the algorithm runs.

    Of course, if you cannot run the algorithm itself, you can benchmark it on some machine and find the confidence interval (call it I), then create a function f(I, s) that, given the timing s of a different algorithm on a different machine M, predicts what the interval should be for machine M.
    Finding s is done in a similar manner, using a confidence interval.

  2. Your approach seems fine; I'd probably do the same - first set a small factor, and increase it after each failed attempt. Note that this is somewhat similar to congestion control in the TCP protocol (from the field of networking), which probes for the accepted rate of packet transfer over the network.
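A minimal sketch of the confidence-interval idea from point 1 (my code, using the usual normal approximation to the binomial order-statistic ranks; exact binomial ranks are preferable for small samples):

```java
import java.util.Arrays;

/** Approximate nonparametric confidence interval for the median runtime. */
public class MedianCI {

    /**
     * Returns {low, high} bounds in ms, given raw runtimes and a z-score
     * (e.g. z = 1.96 for ~95% confidence).
     */
    static long[] medianInterval(long[] runtimesMs, double z) {
        long[] sorted = runtimesMs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        // Normal approximation to the order-statistic ranks around n/2.
        int lo = (int) Math.floor((n - z * Math.sqrt(n)) / 2.0);
        int hi = (int) Math.ceil((n + z * Math.sqrt(n)) / 2.0);
        lo = Math.max(lo, 0);
        hi = Math.min(hi, n - 1);
        return new long[] { sorted[lo], sorted[hi] };
    }
}
```

With 100 timed runs and z = 1.96, this brackets the median between roughly the 40th and 60th order statistics; more runs tighten the bracket.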

amit
  • Thanks for the link to TCP! I could use the middle of the confidence interval, or just take the sample mean or (probably less accurate) median, but I'm mainly concerned about a reliable way to benchmark the processor. – 1'' Jan 10 '13 at 21:55
  • @1'' Using some other algorithm (with "deterministic" run time) and reusing the confidence-interval approach, you can compare different processors and estimate what the median of your algorithm will be. – amit Jan 10 '13 at 21:57