
I'm trying to use the "RayW hand evaluator" approach to get a card combination score (the best 5 cards out of 7). However, I'm having performance issues with this method. According to sources, this approach should be able to evaluate more than 300 million hands per second! My result is 10 million hands in 1.5 seconds, which is many times slower.

The idea behind the "RayW hand evaluator" is the following:

The Two Plus Two evaluator consists of a large lookup table containing some thirty-two million entries (32,487,834 to be precise). In order to look up a given 7-card poker hand, you trace a path through this table, performing one lookup per card. When you get to the last card, the value so obtained is the official equivalence value of the hand.

Here is what the code looks like:

using System;
using System.IO;
using System.Runtime.InteropServices;

namespace eval
{
public struct TPTEvaluator
{
    public static int[] _lut;

    public static unsafe void Init() // to load the table
    {
        _lut = new int[32487834];
        if (!File.Exists("HandRanks.dat"))
        { throw new FileNotFoundException("HandRanks.dat not found"); }

        using (FileStream lutFile = new FileStream("HandRanks.dat", FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 4096))
        {
            byte[] tempBuffer = new byte[32487834 * 4];
            int offset = 0; // Read() may return fewer bytes than requested, so loop
            while (offset < tempBuffer.Length)
            {
                int read = lutFile.Read(tempBuffer, offset, tempBuffer.Length - offset);
                if (read == 0) { throw new EndOfStreamException("HandRanks.dat is truncated"); }
                offset += read;
            }

            fixed (int* pLut = _lut)
            { Marshal.Copy(tempBuffer, 0, (IntPtr)pLut, 32487834 * 4); }
        }
    }

    public unsafe static int LookupHand(int[] cards) // to get a hand strength
    {
        fixed (int* pLut = _lut)
        {
            int p = pLut[53 + cards[0]];
            p = pLut[p + cards[1]];
            p = pLut[p + cards[2]];
            p = pLut[p + cards[3]];
            p = pLut[p + cards[4]];
            p = pLut[p + cards[5]];
            return pLut[p + cards[6]];
        }
    }
}

}

And this is how I test the approach:

    private void button4_Click(object sender, EventArgs e)
    {
        int[] str = new int[] { 52, 34, 25, 18, 1, 37, 22 };

        int r1 = 0;

        DateTime now = DateTime.Now;
        for (int i = 0; i < 10000000; i++) // 10 mil iterations 1.5 - 2 sec
        { r1 = TPTEvaluator.LookupHand(str);} // here
        TimeSpan s1 = DateTime.Now - now;
        textBox14.Text = "" + s1.TotalMilliseconds;
    }
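
As an aside, a `Stopwatch`-based harness gives higher-resolution timings than subtracting `DateTime` values. Here is a minimal sketch of such a harness; the workload lambda below is just a placeholder, not the real evaluator, so the example runs standalone (in real use you would call `TPTEvaluator.LookupHand(str)` inside the loop instead):

```csharp
using System;
using System.Diagnostics;

class BenchSketch
{
    // Times an arbitrary workload over 'iterations' repetitions and
    // returns elapsed milliseconds. Note: the delegate call itself adds
    // per-iteration overhead, so inline the real loop for final numbers.
    public static double TimeIt(int iterations, Action body)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) { body(); }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    static void Main()
    {
        int sink = 0; // placeholder workload; replace with a LookupHand call
        double ms = TimeIt(10000000, () => { sink++; });
        Console.WriteLine(ms + " ms (sink=" + sink + ")");
    }
}
```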

I believe this method was originally implemented in C++, but a C# port should nevertheless be fast. Is there any way I can get close to at least 100 million hands per second?

What I tried so far:

  • Tried using static and non-static methods - no difference.
  • Tried using a dictionary lookup instead of the array:

    static Dictionary<int, int> dict = new Dictionary<int, int>();

    public void ArrToDict(int[] arr, Dictionary<int, int> dic)
    {
        for (int i = 0; i < arr.Length; i++)
        {
            dic.Add(i, arr[i]);
        }
    }

    public static int LookupHandDict(int[] cards)
    {
        int p = dict[53 + cards[0]];
        p = dict[p + cards[1]];
        p = dict[p + cards[2]];
        p = dict[p + cards[3]];
        p = dict[p + cards[4]];
        p = dict[p + cards[5]];
        return dict[p + cards[6]];
    }
    

Elapsed time for 10 million hands is almost 6 times slower.

  • According to one person, removing the "unsafe" code brought him to around 310 million hands per second. I tried the same thing, but my results are almost unchanged.

    public static int LookupHand(int[] cards)
    {
            int p = _lut[53 + cards[0]];
            p = _lut[p + cards[1]];
            p = _lut[p + cards[2]];
            p = _lut[p + cards[3]];
            p = _lut[p + cards[4]];
            p = _lut[p + cards[5]];
            return _lut[p + cards[6]];
    }
    

Here is the quote:

After removing the "unsafe" code parts and some small adjustments in the c# version it is now also around 310 mio.

Is there any other way to increase the performance of this hand ranking system?

Alex
    Have you tried running in release mode? It might perform faster then, because the code is optimized. – Michal B. May 07 '12 at 14:42
  • the .dat file which contains all these entries is 130 Mb in size.. Nevertheless P4 PCs with 1-2 gigs of ram didn't have any problems. I have 4 gigs, so RAM shouldn't be much of a problem. – Alex May 07 '12 at 14:44
  • @MichalB. it's 200 milliseconds now!! thanks a bunch. I'll not be removing this issue.. maybe there are some more improvements that can be made.. Thank you, again!! – Alex May 07 '12 at 14:46
  • Why `LookupHand` has to be unsafe? Drop the _fixed_ and use `_lut` as a regular integer array. – Nicolas Repiquet May 07 '12 at 14:54
  • that's exactly what i did.. no difference. but let me check on that again in "released" exe.. EDIT: almost no difference. Maybe a little faster, but not by a large margin. *also edited the original post. – Alex May 07 '12 at 14:56
  • "return a value(score) for each card out of 7. The Sum of these values is your hand score." That's definitely NOT what is going on here. There's no summing of scores here whatsoever, it's a multi-dimensional array lookup with compression techniques that would make Escher proud. – Ben Voigt May 07 '12 at 15:35
  • 1
    By the way, just to let you know, a more accurate way to get elapsed time is to use the [StopWatch Class](http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.aspx) rather then just using DateTime objects – Icemanind May 07 '12 at 15:53
  • have you tried the parallel.for of c# 4.0? It is an easy translation of the code and can increase the performance a lot! – Pablo Castilla May 07 '12 at 16:23
  • 10
    The algorithm has very poor locality of reference, execution time is going to be dominated by the speed of the memory bus since the array is too large to fit in the cache. Which is why you don't see a difference between unsafe and safe code even through the safe code adds a bounds check. Using threads isn't going to help, you still have only one bus. Notable is that your test code doesn't exercise that problem since you evaluate the same hand over and over again. There's no simple recipe to make it faster. – Hans Passant May 07 '12 at 16:51
  • Is a debugger attached? by default, running in the debugger disables JIT optimizations. You can either run without debugger, or change that option somewhere in VS . – CodesInChaos May 07 '12 at 20:42
  • 1
    @HansPassant: +1 to your comment but OP's test code is not the typical use of such code (nor the typical *"full enum"* poker hand evaluation code). In typical code intermediate lookup indices are saved and the innermost loop only lookups into one of 52 contiguous spot. My seven for loops are not "practical" code (practical code is a bit more complex) but intermediate indices are saved and the last loop doesn't introduce cache miss all the time (it's still bad locality of reference, but not anywhere near as bad as it sounds from OP's main loop). – TacticalCoder May 08 '12 at 23:29
  • Can you give a link to the data file? – nakiya May 13 '12 at 08:12
  • I didn't find the C# code to generate this data file.. but the pregenerated file itself can be easily found using google search (i.e. "handRanks.dat download" <- look for the first link).. just make sure you have 128 megabytes of disk space – Alex May 13 '12 at 11:53
  • DAT file and quoted portions of this question are at http://www.codingthewheel.com/archives/poker-hand-evaluator-roundup#2p2 along with a handful of other evaluators. –  Aug 20 '12 at 06:18
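
To illustrate Hans Passant's point: looking up the same hand 10 million times keeps the touched table entries hot in cache, so the benchmark above overstates real-world throughput. A sketch of how one might pre-generate distinct random hands before timing (the card encoding 1..52 matches the question's test hand; the class and method names are illustrative, not from the original code):

```csharp
using System;

class RandomHands
{
    // Pre-generates 'count' random 7-card hands, each with 7 distinct
    // cards in the range 1..52, using a partial Fisher-Yates shuffle.
    public static int[][] Generate(int count, int seed)
    {
        var rng = new Random(seed);
        var deck = new int[52];
        var hands = new int[count][];
        for (int h = 0; h < count; h++)
        {
            for (int i = 0; i < 52; i++) { deck[i] = i + 1; }
            // Shuffle only the first 7 positions; that is all we need.
            for (int i = 0; i < 7; i++)
            {
                int j = i + rng.Next(52 - i);
                int tmp = deck[i]; deck[i] = deck[j]; deck[j] = tmp;
            }
            hands[h] = new int[7];
            Array.Copy(deck, hands[h], 7);
        }
        return hands;
    }

    static void Main()
    {
        // Generate hands up front, then time only the lookups, e.g.:
        //   foreach (var hand in hands) r = TPTEvaluator.LookupHand(hand);
        var hands = Generate(5, 42);
        foreach (var hand in hands)
        { Console.WriteLine(string.Join(",", hand)); }
    }
}
```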

2 Answers


First - benchmarking is always tricky. Things that perform one way on your machine don't always perform the same way on other machines and there is a lot going on 'under-the-covers' that can invalidate data (like caching done by the OS or even hardware).

Having said that - I took a look at just your Init() method and it left me scratching my head. I found it difficult to follow. My rule of thumb for using 'unsafe' is to not use it, unless I absolutely have to. This Init() method, I'm assuming, gets called once, right? I decided to benchmark it:

static readonly Stopwatch myWatch = new Stopwatch();

static void BenchmarkIt(string input, Action myFunc)
{
    myWatch.Restart();
    myFunc();
    myWatch.Stop();

    Console.WriteLine(input, myWatch.ElapsedMilliseconds);
}

BenchmarkIt("Updated Init() Method:  {0}", Init2);
BenchmarkIt("Original Init() Method: {0}", Init1);  

Where Init1() is your original code and Init2() is my rewritten code (I also flipped the order several times for the sake of fairness). Here's what I get (on my machine)...

Updated Init() Method: 110

Original Init() Method: 159

Here's the code I used. No unsafe keyword required.

const string fileName = "HandRanks.dat";
const int maxSize = 32487834;

public static void Init2()
{
    if (!File.Exists(fileName)) { throw new FileNotFoundException("HandRanks.dat not found"); }

    using (BinaryReader reader = new BinaryReader(File.Open(fileName, FileMode.Open)))
    {
        _lut = new int[maxSize];
        // ReadBytes loops internally until it has the requested count or hits EOF.
        byte[] tempBuffer = reader.ReadBytes(maxSize * 4);
        Buffer.BlockCopy(tempBuffer, 0, _lut, 0, maxSize * 4);
    }
}

In my opinion, this code is easier to read and it seems to run faster.

I know you are probably more concerned about LookupHand()'s performance, but I wasn't able to make any significant improvements. I tried a few different approaches but nothing that helped.

I was able to run your code 100,000,000 times in 500 milliseconds. I'm running on a fairly beefy 64-bit laptop - which seems to be the speed you were expecting. Like others have said - running in release mode (enabling optimization) can have a big impact on performance.

Rob P.
  • @Alex - I'm really embarrassed....I didn't post the code I had intended. Please try the updated code now! The benchmark values are still what I'm seeing on my machine. – Rob P. May 11 '12 at 20:30

If you want generic speed, I would suggest using the evaluator at Brecware: https://web.archive.org/web/20160502170946/http://brecware.com/Software/software.html. Steve Brecher's evaluator is faster than the RayW evaluator for evaluations which occur in random order, and is much more compact.

As noted in the comments, the RayW evaluator depends on locality of reference for its speed. If you're not traversing the evaluations in the exact same order as the lookup tables, it's going to be slow. If that is your problem, there are three approaches:

  1. Make your evaluation order more closely match the tables.
  2. Make tables that match your evaluation order.
  3. Make an evaluator optimized to your use case.
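
Approach 1 is what TacticalCoder's comment describes: enumerate hands with nested loops and save the intermediate table indices, so each prefix walk is done once and the innermost loop reads only a small contiguous region of the table. A sketch of the loop structure (the zero-filled stub table is my own stand-in for the real HandRanks.dat data, so the example runs standalone; with the real table and deckSize 52 it visits all 133,784,560 seven-card hands):

```csharp
using System;

class PrefixWalk
{
    // Enumerates all 7-card combinations of a deck, reusing the partial
    // lookup index p1..p6 for each prefix so only the innermost loop
    // performs a fresh table access per hand. Returns the hand count.
    public static long EnumerateAll(int[] lut, int deckSize)
    {
        long count = 0;
        for (int c1 = 1; c1 <= deckSize - 6; c1++) {
            int p1 = lut[53 + c1];
            for (int c2 = c1 + 1; c2 <= deckSize - 5; c2++) {
                int p2 = lut[p1 + c2];
                for (int c3 = c2 + 1; c3 <= deckSize - 4; c3++) {
                    int p3 = lut[p2 + c3];
                    for (int c4 = c3 + 1; c4 <= deckSize - 3; c4++) {
                        int p4 = lut[p3 + c4];
                        for (int c5 = c4 + 1; c5 <= deckSize - 2; c5++) {
                            int p5 = lut[p4 + c5];
                            for (int c6 = c5 + 1; c6 <= deckSize - 1; c6++) {
                                int p6 = lut[p5 + c6];
                                for (int c7 = c6 + 1; c7 <= deckSize; c7++) {
                                    int rank = lut[p6 + c7]; // final hand value
                                    count++;
                                }
                            }
                        }
                    }
                }
            }
        }
        return count;
    }

    static void Main()
    {
        // Stub table of zeros keeps every index in bounds without HandRanks.dat.
        int[] stub = new int[106];
        Console.WriteLine(EnumerateAll(stub, 12)); // prints 792 = C(12,7)
    }
}
```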
Andrew Prock