4

I need some advice on how to tackle an algorithmic problem (i.e., not programming per se). What follows are my needs and how I tried to meet them. Any comments for improvement would be welcome.

Let me start by explaining my goal. I would like to play some poker about a billion times. Maybe I'm trying to create the next PokerStars.net, maybe I'm just crazy.

I would like to create a program that can produce better-randomized decks of cards than, say, the typical program calling random(). These need to be production-quality decks created from high-quality random numbers. I've heard that commercial-grade poker servers use 64-bit vectors for every card, thus ensuring randomness for all the millions of poker games played daily.

I'd like to keep whatever I write simple. To that end, the program should only need one input to achieve the stated goal. I have decided that whenever the program begins, it will record the current time and use that as the starting point. I realize that this approach would not be feasible for commercial environments, but as long as it holds up for a few billion games better than simpler alternatives would, I'll be happy.

I began to write pseudo-code to solve this problem, but ran into a thorny issue. It's clear to me, but it might not be to you, so please let me know.

Pseudo-code below:

    Start by noting the system time.
    Hash the current time (with MD5) around ten times (I chose ten arbitrarily).
    Take the resulting hash and use it as the seed for the language-dependent random() function.
    Call random() 52 times and store the results.
    Take the values produced by random() and hash them.
    Any hash function that produces at least 64 bits of output will do for this.
    Truncate (if the hash is too big) so each hash fits inside a 64-bit integer.
    Find a way to map the 52 resulting values (which should be random now, according to my calculations) onto 52 different cards, so we can play some poker.

My issue is with the last step. I cannot think of a way to properly map each 64-bit value to a corresponding card without having to worry about two values being the same (unlikely) or losing any randomness (likely).

My first idea was to break 0x0000000000000000 - 0xFFFFFFFFFFFFFFFF into four even sections (to represent the suits). But there is no guarantee that we will find exactly thirteen cards per section, which would be bad.

Now that you know where I am stuck, how would you overcome this challenge?

-- Edited --

Reading bytes from /dev/random would work well, actually. But that still leaves me lost on how to do the conversion (assuming I read enough bytes for 52 cards).

My real desire is to take something simple and predictable, like the system time, and transform it into a randomized deck of cards. Seeding random() with the system time is a BAD way of going about this. Hence hashing the time, and hashing the values that come out of random().

Hell, if I wanted to, I could hash the bytes from /dev/random, just for shizzles and giggles. Hashing improves the randomness of things, doesn't it? Isn't that why modern password managers store passwords that have been hashed thousands of times?

-- Edit 2 --

So I've read your answers and I find myself confused by the conclusion many of you are implying. I hinted at it in my first edit, but it's really throwing me for a loop. I'd just like to point it out and move on.

Rainbow tables exist that do funky math and clever magic to essentially act as a lookup table for common hashes that map to a particular password. It is my understanding that longer, better passwords are unlikely to show up in these rainbow tables. But the fact still stands that despite how common many user passwords are, the hashed passwords remain safe after being hashed thousands of times.

So is that a case where many deterministic operations have increased the randomness of the original password (or seem to)? I'm not saying I'm right, I'm just saying that's my feeling.

The second thing I want to point out is I'm doing this backwards.

What I mean is that you are all suggesting I take a sorted, predictable, non-random deck of cards and use the Fisher-Yates shuffle on it. I'm sure Fisher-Yates is a fine algorithm, but let's say you couldn't use it for whatever reason.

Could you take a random stream of bytes, say in the neighborhood of 416 bytes (52 cards with 8 bytes per card), and BAM, produce an already-random deck of cards? The bytes were random, so it shouldn't be too hard to do this.

Most people would start with a deck of 52 cards (random or not) and swap them around a bunch of times (by picking a random index to swap). If you can do that, then you can take 52 random numbers, run through them once, and produce the randomized deck.

As simply as I can describe it: the algorithm accepts a stream of randomized bytes and looks at each 8-byte chunk. It maps each chunk to a card.

Ex. 0x123 maps to the Ace of Spades
Ex. 0x456 maps to the King of Diamonds
Ex. 0x789 maps to the 3 of Clubs
... and so on.

As long as we choose a good model for the mapping, this is fine. No shuffling required. The program will be reduced to two steps.

Step 1: Obtain a sufficient quantity of random bytes from a good source.
Step 2: Split this stream of bytes into 52 chunks, one for each card in the deck.
Step 2a: Run through the 52 chunks, converting them into card values according to our map.
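
In Python-ish terms, here is a minimal sketch of what I mean (the map_chunk_to_card function is hypothetical; it is exactly the part I don't know how to write):

    import os

    # Step 1: obtain 416 random bytes (52 cards x 8 bytes per card)
    stream = os.urandom(52 * 8)

    # Step 2: split the stream into 52 eight-byte chunks
    chunks = [stream[i:i + 8] for i in range(0, len(stream), 8)]

    # Step 2a: map each chunk to a card -- this is the open question,
    # since two chunks could collide, or the mapping could lose randomness
    deck = [map_chunk_to_card(chunk) for chunk in chunks]  # hypothetical function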

Does that make sense?

Jeff Mercado
  • 129,526
  • 32
  • 251
  • 272
Tommy Fisk
  • 339
  • 4
  • 16
  • You really do seem to be overcomplicating this. There's no difference between "shuffling an ordered deck" and "creating a shuffled deck" - the outcome, if you implement the randomization properly, is the same. Also, no amount of mathematical gymnastics will turn something predictable like system time into an unpredictable shuffle: your attacker can simply exhaustively try all likely system times, performing the same gymnastics for each, and see which one matches. – Nick Johnson Mar 18 '11 at 19:27
  • That's very true. I appreciate all the feedback. If you're curious, I've gone the simple route via the answer I chose. – Tommy Fisk Mar 20 '11 at 17:05

8 Answers

15

You are massively overcomplicating the problem. You need two components to solve your problem:

  1. A shuffling algorithm
  2. A sufficiently high-quality random number generator for the shuffling algorithm to use.

The first is easy: just use the Fisher-Yates shuffle algorithm.

For the second, if you want sufficient degrees of freedom to be able to generate every possible permutation (of the 52! possibilities) then you need at least 226 bits of entropy. Using the system clock won't give you more than 32 or 64 bits of entropy (in practice far fewer as most of the bits are predictable), regardless of how many redundant hashes you perform. Find an RNG that uses a 256-bit seed and seed it with 256 random bits (a bootstrapping problem, but you can use /dev/random or a hardware RNG device for this).
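
For illustration, here is a minimal sketch of both components in Python, assuming the OS entropy pool is an acceptable seed source (random.SystemRandom reads from it directly rather than taking a fixed-size seed):

    import random

    def fisher_yates_shuffle(deck, rng):
        """Shuffle in place: swap each position with a uniformly chosen
        position at or before it."""
        for i in range(len(deck) - 1, 0, -1):
            j = rng.randint(0, i)          # uniform on [0, i], inclusive
            deck[i], deck[j] = deck[j], deck[i]

    rng = random.SystemRandom()            # backed by os.urandom
    deck = [rank + suit for suit in "cdhs" for rank in "A23456789TJQK"]
    fisher_yates_shuffle(deck, rng)

Because SystemRandom pulls fresh entropy from the OS for every call, the 226-bit requirement is met without managing a seed yourself.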

Dan Dyer
  • 53,737
  • 19
  • 129
  • 165
  • Can you link more information for understanding degrees of freedom and how it relates to this? I'll preemptively assume you linked the wikipedia page for degrees of freedom :P – Tommy Fisk Mar 17 '11 at 21:23
  • @Tommy: Done. If you have a PRNG with a 64-bit seed, it has 2^64 possible starting states, each of which hopefully generates a different shuffle. But if there are ~2^226 (or 52!) possible shuffles then the vast majority of permutations cannot be achieved as there are no seeds that can lead to those outcomes. This is mostly a theoretical distinction as in practice 2^64 is usually sufficient. – Dan Dyer Mar 17 '11 at 22:19
  • @Tommy @Dan: Both this post and that link (which was apparently also written by you) are incorrect. The size of the seed gives the number of possible starting states of the PRNG, *not* the number of possible states. For example, a simple [MWC](http://en.wikipedia.org/wiki/Multiply-with-carry) can have a 32-bit seed but have a period of over 2^2000000. – BlueRaja - Danny Pflughoeft Mar 17 '11 at 22:37
  • 1
    @BlueRaja But the number of possible starting states constrains the number of possible outcomes when you shuffle the deck the first time, starting from your initial state. The point I'm trying to make is that the seed size restricts the number of independent bits that you can generate. – Dan Dyer Mar 18 '11 at 00:17
  • @Dan: Yes, that is true, but that only relates to the security of the PRNG, *not* the perceived randomness of the output. The following statements you made: `But if there are ~2^226 (or 52!) possible shuffles then the vast majority of permutations cannot be achieved as there are no seeds that can lead to those outcomes` - `The problem is that the RNG is almost certainly not able to generate every possible outcome. There are many orderings of the deck that will never occur.` are completely false - a given 32-bit seed could certainly generate every possible deck. – BlueRaja - Danny Pflughoeft Mar 18 '11 at 15:14
  • He *would*, of course, want to use a PRNG with a large seed, for security reasons, but that has absolutely nothing to do with the number of possible deck permutations. – BlueRaja - Danny Pflughoeft Mar 18 '11 at 15:19
  • @BlueRaja I'll concede that the last sentence that you quote (from the linked article) is misleading/wrong (it shouldn't say "never occur" as that implies the full period of the RNG, which is not what I was getting at). – Dan Dyer Mar 18 '11 at 15:32
  • @BlueRaja Only if he used the same PRNG to generate all the shuffles sequentially. In practice, he would create a new PRNG with a new, randomly-selected seed, for each deck. – Nick Johnson Mar 18 '11 at 19:26
  • 1
    @Nick: In practice you would __*never*__ do that - in fact, you *couldn't* do it without a reliable hardware random-number generator. However, because the PRNG's seed usually has fewer bits than its internal state, we normally just xor the truly random output with the PRNG's output, rather than seed a whole new PRNG. Even more importantly, again in practice, there is really never a need for such expensive hardware, since a cryptographically-secure PRNG provides more than enough security. – BlueRaja - Danny Pflughoeft Mar 18 '11 at 19:33
  • @BlueRaja If you're going to do this in the sort of application the OP proposes, you _should_ have a decent hardware PRNG. Any random number generator is constrained by the amount of entropy you put into it, and using a single PRNG for all your shuffling is going to make your attacker's life a lot easier when it comes to figuring out the internal state. – Nick Johnson Mar 18 '11 at 21:01
  • 1
    @Nick The problem with hardware RNGs is that they tend to be slow, expensive (relatively speaking, compared to a PRNG) and less reliable than PRNGs. The ideal solution is probably some kind of hybrid such as Fortuna where the hardware provides a continuous source of entropy for a PRNG. – Dan Dyer Mar 18 '11 at 21:47
  • @Dan: Thanks, I had not heard of that one before. Seems like almost every good crypto algorithm was developed either by Schneier, Ferguson, or Rivest :) – BlueRaja - Danny Pflughoeft Mar 18 '11 at 21:51
  • @Dan That's fine, but it's still not a good idea to rely on a single PRNG for all your random data absent any extra entropy input - it's far too easy to compromise. That's what I was specifically addressing. – Nick Johnson Mar 18 '11 at 22:43
  • Fisher-Yates requires a means of generating unbiased numbers in a non-power-of-two range. Since 52! is not a power of two, no approach can yield a perfectly unbiased shuffle given a finite quantity of binary input, but I suspect approaches other than Fisher Yates might require a smaller average number of bits. Of course, in practice it probably doesn't matter how many random bits are needed to perform a shuffle, and Fisher Yates is probably the easiest approach if one takes care to ensure unbiased selection of numbers, but the conversion from power-of-two to 0-to-N is important. – supercat Mar 23 '11 at 16:20
6

You don't mention which OS you're on, but most modern OSes have ready-made sources of high-quality entropy. On Linux, these are /dev/random and /dev/urandom, from which you can read as many random bytes as you want.

Writing your own random number generator is highly non-trivial if you want good randomness. Any homebrew solution is likely to be flawed and could potentially be broken and its outputs predicted.
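
A minimal sketch of reading those sources in Python (os.urandom is the portable route and draws from the same pool):

    import os

    # Portable: ask the OS for 416 random bytes (52 cards x 8 bytes)
    random_bytes = os.urandom(416)

    # Or read the Linux device directly
    with open("/dev/urandom", "rb") as f:
        more_bytes = f.read(416)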

Marc B
  • 356,200
  • 43
  • 426
  • 500
  • 1
    I'd be careful with using `/dev/random` too much because you will block for a long time if you consume the stored entropy. – templatetypedef Mar 17 '11 at 21:05
  • The equivalent in Windows is [`CryptGenRandom`](http://en.wikipedia.org/wiki/CryptGenRandom) – BlueRaja - Danny Pflughoeft Mar 17 '11 at 21:06
  • How much stored entropy is in /dev/random? :P – Tommy Fisk Mar 17 '11 at 21:19
  • @Tommy: it is typically collected from a stream of events while the computer is running. For example, it could be the period (perhaps measured to the microsecond) between a user's keystrokes. If the user isn't typing anything, you'll run out of entropy. – Mark Peters Mar 17 '11 at 21:20
  • 1
    Linux and BSD use multiple sources for entropy data. Keystroke timing, network packet arrival times, etc... It's not based on one source, but many different sources meshed together. Using a single source (other than hardware generators) would be very predictable or at least easily influenced to become non-random. – Marc B Mar 17 '11 at 21:24
5

You will never improve your randomness if you still use a pseudo-random generator, no matter how many deterministic manipulations you apply to it. In fact, you are probably making it considerably worse.

I would use a commercial random number generator. Most use hardware solutions, like a Geiger counter. Some use ambient input as a source of entropy, such as background noise picked up by the computer's microphone or the latency between keystrokes.

Edit:

You mentioned that you also want to know how to map this back to a shuffle algorithm. That part is actually quite simple. One straightforward way is the Fisher-Yates shuffle. Basically all you need from your RNG is a random number uniformly distributed between 0 and 51 inclusive. You can derive that computationally from any RNG, and it is usually built into a good library. See the "Potential sources of bias" section of the Wikipedia article.
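
To sketch the bias point: taking raw_value % 52 is slightly non-uniform, because 2^64 is not a multiple of 52. Rejection sampling is one standard fix (uniform_below is a hypothetical helper, assuming os.urandom as the byte source):

    import os

    def uniform_below(n):
        """Return a uniform integer in [0, n) by discarding biased values."""
        limit = (2**64 // n) * n           # largest multiple of n <= 2**64
        while True:
            value = int.from_bytes(os.urandom(8), "big")
            if value < limit:
                return value % n           # exactly uniform once value < limit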

Mark Peters
  • 80,126
  • 17
  • 159
  • 190
  • 1
    I don't think there's a call for that - any cryptographically-secure PRNG should be more than enough... – BlueRaja - Danny Pflughoeft Mar 17 '11 at 21:04
  • You can start small: just use the random number generator supplied by your language's or framework's library, or by the operating system. Some chipsets nowadays also have an integrated PRNG that might be supported by your system. Later, when your system is really that popular, you can go get a really good RNG of your choice. Oh, and a Geiger counter may be expensive and have low throughput? – Arc Mar 17 '11 at 21:07
  • @Blue: My point is, it's that source of randomness that is key. There is no point in doing all of these deterministic manipulations. – Mark Peters Mar 17 '11 at 21:10
2

Great question!

I would strongly discourage you from using the random function that comes built-in with any programming language. This generates pseudorandom numbers that are not cryptographically secure, and so it would be possible for a clever attacker to look at the sequence of numbers coming back out as cards and to reverse-engineer the random number seed. From this, they could easily start predicting the cards that would come out of the deck. Some early poker sites, I've heard, had this vulnerability.

For your application, you will need cryptographically secure random numbers so that an adversary could not predict the sequence of cards without breaking something cryptographically assumed to be secure. For this, you could either use a hardware source of randomness or a cryptographically secure pseudorandom number generator. Hardware random number generators can be expensive, so a cryptographically secure PRNG may be a good option.

The good news is that it's very easy to get a cryptographically secure PRNG. If you take any secure block cipher (say, AES or 3DES) and, using a random key, start encrypting the numbers 0, 1, 2, ..., then the resulting sequence is cryptographically secure. That is, you could use /dev/random to get some random bytes for use as a key, then get random numbers by encrypting the integers in sequence using a strong cipher with the given key. This is secure until you hand back roughly √n numbers, where n is the size of the key space. For a cipher like AES-256, this is 2^128 values before you'd need to reset the random key. If you "only" want to play billions of games (~2^40), this should be more than fine.
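
A minimal sketch of that construction in Python, assuming the third-party cryptography package is installed (the class and method names here are illustrative, not a standard API):

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    class BlockCipherRNG:
        """A simple CSPRNG: AES under a random key, applied to a counter."""
        def __init__(self):
            key = os.urandom(32)                       # 256-bit key from the OS
            self._enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
            self._counter = 0

        def next_bytes(self):
            """Return 16 pseudorandom bytes: the encryption of the next integer."""
            block = self._counter.to_bytes(16, "big")  # the numbers 0, 1, 2, ...
            self._counter += 1
            return self._enc.update(block)

Applying the raw cipher to a strictly increasing counter like this is exactly the counter-mode keystream construction described above; each 16-byte block is one full AES block.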

Hope this helps! And best of luck with the project!

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
  • There's a lot more to lose though if the key is found, so protect that key! – Mark Peters Mar 17 '11 at 21:08
  • @Mark Peters- Definitely! Though pretty much any good pseudorandom approach will have this problem, I think. – templatetypedef Mar 17 '11 at 21:08
  • Very interesting! You def know your stuff. Can you take a look at my latest edit and help me understand more of these concepts? – Tommy Fisk Mar 17 '11 at 21:53
  • @Tommy Fisk- There's a difference between using an off-the-shelf **hash function** and an off-the-shelf **encryption algorithm**. You can attack hash functions with precomputed tables because the same hash applied to the same input always produces the same output. You cannot, however, attack an encryption algorithm using a precomputed table because you don't know what the key is. For algorithms like AES-256 that have 256-bit keys, there are more possible keys than atoms in the universe, so you don't need to worry about someone trying to decrypt everything. – templatetypedef Mar 17 '11 at 21:57
  • @Tommy Fisk- As for not using Fisher-Yates and instead just coming up with cards out of the bytestream, this is certainly theoretically possible, but I'd advise against it. Fisher-Yates is known to produce perfectly random distributions, and anything you'd come up with would at best be doing what Fisher-Yates does. Moreover, Fisher-Yates isn't a hard algorithm - you can write it in about five lines of code - and I'd avoid reinventing the wheel unless you have to. – templatetypedef Mar 17 '11 at 21:59
  • I'll either do that, or continue with the bytestream method as a proof of concept. Thanks for your input, but sometimes the wheel needs to be reinvented :P – Tommy Fisk Mar 17 '11 at 22:12
1

You should definitely read the answer to this question: Understanding "randomness"

Your approach of applying a number of arbitrary transformations to an existing pseudorandom number is very unlikely to improve your results, and in fact risks making the numbers less random.

You might consider using physically derived random numbers rather than pseudorandom numbers: http://en.wikipedia.org/wiki/Hardware_random_number_generator

If you are definitely going to use pseudorandom numbers, then you are best off seeding with your operating system's randomness device, which typically includes additional entropy from things like disk seek times as well as user I/O.
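
A minimal sketch of that seeding approach in Python (note that the standard library's Mersenne Twister is still not cryptographically secure, no matter how well it is seeded):

    import os
    import random

    # Seed the language-level PRNG with 256 bits from the OS randomness device
    rng = random.Random(os.urandom(32))
    print(rng.randint(0, 51))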

kdt
  • 27,905
  • 33
  • 92
  • 139
0

In terms of actually turning the random numbers into cards (once you follow the advice of others in generating the random numbers), you can map the lowest number to the Ace of Diamonds, the second-lowest number to the 2 of Diamonds, and so on.

Basically you assume the actual cards have a natural ordering and then you sort the random numbers and map to the deck.

Edit

Apparently Wikipedia lists this method as an alternative to the Fisher-Yates algorithm (which I hadn't previously heard of; thanks, Dan Dyer!). One thing in the Wikipedia article that I didn't think of is that you need to be sure that you don't repeat any random numbers if you're using the algorithm I described.
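
A minimal sketch of this sort-based approach in Python (secrets stands in here for whatever high-quality byte source you settle on; re-drawing on a duplicate key handles the repeat problem):

    import secrets

    deck = [rank + suit for suit in "cdhs" for rank in "A23456789TJQK"]

    # Tag each card with a unique random 64-bit key, then sort by key.
    seen = set()
    keyed = []
    for card in deck:
        k = secrets.randbits(64)
        while k in seen:                   # collisions are astronomically rare
            k = secrets.randbits(64)
        seen.add(k)
        keyed.append((k, card))

    keyed.sort()
    shuffled = [card for _, card in keyed]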

gcooney
  • 1,689
  • 10
  • 14
0

Reading bytes from /dev/random would work well, actually. But that still leaves me lost on how to do the conversion (assuming I read enough bytes for 52 cards).

Conversion of what? Just take a deck of cards and, using your cryptographically-secure PRNG, shuffle it. This will produce every possible deck of cards with equal probability, with no way for anyone to determine what cards are coming next - that's the best you could possibly do.

Just make sure you implement the shuffling algorithm correctly :)

BlueRaja - Danny Pflughoeft
  • 84,206
  • 33
  • 197
  • 283
  • Shuffling a non-random deck to make it mostly random is not what I wrote :P I want to take randomness (from /dev/random or wherever) and convert it into a randomized deck. Backwards, I know. – Tommy Fisk Mar 17 '11 at 21:39
  • @Tommy: that *is* how you make a randomized deck. You take *any* deck, randomized or not (even sorted), and shuffle it. The algorithm uses the source of randomness in choosing random indices. In pseudocode, it would look something like this: `add all cards to deck; then, for i from 1 to 52 { SwapCards(i, randomNumberFrom(i, 52)) };` This will give you a (completely, not just *"mostly"*) randomized deck. – BlueRaja - Danny Pflughoeft Mar 17 '11 at 21:53
  • That's how most people would do it. But if you can add 52 cards to a deck and swap them around a bunch of times (by picking very random indices to swap), you can also take 52 random numbers, say: 0x123, that's the Ace of Spades... 0x456, that's the King of Diamonds, and so on. You'd run through the cards once and you'd have a randomized deck without swaps. You could do this over and over, as long as you had enough random bytes to consume. – Tommy Fisk Mar 17 '11 at 21:55
  • @Tommy: But then you have to deal with the issue of, what happens if the PRNG produces 0x123 twice? You clearly don't want to give out two Aces of Spades. Do you ignore it and try again (in which case, it could be a *very* long time before you generate 52 unique numbers)? Or do you generate a number between [1,52] the first time, then [1,51], then [1,50], etc.? In the second case, you are doing exactly the same thing as the Fisher–Yates shuffle algorithm, only artificially making it more complex for yourself, for no reason. – BlueRaja - Danny Pflughoeft Mar 17 '11 at 22:15
  • @Tommy: However, if you are worried about generating too many random numbers, what you *could* do is [map every possible permutation of cards to a number](http://en.wikipedia.org/wiki/Permutation#Numbering_permutations) and generate a single random-number in that range. There are 52! possible decks, so you would need a random number of at least 226 bits (~29 bytes). However, this is a preoptimization - I **highly** doubt the bottleneck of a poker-site would be the CPU utilization of shuffling a deck of cards, so you are better off doing what's simplest and using your time on other things. – BlueRaja - Danny Pflughoeft Mar 17 '11 at 22:21
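
As a sketch of the permutation-numbering idea in the last comment (factorial-base unranking: one big random number in, one complete deck out), in Python:

    import secrets
    from math import factorial

    def nth_permutation(items, n):
        """Map an integer in [0, len(items)!) to the n-th permutation."""
        items = list(items)
        result = []
        for i in range(len(items), 0, -1):
            # n < i!, so divmod by (i-1)! yields an index in [0, i)
            index, n = divmod(n, factorial(i - 1))
            result.append(items.pop(index))
        return result

    deck = [rank + suit for suit in "cdhs" for rank in "A23456789TJQK"]
    shuffled = nth_permutation(deck, secrets.randbelow(factorial(52)))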
0

A ready-made, off-the-shelf poker hand evaluator can be found here. All feedback is welcome at the e-mail address found therein.

Ken
  • 30,811
  • 34
  • 116
  • 155