I'm trying to generate a random string from a set of acceptable characters. I have a working implementation, included below, but I want to know whether the logic that converts each random byte to a printable character exposes me to any risk or inadvertently leaks other internal state. I kept the number of available characters at 64, which divides 256 evenly, to avoid biasing the generated string.

using System.Security.Cryptography;
class Example {
  static readonly char[] AvailableCharacters = {
    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 
    'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 
    'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'
  };

  internal static string GenerateIdentifier(int length) {
    char[] identifier = new char[length];
    byte[] randomData = new byte[length];

    using (RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider()) {
      rng.GetBytes(randomData);
    }

    for (int idx = 0; idx < identifier.Length; idx++) {
      int pos = randomData[idx] % AvailableCharacters.Length;
      identifier[idx] = AvailableCharacters[pos];
    }

    return new string(identifier);
  }
}

Running the above sample code 10 times with a length of 40 gives me the following output:

hGuFJjrr6xuuRDaOzJjaltL-ig09NNzbbvm2CyZG
BLcMF-xcKjmFr5fO-yryx8ZUSSRyXcTQcYRp4m1N
ARfPJhjENPxxAxlRaMBK-UFWllx_R4nT0glvQLXS
7r7lUVcCkxG4ddThONWkTJq0IOlHzzkqHeMi4ykU
TMwTRFORVYCLYc8iWFUbfZWG1Uk2IN35IKvGR0zX
hXNADtfnX4sjKdCgmvZUqdaXSFEr_c_mNB3HUcax
-3nvJyou8Lc-a0limUUZYRScENOoCoN9qxHMUs9Y
bQPmVvsEjx0nVyG0nArey931Duu7Pau923lZUnLp
b8DUUu6Rl0VwbH8jVTqkCifRJHCP3o5oie8rFG5J
HuxF8wcvHLpiGXedw8Jum4iacrvbgEWmypV6VTh-
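
The reasoning about bias can be checked without any randomness at all, by counting how many of the 256 possible byte values land on each alphabet index under `value % alphabetSize`. The following is a minimal sketch; `ModuloBiasCheck` and `CountByteValuesPerIndex` are hypothetical names used only for illustration and are not part of the implementation above.

```csharp
using System;

static class ModuloBiasCheck {
  // For each index of an alphabet with the given size, count how many of the
  // 256 possible byte values map to that index under "value % alphabetSize".
  internal static int[] CountByteValuesPerIndex(int alphabetSize) {
    if (alphabetSize < 1 || alphabetSize > 256)
      throw new ArgumentOutOfRangeException("alphabetSize");

    int[] counts = new int[alphabetSize];
    for (int value = 0; value < 256; value++) {
      counts[value % alphabetSize]++;
    }
    return counts;
  }
}
```

With an alphabet size of 64, every index is hit by exactly 4 byte values, so each character is equally likely. With a size of 62, indices 0 through 7 are hit by 5 byte values and the rest by 4, which is the uneven bias the 64-character alphabet avoids.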

I guess the question I'm asking is: is the `GenerateIdentifier` code relatively safe to use, or is this a really, really bad idea? The end users never see this identifier, and its lifetime is very short.

Additional information

To describe the identifier's use in more detail: it is intended as a key for a short-lived request that passes information from one application to another, third-party system. Since the data has to go through the (untrusted) user's browser, we store the actual report information in a database and generate this identifier so that the target application can pick up and remove that information from the database.

Since the target is a third-party system outside of our control (development-wise; it is still on-premises) and we can't directly authenticate our users against it, this token is intended to allow the user to be identified and the report to be run with the information stored in the database. The report itself has to be public facing (on the internet) without authentication, because the majority of our users don't have an account in the third-party system. Because the report deals with HIPAA/FERPA data, we want to ensure as best we can that, even with the identifier in the attacker's control, they can't generate a valid request.

Joshua
  • This question doesn't have enough information to answer. You're describing a security system without describing the attack it protects against! Don't ask us to criticize your locks without telling us what's behind the door and who is trying to break it down. – Eric Lippert Oct 10 '13 at 14:42
  • @ts guids guarantee uniqueness, not randomness. Version four guids need not be crypto strength random. – Eric Lippert Oct 10 '13 at 14:47
  • What do you suspect is a risk here? What you're asking is slightly unclear. –  Oct 10 '13 at 14:47
  • Why not just use a cert owned by the third party and encrypt a GUID? – zimdanen Oct 10 '13 at 14:58
  • @zimdanen that would be awesome if we could, but the third-party system we're integrating with is a simple reporting system that takes prompts (via GET or POST) and starts processing the query. While reports can be authenticated, a majority of our users in this case don't have an account. – Joshua Oct 10 '13 at 15:05
  • Sounds like hashing GUID will be sufficient – T.S. Oct 10 '13 at 15:10
  • @TS: again: guids are not guaranteed to be *unpredictable*, they are only guaranteed to be *unique*. Joshua needs a strong guarantee of unpredictability. – Eric Lippert Oct 10 '13 at 15:21
  • Your char-set is awfully close to [Base64](http://msdn.microsoft.com/en-us/library/dhx0d524.aspx) (it uses `+` and `/` where you use `-` and `_`), would just Base64 encoding the byte array generated work for you too? – Scott Chamberlain Oct 10 '13 at 15:22
  • If you don't have a character set that evenly divides 256, you can take a look at http://stackoverflow.com/questions/54991/generating-random-passwords/19068116#19068116 , it assumes that the bias from taking `2^64 mod chars.Length` leads to negligible bias. – CodesInChaos Oct 10 '13 at 15:30
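
For an alphabet whose size does not divide 256, one alternative to accepting a small bias is rejection sampling: discard any byte that falls in the incomplete final "bucket" and draw another. This is a different technique from the one in the linked answer, and the sketch below is only one way to do it. `UnbiasedIdentifier` and `Generate` are hypothetical names, and `RandomNumberGenerator.Create()` is used in place of `RNGCryptoServiceProvider` purely as an assumption for the example.

```csharp
using System;
using System.Security.Cryptography;

static class UnbiasedIdentifier {
  // Hypothetical helper: builds a random string from any alphabet of 1..256
  // characters by discarding bytes that would introduce modulo bias.
  internal static string Generate(char[] alphabet, int length) {
    if (alphabet == null || alphabet.Length == 0 || alphabet.Length > 256)
      throw new ArgumentException("Alphabet must contain 1 to 256 characters.", "alphabet");
    if (length < 0)
      throw new ArgumentOutOfRangeException("length");

    // Largest multiple of alphabet.Length that is <= 256. Bytes at or above
    // this limit are rejected so that every character stays equally likely.
    int limit = 256 - (256 % alphabet.Length);

    char[] result = new char[length];
    byte[] buffer = new byte[Math.Max(16, length * 2)];
    int filled = 0;

    using (RandomNumberGenerator rng = RandomNumberGenerator.Create()) {
      while (filled < length) {
        rng.GetBytes(buffer); // refill and keep going until enough bytes are accepted
        for (int i = 0; i < buffer.Length && filled < length; i++) {
          if (buffer[i] < limit) {
            result[filled++] = alphabet[buffer[i] % alphabet.Length];
          }
        }
      }
    }
    return new string(result);
  }
}
```

For a 62-character alphabet the limit is 248, so about 3% of bytes are thrown away. For the 64-character alphabet in the question the limit is 256 and nothing is rejected, which is why the plain modulo there is already unbiased.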

1 Answer

The additional information is helpful. I presume that you never send the token in the clear and never send it to an untrusted party.

To answer the question that was actually asked: yes, your code correctly generates a 40 character random string containing 240 bits of randomness. I note that of course you consume 320 bits of randomness to do so, but, whatever, bits are cheap.
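
If the 80 wasted bits bother you, the Base64 idea from the comments gets the same 40 characters out of exactly 240 random bits. The sketch below is a possible variant, not a recommendation over your code: `Base64UrlIdentifier` is a hypothetical name, the length is fixed at 40 characters for simplicity, and `RandomNumberGenerator.Create()` is an assumption standing in for `RNGCryptoServiceProvider`.

```csharp
using System;
using System.Security.Cryptography;

static class Base64UrlIdentifier {
  // Hypothetical helper: 30 random bytes (240 bits) encode to exactly
  // 40 Base64 characters with no '=' padding, because 30 is a multiple of 3.
  internal static string Generate() {
    byte[] randomData = new byte[30];
    using (RandomNumberGenerator rng = RandomNumberGenerator.Create()) {
      rng.GetBytes(randomData);
    }
    // Swap the two standard Base64 symbols for the URL-safe ones used in the
    // question's alphabet ('-' and '_').
    return Convert.ToBase64String(randomData).Replace('+', '-').Replace('/', '_');
  }
}
```

The character order of the Base64 alphabet differs from `AvailableCharacters`, but since every 6-bit value is equally likely, the resulting distribution over strings is the same.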

Presumably the number of tokens thus generated is a very small fraction of 2^240, and therefore it will be hard for an attacker to guess at a valid token. If tokens have a short lifespan -- if they are only in the database while the transaction is happening, and then go away a short time later -- that's even better. Defense in depth.
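
To put a rough number on "hard to guess": suppose, purely as an illustrative assumption, that $N = 10^{6}$ tokens are valid at the same moment. The chance that a single uniformly random guess hits one of them is then

$$
P(\text{hit}) = \frac{N}{2^{240}} \approx \frac{10^{6}}{1.77 \times 10^{72}} \approx 5.7 \times 10^{-67}
$$

The figure for $N$ is made up; the point is only that any realistic number of live tokens leaves the probability negligible.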

Note that a software RNG takes information from its environment as the seed entropy. If malware can be running on the machine doing the generation then it could be attempting to manipulate that environment, and thereby deduce part of the entropy. But if you have malware running on that machine, odds are good that you already have far larger problems.

I note also that the garbage collector does not make any guarantees about how long those strings and arrays containing the token hang around in memory. Again, if you have malware with admin privileges on your machine that starts up a debugger and interrogates memory, it can discover the keys. Of course that presumes that the bad actors are already on the wrong side of the airtight hatchway, as Raymond Chen says. Memory scanning by malware with admin privileges is the least of your worries!

Eric Lippert
  • I just wanted to point out that your two assumptions are correct: the user is authenticated and the identifier is sent over a secure channel. I would also like to thank you for your help in fleshing out the full question. It's easy to forget important information that seems obvious to the developer! – Joshua Oct 12 '13 at 01:23
  • Hello. I am new to randomness and RNGCryptoServiceProvider, so please forgive me if this is a stupid question. Why does this code generate a string containing 240 bits of randomness? Thank you – Radu Caprescu Nov 01 '16 at 09:04
  • @RaduCaprescu: There are 64 characters in the alphabet, so if we choose an element at random from the alphabet, we must have 6 bits of randomness because 2 to the 6 is 64. The string is 40 characters long. 6 times 40 is 240. – Eric Lippert Nov 01 '16 at 13:54