4

I faced a following problem: generate N unique alphanumeric strings from a restricted alphabet. Here's my solution in C#:

string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
Random generator = new Random();
const int ToGenerate = 10000;
const int CharactersCount = 4;
ArrayList generatedStrings = new ArrayList();
while( generatedStrings.Count < ToGenerate ) {
   string newString = "Prefix";
   for( int i = 0; i < CharactersCount; i++ ) {
      int index = generator.Next( Alphabet.Length );
      char character = Alphabet[index];
      newString += character;
   }
   if( !generatedStrings.Contains( newString ) ) {
      generatedStrings.Add( newString );
   }                
}
for( int i = 0; i < generatedStrings.Count; i++ ) {
    System.Console.Out.WriteLine( generatedStrings[i] );
}

it generates 10K strings starting with "Prefix" and otherwise consisting of capital letters and numbers. The output looks good.

Now I see the following problem. The produced strings are for a scenario where they should be unlikely to be predicted by anyone. In my program the seed is time-dependent. Once someone knows the seed value he can run the same code and get the exact same strings. If he knows any two strings he can easily figure out my algorithm (since it is really naive) and attempt to brute-force the seed value - just enumerate all possible seed values until he sees the two known strings in the output.

Is there some simple change that could be done to my code to make the described attack less possible?

sharptooth
  • 167,383
  • 100
  • 513
  • 979
  • 2
    http://msdn.microsoft.com/en-us/library/system.guid.aspx – Jaroslav Jandek Jul 22 '10 at 09:07
  • 2
    @Jaroslav: The GUID algorithm is specifially deterministic. It's not what he wants. It's unique, not random. – Noon Silk Jul 22 '10 at 09:10
  • @silky: Good luck guessing the exact tick seed used to generate the GUID, and the next one, ... However you generate the GUID in **your code** you won't be able to prevent guessing. Only if you use a third party service that can't be reflected or it's hard to guess its algorithm you can be somewhat safe. He wanted a simple change, GUID is way less guessable and simpler to use. – Jaroslav Jandek Jul 22 '10 at 09:33
  • @Jaroslav "Good Luck"? Are you for real? It's a shame programmers have such a cavalier attitude towards security. Personally, I don't understand it, but I won't waste time arguing with you about it. – Noon Silk Jul 22 '10 at 09:44
  • @silky: Correction: He **cannot** guess the list of strings exactly. It's not like it uses one seed value. It uses 10k seed values (therefore you cannot calculate one value from the other). It is not effective but it is simple (eg. exactly what the OP asked). For example using a `RandomNumberGenerator` class to geenrate the unique values is close to equal security-wise but way more effective (also less simple). And quite frankly, he uses a 4-char long strings, way easier to simply brute-force them than to try to get any seed value (35^4 comparisons). – Jaroslav Jandek Jul 22 '10 at 10:06
  • @Jaroslav Jandek: The whole point why I asked this question is is quite easy to brute force the seed by passing different values into `Random` constructor. Having two sample strings one can quickly check if he's picked up the right value. – sharptooth Jul 22 '10 at 10:07
  • @sharptooth: you can't do that with `GUID` as I have repeatedly stated - it does NOT use a single seed value for 10000 strings, it generates 10000 seeds (again, not very effective, but it works). So for your case, it is safe. – Jaroslav Jandek Jul 22 '10 at 10:10
  • 5
    Using a Guid to solve this realistic security problem is an extremely bad idea. Guids are not designed to solve this problem. **Do not use a tool that was not designed for solving security problems to solves security problems, particularly when there already is an excellent tool easily available.** If you have a security problem **use a tool specifically designed to solve that problem.** The framework ships with a crypto-strength RNG specifically designed and implemented by world-class security experts to solve this problem; use it. – Eric Lippert Jul 22 '10 at 14:29
  • @Eric Lippert: I was answering the question: how to get **unique random strings** that can't be trivially predicted. And he wanted it simple. `GUID` does just that - a simple and readable 1-liner. I agree that RNG is the best option, since it is designed for generating random numbers (but not globally unique). I guess the **random** requirement takes precedence in this case. I was just answering to people that were stating it would not work... Well, it would. – Jaroslav Jandek Jul 23 '10 at 05:29

2 Answers2

8

Well, how would he know the seed? Unless he knew the exact time you ran the code, that is very hard to do. But if you need stronger, you can also create cryptographically strong random numbers via System.Security.Cryptography.RandomNumberGenerator.Create - something like:

        var rng = System.Security.Cryptography.RandomNumberGenerator.Create();
        byte[] buffer = new byte[4];
        char[] chars = new char[CharactersCount];
        for(int i = 0 ; i < chars.Length ; i++)
        {
            rng.GetBytes(buffer);
            int nxt = BitConverter.ToInt32(buffer, 0);
            int index = nxt % Alphabet.Length;
            if(index < 0) index += Alphabet.Length;
            chars[i] = Alphabet[index];
        }
        string s = new string(chars);
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • If you have a series of numbers all generated at a very close time, it might be possible to statistically determine the seed by the properties of the numbers. I think it's a wise concern. – Noon Silk Jul 22 '10 at 09:12
  • I suppose he can brute-force the seed passing different values into `Random` constuctor, can't he? – sharptooth Jul 22 '10 at 09:16
  • @sharptooth - but not with `RandomNumberGenerator` - see the demo code added – Marc Gravell Jul 22 '10 at 09:18
  • 1
    @sharptooth: Yes. You are right. Given that he could guess the year to within, say, +- 10 years, it leaves only 315,360,000 numbers to try. Which really isn't that many, if you know what you're looking for, as you suggest. – Noon Silk Jul 22 '10 at 09:19
  • Btw why BitConverter.ToUInt32() is not used? – sharptooth Jul 22 '10 at 09:57
  • @sharptooth - I guess that would be fine too. – Marc Gravell Jul 22 '10 at 09:59
  • 1
    Marc: Indeed, it is not *at all* hard to work backwards from even a small sample of stuff generated by a pseudo-RNG to determine what the seed was and hence what all the other output was. The number of seeds is *tiny* - typically only a few billion, or even a few million. – Eric Lippert Jul 22 '10 at 14:15
5

Well, it depends what you consider "simple".

You can "solve" your problem by using a "true" source of random numbers. You can try the free ones (random.org, fourmilab hotbits, etc), or buy one, depending on the sort of operation you're running.

Alternatively (and perhaps better) is to not generate in advance, and instead generate on demand. But this may be a significant change to your business process/model.

Noon Silk
  • 54,084
  • 6
  • 88
  • 105
  • Yeah, random.org and other similar services are a good option since it is much more truly random, random.org uses atmospheric noise, I believe, to get their results, so you could just use the API to this to preload a bunch of randomized data for use, combining this with basic programming "random" algorithms would, I think, yield very good results. – Rick Jul 22 '10 at 10:05