1

I am generating a unique and random alphanumeric string segment to represent certain links that will be generated by the users. For doing that I was approaching with "uuid" number to ensure it's uniqueness and randomness, but, as per my requirements the string shouldn't be more than 5 characters long. So I dropped that idea.

Then I decided to generate such a string using random function of ruby and current time stamp. The code for my random string goes like this:-

 temp=DateTime.now
 temp=temp + rand(DateTime.now.to_i)
 temp= hash.abs.to_s(36)   

What I did is that I stored the current DateTime in a temp variable and then I generated a random number passing the current datetime as parameter. Then in the second line actually added current datetime and random number together to make a unique and random string.

Soon I found,while I was testing my application in two different machines and send the request at the same time, it generated the same string(Though it's rare) once after more than 100 trials.

Now I'm thinking that I should add one more parameter like mac address or client ip address before passing to_s(36) on temp variable. But can't figure out how to do it and even then whether it will be unique or nor...

Thanks....

Siddharth
  • 833
  • 6
  • 12
  • 25
  • 1
    5 chars (40 bits) is a pretty small amount of data. It's ok for small sequential counters, but for "random unique" numbers - not so much. – Sergio Tulentsev Jan 28 '13 at 05:46
  • 1
    In rails, you can get client ip by calling `request.remote_ip`. How to weave it in your hashing logic is up to you. – Sergio Tulentsev Jan 28 '13 at 05:50
  • @SergioTulentsev I have to make a string looks like bitly...bcoz it will represent a link which on request will redirect to one or many long urls... – Siddharth Jan 28 '13 at 06:01
  • bitly links are not random, AFAIK. They're simple base-62 numbers. – Sergio Tulentsev Jan 28 '13 at 06:04
  • @SergioTulentsev But what if two same strings are generated for two different links...how the mapping will be done?? – Siddharth Jan 28 '13 at 06:13
  • 2
    That's the problem with 5 chars. Use more chars. – Sergio Tulentsev Jan 28 '13 at 06:14
  • The question on generating such keys has been answered before, http://stackoverflow.com/questions/11153244/rails-generate-unique-random-number-for-populating-a-css-class-string-how-to and http://stackoverflow.com/questions/10836458/how-to-safely-implement-access-with-a-key-for-a-web-application for example ... The matter of ensuring uniqueness has also been answered before. – Jonas Schubert Erlandsson Jan 28 '13 at 08:40

3 Answers3

5

If you are certain that you will never need more than a given M amount of unique values, and you don't need more than rudimentary protection against guessing the next generated id, you can use a Linear Congruentual Generator to generate your identificators. All you have to do is remember the last id generated, and use that to generate a new one using the following formula:

newid = (A * oldid + B) mod M

If 2³² distinct id values are enough to suit your needs, try:

def generate_id
  if @lcg
    @lcg = (1664525 * @lcg + 1013904223) % (2**32)
  else
    @lcg = rand(2**32) # Random seed
  end
end

Now just pick a suitable set of characters to represent the id in as little as 6 character. Uppercase and lowercase letters should do the trick, since (26+26)^6 > 2^32:

ENCODE_CHARS = [*?a..?z, *?A..?Z]

def encode(n)
  6.times.map { |i|
    n, mod = n.divmod(ENCODE_CHARS.size)
    ENCODE_CHARS[mod]
  }.join
end

Example:

> 10.times { n = generate_id ; puts "%10d = %s" % [n, encode(n)] }

2574974483 = dyhjOg
3636751446 = QxyuDj
 368621501 = bBGvYa
1689949688 = yuTgxe
1457610999 = NqzsRd
3936504298 = MPpusk
 133820481 = PQLpsa
2956135596 = yvXpOh
3269402651 = VFUhFi
 724653758 = knLfVb

Due to the nature of the LCG, the generated id will not repeat until all 2³² values have been used exactly once each.

Lars Haugseth
  • 14,721
  • 2
  • 45
  • 49
  • Hi Lars, I'm not a Ruby expert and I want to convert your encode function to PHP: could you please tell me exactly what happens inside encode? –  Mar 01 '14 at 09:25
  • PHP: https://gist.github.com/AndreBaumeier/c5959491626dc6d79a63 full example in PHP: https://gist.github.com/AndreBaumeier/3cd529765061f14902a4 – Andre Baumeier Nov 04 '14 at 14:51
4

SecureRandom in ruby uses process id (if available) and current time. You can use the urlsafe_base64(n= 16) class method to generate the sequence you need. According to your requirements I think this is your best bet.

Edit: After a bit of testing, I still think that this approach will generate non-unique keys. The way I solved this problem for barcode generation was:

barcode= barcode_sql_id_hash("#{sql_id}#{keyword}")

Here, your keyword can be time + pid.

nurettin
  • 11,090
  • 5
  • 65
  • 85
  • 3
    Or just `SecureRandom.hex(5)` :) – Sergio Tulentsev Jan 28 '13 at 06:05
  • @nurettin I'm using "SecureRandom.urlsafe_base64(4, true)"..which is looking good but the only problem is many times it's inserting "-" character in the string...can I generate it without "-"?? – Siddharth Jan 28 '13 at 07:12
  • @Siddharth '-' is part of the URL-Safe base64 charset. I suggested it because you wanted to use it in a link, so it is ok. Since SecureRandom uses OpenSSL random cryptography functions, it generates ids that try to be unique per computer. However, you might still have hash collisions, so I suggest you change your approach to using "the ids of the records involved" for seeding your hash. – nurettin Jan 28 '13 at 07:25
2

There is no way you can generate a unique UUID with only five chars, with chars and numbers you have a basic space of around 56 chars, so there is a max of 56^5 combinations , aprox 551 million (Around 2^29).

If with this scheme you were about to generate 10.000 UUIDs (A very low number of UUIDs) you would have a probability of 1/5.000 of generating a collision. When using crypto, the standard definition of a big enough space to avert collisions is around 2^80.

To put this into perspective, your algorithm would be better off if it generated just a random integer (a 32 bit uint is 2^32, 8 times the size you are proposing) which is clearly a bad idea.

Batou99
  • 869
  • 10
  • 19
  • 1
    The problem with uniqueness is trivially solved by testing against the DB or trying an insert and rescuing with a new token generation if the insert fails. True that the space is small, but so is the cost of a collision, until ha starts to fill the entire space... – Jonas Schubert Erlandsson Jan 28 '13 at 08:43