5

I would like to generate a completely random "unique" (I will ensure that using my model) identifier of a given (the length may varies) length containing numbers, letter and special characters

For example:

161551960578281|2.AQAIPhEcKsDLOVJZ.3600.1310065200.0-514191032|

Can someone please suggest the most efficient way to do that in Ruby on Rails?

EDIT: IMPORTANT: If it is possible please comment on how efficient your proposed solution is because this will be used every time a user enters a website!

Thanks

glarkou
  • 7,023
  • 12
  • 68
  • 118
  • 1
    Does it need to follow a specific pattern? Or is it fine if the characters from the input set appear potentially at any place? – emboss Jul 07 '11 at 17:43
  • No patterns. I do not want it to be predictable in any way. – glarkou Jul 07 '11 at 18:01
  • 1
    Will the id be used as some kind of session id? Then besides being random it should also be cryptographically secure... – emboss Jul 07 '11 at 18:29
  • It will be used as an access token for an API. Exactly the same way that Facebook is doing! – glarkou Jul 07 '11 at 18:31

7 Answers7

12

Using this for an access token is a different story than UUIDs. You need not only pseudo-randomness but additionally this needs to be a cryptographically secure PRNG. If you don't really care what characters you use (they don't add anything to the security) you could use something as the following, producing a URL-safe Base64-encoded access token. URL-safeness becomes important in case you append the token to URLs, similar to what some Java web apps do: "http://www.bla.com/jsessionid=". If you would use raw Base64 strings for that purpose you would produce potentially invalid URLs.

require 'securerandom'

def produce_token(length=32)
  token = SecureRandom.urlsafe_base64(length)
end

The probability of getting a duplicate is equal to 2^(-length). Since the output will be Base64-encoded, the actual output will be 4/3 * length long. If installed, this is based on the native OpenSSL PRNG implementation, so it should be pretty efficient in terms of performance. Should the OpenSSL extension not be installed, /dev/urandom will be used if available and finally, if you are on a Windows machine, CryptGenRandom would be used as fallback. Each of these options should be sufficiently performant. E.g., on my laptop running produce_tokena million times finishes in ~6s.

emboss
  • 38,880
  • 7
  • 101
  • 108
  • Do you believe that this is the best solution for access_token? I was thinking that I will not use it as a key for encryption. But now I believe that it's a good strategy to use a cryptographically secure PRNG because I might use it for encryption in the future. – glarkou Jul 07 '11 at 18:58
  • 1
    The reason for using a PRNG is not because it encrypts something. You use it because you do not want anyone to be able to predict future tokens by observing a pattern. A random number generator can produce perfectly pseudo-random numbers but still be predictable, e.g. [LCG](http://en.wikipedia.org/wiki/Linear_congruential_generator). That's why you need a secure PRNG here. By using Base64 encoding instead of hex encoding you can also save some characters: hex is twice as large as "length", Base 64 only 4/3 as large. – emboss Jul 07 '11 at 19:09
  • Is that a good solution aswell? `ruby-1.9.2-p180 :006 > @foo = SecureRandom.uuid => "cb0aece9-6702-4538-ba52-52df7f24921a" ruby-1.9.2-p180 :007 > Base64.encode64(@foo) => "Y2IwYWVjZTktNjcwMi00NTM4LWJhNTItNTJkZjdmMjQ5MjFh\n" ruby-1.9.2-p180 :008 > Base64.decode64("Y2IwYWVjZTktNjcwMi00NTM4LWJhNTItNTJkZjdmMjQ5MjFh\n") => "cb0aece9-6702-4538-ba52-52df7f24921a"` – glarkou Jul 07 '11 at 19:41
  • There's no need to generate a UUID first. The mechanism of retrieving the random bytes internally stays the same, so you gain nothing by generating the UUID first and Base64-encoding it afterwards - you could as well generate Base64 directly. I forgot about URL safety, another important aspect. I modified my answer to take all this into account. – emboss Jul 07 '11 at 21:40
4

The best solution is:

require 'active_support/secure_random'
ActiveSupport::SecureRandom.hex(16) # => "00c62d9820d16b52740ca6e15d142854"

This will generate a cryptographically secure random string (i.e. completely unpredictable)

Similarly, you could use a library to generate UUIDs as suggested by others. In that case, be sure to use the random version (version 4) and make sure the implementation uses a cryptosecure random generator.

As anything related to security, rolling your own is not the best idea (even though I succumbed to it too, see first versions! :-). If you really want an homemade random string, here's a rewrite of tybro0103's approach:

require 'digest/sha1'
ALPHABET = "|,.!-0123456789".split(//) + ('a'..'z').to_a + ('A'..'Z').to_a

def random_string
    not_quite_secure = Array.new(32){ ALPHABET.sample }.join
    secure = Digest::SHA1.hexdigest(not_quite_secure)
end

random_string # => "2555265b2ff3ecb0a13d65a3d177b326733bc143"

Note that it hashes the random string, otherwise it could be subject to attack. Performance should be similar.

Marc-André Lafortune
  • 78,216
  • 16
  • 166
  • 166
  • Do you mean that this version is completely unpredictable or using the gem suggested in the previous answer?? – glarkou Jul 07 '11 at 18:21
  • This version relies on `Array#sample` which uses the (great) builtin random generator of Ruby, so it is unpredictable. Updated answer for UUID. – Marc-André Lafortune Jul 07 '11 at 18:36
  • I'll use your version and mark this as the answer. Because I cannot install the gem on heroku. – glarkou Jul 07 '11 at 18:50
  • Hashing with SHA-1 does not add security in the second example. To see this think of Password-based Encryption([PKCS#5](http://www.rsa.com/rsalabs/node.asp?id=2127)), it's the same principle here. If the underlying data is predictable (as with passwords), hashing it does not increase the entropy - that's why passwords are "salted" before they are digested. – emboss Jul 07 '11 at 19:15
  • @emboss: Not quite. The attack relies on getting enough random numbers generated to deduce the state of the mersenne twister and thus predict the future random numbers. Hashing makes it impossible to deduce bits of the state of the generator, as it can't be reversed. Lookup 'hash' on http://www.quadibloc.com/crypto/co4814.htm – Marc-André Lafortune Jul 07 '11 at 19:35
  • @Marc-André: The mersenne twister is predictable. But hashing it does not add entropy, even if the hash is a perfectly secure one-way function. Although secure PRNGs can be constructed from a hash such as SHA-1, this doesn't apply here. Take again the analogy to passwords. The domain data is limited and predictable, that's why simply digesting passwords doesn't secure them against dictionary attacks - that's why PKCS#5 recommends adding the salt. Easy illustration. Suppose you have a very bad PRNG that always generates '1'. Hashing that won't make it any less predictable. – emboss Jul 07 '11 at 22:32
  • @Marc-André: It's an interesting topic though. They mention hashing the output would improve the situation but give no hint to why this would hold. Imo the output will still be periodical, in the same way that the original is. So by observing the hashed values you just have to wait for them to become periodical and then you can predict them in the same manner as before, or am I overlooking something? – emboss Jul 07 '11 at 22:51
  • @emboss: the period of the MT is *huge* (2^19937-1, that's a ~2000 digit number!). The attack is not based on the period but on deducing the internal state of the generator from ~2 KB generated. Hashing makes it impossible to go from the result to the internal state. – Marc-André Lafortune Jul 08 '11 at 05:50
  • 1
    I was curious, so I [asked](http://stackoverflow.com/questions/6618836/how-if-at-all-does-a-predictable-random-number-generator-get-more-secure-after/6619352#6619352). Seems as if we both had a point. It seems to add to the security while still not being as secure as what e.g. SecureRandom provides. – emboss Jul 08 '11 at 06:29
  • Would adding a timestamp (with microseconds) to the `not_quite_secure` before hashing improve the security of the hash? Rainbow tables-based attack would be useless, because the attacker would not be able to predict the exact time of the hashing, I believe. (By the way, the original question had a requirement for the resulting string be of variable length, but that would require just a trivial fix.) – Arsen7 Jul 08 '11 at 08:03
3

Universally Unique Identifieres - UUIDs are tricky to generate yourself ;-) If you want something really reliable, use the uuid4r gem and call it with UUID4R::uuid(1). This will spit out a uuid based on time and a hardware id (the computers mac address). So it's even unique across multiple machines if generated at the exact same time.

A requirement for uuid4r is the ossp-uuid c library which you can install with the packetmanager of your choice (apt-get install libossp-uuid libossp-uuid-dev on debian or brew install ossp-uuid on a mac with homebrew for example) or by manually downloading and compiling it of course.

The advantage of using uuid4r over a manual (simpler?) implementation is that it is a) truly unique and not just "some sort of pseudo random number generator kind of sometimes reliable" and b) it's fast (even with higher uuid versions) by using a native extension to the c library

require 'rubygems'
require 'uuid4r'
UUID4R::uuid(1) #=> "67074ea4-a8c3-11e0-8a8c-2b12e1ad57c3"
UUID4R::uuid(1) #=> "68ad5668-a8c3-11e0-b5b7-370d85fa740d"

update: regarding speed, see my (totally not scientific!) little benchmark over 50k iterations

      user     system      total        real
version 1  0.600000   1.370000   1.970000 (  1.980516)
version 4  0.500000   1.360000   1.860000 (  1.855086)

so on my machine, generating a uuid takes ~0.4 milliseconds (keep in mind I used 50000 iterations for the whole benchmark). hope that's fast enough for you

(following the "benchmark")

require 'rubygems'
require 'uuid4r'
require 'benchmark'

n = 50000
Benchmark.bm do |bm|
  bm.report("version 1") { n.times { UUID4R::uuid(1) } }
  bm.report("version 4") { n.times { UUID4R::uuid(4) } }
end

Update on heroku: the gem is available on heroku as well

Pascal
  • 5,879
  • 2
  • 22
  • 34
  • Is that efficient?? Because I ll use it each time a user enters my website! – glarkou Jul 07 '11 at 18:08
  • This could be quite useful, but the OP has commented that it shouldn't be predictable in any way, so UUID version 4 should be used... – Marc-André Lafortune Jul 07 '11 at 18:09
  • I do not think that this is predictable right @paukul? But I am not sure if that is the most efficient way to do it! – glarkou Jul 07 '11 at 18:10
  • @ntenisOT: Version 4 is completely unpredictable; the others are not random and once you know one such uuid, the space of possibilities for the next ones is small. See wikipedia for a description of different versions of uuid: http://en.wikipedia.org/wiki/Uuid – Marc-André Lafortune Jul 07 '11 at 18:19
  • @Marc-André Lafortune: Do you mean version 4 (random number based) ?? So instead of UUID4R::uuid(1) I should use UUID4R::uuid(4) right? Thanks – glarkou Jul 07 '11 at 18:23
  • 1
    @Marc-André Lafortune is of course right of using version 4, thanks for pointing that out. I've also added the "benchmark" for that to my answer. – Pascal Jul 07 '11 at 18:27
  • btw, real life facts regarding speed. we are using the gem for an AMQP queue where every message gets a uuid generated via the gem. we are publishing a couple of thousands of messages per second and the uuid4r gem is definitely not the bottleneck here :-) – Pascal Jul 07 '11 at 18:29
  • Thanks mate. Can you answer another question for me please? Will the gem work on heroku? – glarkou Jul 07 '11 at 18:32
  • http://gems-summary.heroku.com/2011-05-15 so I guess the answer is: yes :) you don't even have to install ossp-uuid, just add the gem to your gemfile – Pascal Jul 07 '11 at 18:36
2
def random_string(length=32)
    chars = (0..9).to_a.concat(('a'..'z').to_a).concat(('A'..'Z').to_a).concat(['|',',','.','!','-'])
    str = ""; length.times {str += chars.sample.to_s}
    str
end

The Result:

>> random_string(42)
=> "a!,FEv,g3HptLCImw0oHnHNNj1drzMFM,1tptMS|rO"
tybro0103
  • 48,327
  • 33
  • 144
  • 170
  • Perhaps in the final implementation you would set chars to an array that's a long list of all those characters rather than generating each time. – tybro0103 Jul 07 '11 at 17:56
  • Yes, I think that's really bad. For example what if you have 1000 users online and you want one of this unique identifiers for each of them.. ? I think that's not efficient. – glarkou Jul 07 '11 at 18:00
  • So just use a long array [0,1,2,3,4,...]. – tybro0103 Jul 07 '11 at 18:24
  • Idea is right, the implementation can be improved (see my answer). Note that `str += ...` is slower than `str << ...`, as it creates an intermediate string each time. – Marc-André Lafortune Jul 07 '11 at 18:32
0

It is a bit trickier to generate random letters in Ruby 1.9 vs 1.8 due to the change in behavior of characters. The easiest way to do this in 1.9 is to generate an array of the characters you want to use, then randomly grab characters out of that array. See http://snippets.dzone.com/posts/show/491

Cory Gagliardi
  • 770
  • 1
  • 8
  • 13
0

You can check implementations here I used this one

Bohdan
  • 8,298
  • 6
  • 41
  • 51
0

I used current time in miliseconds to generate random but uniqure itentifier.

Time.now.to_f # => 1656041985.488494

Time.now.to_f.to_s.gsub('.', '') # => "16560419854884948"

this will give 17 digits number sometime it can give 16 digits number because if last digit after point (.) is 0 than it is ignore by to_f. so, I used rleft(17, '0')

example:

Time.now.to_f.to_s.gsub('.', '').ljust(17, '0') # => "1656041985488490"

Than I used to_s(36) to convert it into short length alphanumeric string.

Time.now.to_f.to_s.gsub('.', '').ljust(17, '0').to_i.to_s(36) # => "4j26hz9640k"

to_s(36) is radix base (36)

https://apidock.com/ruby/v2_5_5/Integer/to_s

if you want to limit the length than you can select first few digits of time in miliseconds:

Time.now.to_f.to_s.gsub('.', '').ljust(17, '0').first(12).to_i.to_s(36) # => "242sii2l"

but if you want the uniqueness accuracy in miliseconds than I would suggest to have atleast first(15) digits of time

Touqeer
  • 583
  • 1
  • 4
  • 19