Offtopic:

I'm new to Stack Overflow, and I wanted to say hello!

On topic:

I'm generating a version 5 UUID to name the randomized folders that my application creates and deletes, using the time() timestamp and the job name as input:

my $md5_UUID  = create_uuid_as_string(UUID_MD5, time."$job");

These folders are created for each run of each job and deleted once the run finishes. If the same UUID were ever generated twice, the roughly 1,000 jobs that are running could halt.

Is there any information on the possibility of collisions (different data generating the same UUID)? Are the UUIDs truly unique? Also, which kind of UUID should I use: SHA1 or MD5?

Brainless Box
  • The whole point of a uuid is that the possibility of any collision is so small as to be not worth worrying about. Do you really need uuid's anyway? Do you have multiple machines accessing a shared filesystem or some such? – Richard Huxton Jun 20 '13 at 20:19
  • Yes, the server will be accessed through multiple machines -- it is both the main storage location of job data and the machine the jobs execute on. Do you have any suggestions on a different random folder generation method beyond a UUID generated by time stamp appended with job name? – Brainless Box Jun 20 '13 at 20:30
  • Version 5 [UUIDs](http://en.wikipedia.org/wiki/Universally_unique_identifier) are hashed with SHA-1, not MD5; if you are using an MD5 hash, you are not creating Version 5 UUIDs. That said, even MD5 has a ridiculously low probability of collision unless it's being deliberately attacked, so unless your server's going to be running a couple of million (MD5, 2^20.96) or over a quintillion (SHA1, 2^60) jobs simultaneously, you're wasting time worrying about collisions which you could better spend ensuring your application can scale to a point where you *do* need to worry about them. – Aaron Miller Jun 20 '13 at 21:04

3 Answers

Use OS Tools

There's probably a pure Perl solution, but it may be overkill. If you are on a Linux system, you can capture the results of mktemp or uuidgen and use them in your Perl script. For example:

$ perl -e 'print `mktemp --directory`'
/tmp/tmp.vx4Fo1Ifh0

$ perl -e '$folder = `uuidgen`; print $folder'
113754e1-fae4-4685-851d-fb346365c9f0

The mktemp utility is nice because it will atomically create the directory for you, in addition to returning the directory name. You also have the ability to give more meaningful names to the directory by modifying the template (see man 1 mktemp); in contrast, UUIDs are not really good at conveying useful semantics.
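
For what it's worth, a pure Perl equivalent of mktemp --directory could look something like the following. This is only a minimal sketch using the core File::Temp module; the job name and the template are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical job name; in the question this would come from $job.
my $job = 'example_job';

# tempdir() picks a name that is not already in use and creates the
# directory in one step. The trailing X's in the template are replaced
# with random characters. TMPDIR => 1 places it in the system temp dir,
# and CLEANUP => 1 removes it (with its contents) when the program exits.
my $dir = tempdir("$job-XXXXXXXX", TMPDIR => 1, CLEANUP => 1);

print "working in $dir\n";
```

Keeping the job name in the template means the folder name still tells you which job owns it, which a bare UUID would not.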

Todd A. Jacobs

If the folders last only the length of a job, and all the jobs are running on the same machine, you can just use the pid as a folder name. No need for uuids at all.
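
A rough sketch of what that might look like, assuming all jobs run on one machine and the folder lives under the system temp directory (the "job-" prefix is just an illustrative choice):

```perl
use strict;
use warnings;
use File::Spec;
use File::Path qw(remove_tree);

# $$ is the current process ID. No two live processes on the same
# machine share a PID, though PIDs are recycled after a process exits.
my $dir = File::Spec->catdir(File::Spec->tmpdir, "job-$$");

# mkdir fails if the directory already exists, so a leftover folder
# from a crashed job is reported instead of being silently reused.
mkdir $dir or die "cannot create $dir: $!";

# ... run the job inside $dir ...

# Delete the folder once the job is done.
remove_tree($dir);
```

Because PIDs get reused, this relies on each job cleaning up after itself; the die on mkdir is there to make a stale folder obvious.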

Richard Huxton

Use a v1 UUID

Perl's time() function has one-second resolution. So, if you're starting several jobs within the same second, or simultaneously on separate hosts, you could easily get collisions.
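
To make that concrete, here is a small sketch of the failure mode. It uses the core Digest::MD5 module as a stand-in for the hashing step of a name-based UUID (it does not produce a real UUID), and the job names are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Stand-in for a name-based ID: identical input always gives an
# identical digest, which is exactly how name-based UUIDs behave.
sub folder_id {
    my ($timestamp, $job) = @_;
    return md5_hex($timestamp . $job);
}

my $now = time;    # whole seconds since the epoch

my $first  = folder_id($now, 'render');    # first launch
my $second = folder_id($now, 'render');    # second launch, same second
my $other  = folder_id($now, 'encode');    # different job name

print $first eq $second ? "same job, same second: collision\n"
                        : "same job, same second: distinct\n";
print $first eq $other  ? "different job: collision\n"
                        : "different job: distinct\n";
```

The point is that the hash adds no randomness of its own: uniqueness depends entirely on the timestamp plus job name being unique.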

By contrast, a v1 UUID's timestamp field counts 100-nanosecond intervals, and the UUID includes the MAC address of the generating host. See RFC 4122 for details. I can imagine a case where that wouldn't guarantee uniqueness (for example, client machines that are VMs on separate layer-3 virtual networks, all with the same virtual MAC address), but that seems pathologically contrived.
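
To illustrate the difference in granularity, the sketch below computes a v1-style timestamp (100-nanosecond ticks since 15 October 1582, the epoch RFC 4122 uses) with the core Time::HiRes module. It does not build a full UUID, which would also need the clock sequence and node fields, and it assumes a Perl built with 64-bit integers. The name v1_ticks is made up for this example.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

# Seconds between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
use constant UUID_EPOCH_OFFSET => 12_219_292_800;

# Returns the current time as a count of 100-ns ticks since the UUID epoch.
# gettimeofday() only has microsecond resolution, so the result is always
# a multiple of 10 ticks.
sub v1_ticks {
    my ($sec, $usec) = gettimeofday();
    return ($sec + UUID_EPOCH_OFFSET) * 10_000_000 + $usec * 10;
}

printf "time():        %d\n", time;
printf "v1-style ticks: %d\n", v1_ticks();
```

In practice you would let a UUID module generate the v1 value; this only shows why two launches within the same second still get different timestamps.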

david