
I need to create a document GUID for an app that will insert XMP data into a file. The app is written in C++ and will be cross-platform, so I want to avoid using libraries that might create compatibility problems.

What I'm thinking of doing is to create a string by concatenating the name of my app plus the number of seconds since epoch plus the appropriate number of characters from a SHA256 hash calculated for the full file (I will be doing that anyway for another reason).
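
A minimal sketch of the scheme described above, assuming the SHA256 hex digest of the whole file has already been computed elsewhere (the hashing itself is not shown). `make_document_id` is a hypothetical helper, not part of any library:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical helper sketching the proposed scheme:
// "<app name>-<seconds since epoch>-<first hash_chars of the SHA256 hex>".
// `sha256_hex` is assumed to be the digest the app already computes for the
// whole file; computing it is outside this sketch.
std::string make_document_id(const std::string& app_name,
                             const std::string& sha256_hex,
                             std::size_t hash_chars) {
    assert(hash_chars <= sha256_hex.size());  // can't take more than we have
    using namespace std::chrono;
    // system_clock counts from the Unix epoch on mainstream platforms
    // (the standard only guarantees this since C++20).
    const long long secs =
        duration_cast<seconds>(system_clock::now().time_since_epoch()).count();
    return app_name + "-" + std::to_string(secs) + "-" +
           sha256_hex.substr(0, hash_chars);
}
```

The app-name portion is identical in every ID, so it adds length but no uniqueness; only the timestamp and hash portions distinguish one document from another.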

Would the result produced be enough to guarantee that collision would be "sufficiently improbable in practice"?

mharran
  • How about Boost's header-only, cross-platform uuid library? http://www.boost.org/doc/libs/1_58_0/libs/uuid/uuid.html – m.s. May 31 '15 at 14:26
  • http://stackoverflow.com/questions/543306/platform-independent-guid-generation-in-c – ZivS May 31 '15 at 14:28
  • How about using Xorshift algorithms? http://en.wikipedia.org/wiki/Xorshift – a_pradhan May 31 '15 at 14:46
  • Even if you implement your own makeshift algorithm, consider using a proper OS implementation where available. – MSalters May 31 '15 at 16:56
  • @MSalters - why do you recommend a proper OS implementation? I'm looking to avoid having to develop platform-specific solutions and/or a series of #ifdef statements. I think my own solution also future-proofs the application. – mharran May 31 '15 at 21:28
  • @mharran: Well, for instance because they have the version number set correctly. And by using the vetted algorithms, they're less likely to collide. E.g. your idea of using the app name causes a large set of bits to have low entropy. – MSalters May 31 '15 at 21:39
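
For comparison with the vetted approach suggested in the comments, here is a sketch of a random (version 4) UUID in the RFC 4122 layout using only the standard C++ library, so no platform-specific code or #ifdefs are needed. This is not Boost.Uuid's API; `make_uuid_v4` is a hypothetical helper, and it assumes `std::random_device` draws from a real entropy source on every target platform, which some older toolchains did not guarantee:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

// Sketch of an RFC 4122 version-4 (random) UUID using only the standard
// library. ASSUMPTION: std::random_device is non-deterministic on the target.
std::string make_uuid_v4() {
    std::random_device rd;
    // uniform_int_distribution is not defined for char-sized types, so draw ints.
    std::uniform_int_distribution<int> byte_dist(0, 255);
    std::array<std::uint8_t, 16> b{};
    for (auto& x : b) x = static_cast<std::uint8_t>(byte_dist(rd));

    b[6] = static_cast<std::uint8_t>((b[6] & 0x0F) | 0x40);  // version field = 4
    b[8] = static_cast<std::uint8_t>((b[8] & 0x3F) | 0x80);  // RFC 4122 variant

    char buf[37];  // 32 hex digits + 4 dashes + terminating NUL
    const int n = std::snprintf(buf, sizeof buf,
                                "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
                                "%02x%02x%02x%02x%02x%02x",
                                b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
                                b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
    assert(n == 36);  // every byte formatted as exactly two hex digits
    return std::string(buf);
}
```

Fixing the version and variant bits is presumably what "the version number set correctly" refers to; the remaining 122 bits are random.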

1 Answer


Unless you are expecting to generate insanely high numbers of documents, using SHA256 all by itself is overwhelmingly likely to avoid any collisions. If your app generates fewer than 10^29 documents over its lifetime then the chance of any collisions is less than 1 in 10^18 (assuming SHA256 is well-designed).
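
The 10^29 figure follows from the usual birthday approximation, assuming SHA256 outputs behave like independent, uniformly random 256-bit values (n documents, N possible hashes):

```latex
p \approx 1 - e^{-n^2/(2N)} \approx \frac{n^2}{2N}, \qquad N = 2^{256} \approx 1.16 \times 10^{77}

n = 10^{29} \;\Rightarrow\; p \approx \frac{10^{58}}{2.32 \times 10^{77}} \approx 4.3 \times 10^{-20} < 10^{-18}
```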

This means that roughly everyone in the world (7e9 people) could each use your app to generate 1e9 documents per second for 1,000 years before the chance of a collision reached 1 in 10^18 (0.0000000000000001%).
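
The arithmetic can be checked with a short sketch using the same birthday approximation, again assuming SHA256 behaves like a random function. The helper names are hypothetical, and 365.25-day years are assumed:

```cpp
#include <cassert>
#include <cmath>

// Approximate birthday-bound collision probability for n values drawn
// uniformly from 2^bits possibilities: p ~ n^2 / (2 * 2^bits).
// Only meaningful while the result is much smaller than 1.
double collision_probability(double n, int bits) {
    assert(n >= 0.0 && bits > 0);
    const double space = std::ldexp(1.0, bits);  // 2^bits
    return (n * n) / (2.0 * space);
}

// Total documents if `people` users each generate `per_second` documents
// continuously for `years` years (365.25-day years).
double documents_generated(double people, double per_second, double years) {
    assert(people >= 0.0 && per_second >= 0.0 && years >= 0.0);
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;
    return people * per_second * years * seconds_per_year;
}
```

For the scenario above this gives about 2.2e29 documents and a collision probability of roughly 2e-19, still below 1 in 10^18.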

You can add more bits (like name and time) if you like but that will almost certainly be overkill for the purpose of avoiding collisions.

rhashimoto
  • The reason I'm adding date and time is that two different people could work with copies of the same original file, which would have the same hash; it is unlikely they would be editing those copies within 1 second of each other. Adding in the name of my app is probably superfluous. – mharran May 31 '15 at 21:26
  • Why not just add the ID of the user? – rhashimoto May 31 '15 at 21:38
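
A minimal sketch of the suggestion to add a user ID, so that two people hashing identical copies of the same file still get different IDs. `make_id_with_user` is hypothetical, and how the app obtains a stable `user_id` is left open:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the file's SHA256 hex digest plus an identifier for
// the user. ASSUMPTION: the app can obtain a stable, distinct user_id itself.
std::string make_id_with_user(const std::string& sha256_hex,
                              const std::string& user_id) {
    assert(!sha256_hex.empty() && !user_id.empty());
    return sha256_hex + "-" + user_id;
}
```

The same user saving two identical copies would still get the same ID, so a timestamp or random component could be appended for that case.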