ORA_HASH is deterministic in that keeping the input and seed constant will always produce the same output.

But, for any seed (including the default), will the output of an ora_hash necessarily have the property that:

"No method exists that can be used to predict what the result of an ORA_HASH will be with any level of success higher than purely random guessing?"

Jeremy
  • Your first statement is not completely correct. At least for LOBs, it is not deterministic. See my answer [here](http://stackoverflow.com/q/9450381/409172) for an example. – Jon Heller Sep 19 '13 at 19:29

1 Answer

Probably not, no. ORA_HASH is not designed as a particularly robust hashing algorithm, and it's certainly not something you would use as part of a cryptography library. For that, you'd use DBMS_CRYPTO.HASH, choosing an appropriate algorithm and options. ORA_HASH is designed to be a relatively lightweight hashing algorithm that distributes items into buckets effectively (i.e. what happens when you hash partition a table). It is not designed to be secure from attack.

Justin Cave
  • The use case I am concerned with is sampling for data analysis. Am I potentially introducing bias by selecting a random subset of data in this way? – Jeremy Sep 19 '13 at 19:41
  • Yes, it's a hash function designed to fill buckets approximately evenly, but it's very likely that all the items in one bucket will have some common arbitrary property--for instance, if it's a number, the numbers might all be "X modulo N = the same value", or if it's a string, items with similar-looking leading characters might bucket together. – StilesCrisis Sep 19 '13 at 19:48
  • @Jeremy - If you want a random sample, I'd use one of the `dbms_crypto.random*` methods or, if you're slightly less paranoid, `dbms_random.value`. – Justin Cave Sep 19 '13 at 19:57
  • @Jeremy - another way to get a random sample from a table is using the `SAMPLE` keyword. – Jeffrey Kemp Sep 20 '13 at 00:53
  • @JeffreyKemp `SAMPLE` [may not be accurate enough in some cases](http://stackoverflow.com/a/16029271/409172). It's good enough for a quick sample, and good enough for the optimizer, but probably not good enough for serious statistics work. – Jon Heller Sep 20 '13 at 16:10
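
To make the bias concern from the comments concrete, here is a minimal Python sketch. It does not reproduce ORA_HASH's actual algorithm, which is not described in this thread. `toy_bucket` is a hypothetical stand-in (a plain modulo) for any lightweight bucketing hash, and the ids are made-up data. It only illustrates how keeping one bucket can select rows that all share an arbitrary property, whereas a pseudo-random draw of the same size does not.

```python
import random

def toy_bucket(n: int, buckets: int = 10) -> int:
    """Hypothetical lightweight bucketing hash (an assumption, NOT ORA_HASH)."""
    return n % buckets

def share_even(values):
    """Fraction of the values that are even."""
    return sum(1 for v in values if v % 2 == 0) / len(values)

# Made-up primary keys 1..1000.
ids = list(range(1, 1001))

# "Sample" by keeping a single bucket, as a hash-based filter would.
hash_sample = [i for i in ids if toy_bucket(i) == 0]

# A pseudo-random sample of the same size, with a fixed seed for repeatability.
rng = random.Random(42)
random_sample = rng.sample(ids, len(hash_sample))

print(len(hash_sample))           # 100 ids: 10, 20, ..., 1000
print(share_even(hash_sample))    # 1.0, since every id in bucket 0 is even
print(share_even(random_sample))  # a mix of even and odd ids
```

The bucket sizes are perfectly even here, yet the bucket-based sample is useless for estimating anything correlated with parity. Whether a real ORA_HASH bucket carries a comparable hidden regularity depends on its internals and on the data, which is why the comments above point to `dbms_random.value` or the `dbms_crypto.random*` methods for sampling.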