1

I have a scenario where I have to obfuscate data(=scramble, for testing purposes, so it is not possible to see the real data, there is no need on unscramble/unobfuscate it) in database. There are several tables that are referencing the address_table. I can not obfuscate the address_table, so I figured that I simply change the references in those tables with random other address_table ID-s. The address_table contains 6M+ records. So I would create a temp table with all the address ID-s and then, when needed call some function to get a random one from there. So I could possibly generate a random value and take that row like:

Select * From (
Select Id, Rownum Rn From myTempTable )
WHERE  RN = x;

where x is some random value generated by dbms_random. Now, although this is what I need, it does not perform anything near to what I expect. Other thing I have tried is to call the sample() function, this (at least on small table) performs I bit better, but it is not good enough.

I know there are several threads on this matter like this or this on mySql, but they do not directly answer it in terms of performance.

Also, I am not limited in using pl/sql. I know a very little of pl/sql, how is it in terms of performance? I mean, it is just another process in DB server processing queue, perhaps i could get better performance doing the processing (i mean generating the update scripts, populating randoms etcetc) on client side using something like python, even considering network latency etc? Does anybody have any experience on this?

Community
  • 1
  • 1
Erki M.
  • 5,022
  • 1
  • 48
  • 74
  • Out of interest, how are you planning to un-obfuscate the data (assuming that's a requirement)? – NPE Nov 13 '13 at 09:14
  • What do you mean by "obfuscate" and how is selecting random records a solution to this problem? – Rene Nov 13 '13 at 09:19
  • I am not, it is more like scrambling of data. I have to create a test database for testing so it is not possible to see the real data. – Erki M. Nov 13 '13 at 09:20
  • see http://stackoverflow.com/questions/7750396/oracle-random-row-from-table/7750895#7750895 – Kevin Burton Nov 13 '13 at 12:45
  • possible duplicate of [how to make selecting random rows in oracle faster with table with millions of rows](http://stackoverflow.com/questions/3150550/how-to-make-selecting-random-rows-in-oracle-faster-with-table-with-millions-of-r) – Jon Heller Feb 28 '14 at 00:23

3 Answers3

2

Use sample clause

select * from myTempTable SAMPLE(10);

This will return only 10% of rows.

ibre5041
  • 4,903
  • 1
  • 20
  • 35
0

If you just want to hide the real data why don't you take care of that in the select part of the query. Insteady of querying:

select column_name from table;

you could select

select scrambling_function(column_name) from table;

scrambling_function can be whatever you like.

Rene
  • 10,391
  • 5
  • 33
  • 46
0

There is not a good way to sample randomly using SQL that I am aware of. The sample function available in some SQL versions is not a sufficient random sample. The best way is to export the full sample set and use random software to determine the index of rows to be included in your final solution. Or if you have a simple number index (1,2,3...n) and know how many rows you need to select from you could upload a list of index's to include and query against that. Try random.org for random number generation, their API is located at http://www.random.org/clients/http/.

DrBailey
  • 183
  • 1
  • 9