I'm working on a tokenization system to store sensitive data in a key value table,
but the default select order can be a weak link of the chain if it reproduce the order of the INSERT queries. I think that it much easier to understand with an example.
Suppose that we have this data:
FirstName: Bob
LastName: Smith
Yearly income: $56000
We associate for every value a random token:
58217 => Bob
79174 => Smith
37142 => $56000
We have two tables on two different databases:
the first one has the columns "FirstName", "LastName" and "YearlyIncome" and we store on it a row with values 58217, 79174 and 37142;
the second one has the columns "Token" and "PlainText" where we store three records:
58217, "Bob"
79174, "Smith"
37142, "$56000"
The purpose of tokenization is that if an intruder can read just one database, he cannot pull out any useful information.
BUT: if a simple SELECT * FROM Tokens give the records in the same order we inserted them, it's really simple to read that Bob Smith has an yearly income of $56000. Just read from top to bottom.
I know that SQL doesn't guarantee that the default select order reflects the insert order, but really often it does, and in this context it's a problem, so the question is:
there is a way to shuffle the default order or - better - to shuffle how the data is physically stored on the database?
I know that we can dump all data, shuffle it and then re-insert it with the new order but I think it's a too much expensive way to do it; and I'm sure that there is already a smooth solution but I didn't found find anything useful on Internet, so here I am :)
Thank you very much!