I have a SQL table that needs to be anonymized before its output is shown to a partner. I use `pd.read_sql()`
to load the table into a DataFrame. How do I replace `client_name`
(in place) with a string of 5 randomly generated letters? Thank you.
user8834780
-
Does it have to be 5 random letters? Have you considered using a hash function instead (plus salt)? That way you can ensure that the `client_name` will always hash to the same seemingly random value. – pault Jan 29 '18 at 16:37
-
If you don't need to link back to the original name (as you could if you hashed them), why not just set all the names to `'anon'`? – EdChum Jan 29 '18 at 16:38
-
I am using sample data from a current partner and need to display it to another partner. As long as the output shows random names, contains no PII, and can't be tied back to the original, it doesn't matter how it is displayed (it can't be blank/null, etc.; it needs to look as if it were their own data). – user8834780 Jan 29 '18 at 16:39
-
See related: https://stackoverflow.com/questions/18319101/whats-the-best-way-to-generate-random-strings-of-a-specific-length-in-python. However, it depends on how many records we're talking about here. – EdChum Jan 29 '18 at 16:41
-
or perhaps this one: https://stackoverflow.com/questions/2257441/random-string-generation-with-upper-case-letters-and-digits-in-python or https://stackoverflow.com/a/23728630/2213647 – EdChum Jan 29 '18 at 16:44
-
@EdChum Ah this should work. Thank you! – user8834780 Jan 29 '18 at 16:51
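
For anyone landing here later, the random-string approach from the linked questions could be applied to a DataFrame roughly as follows. This is only a sketch under a few assumptions: the helper name `anonymize_column` and the sample rows are made up for illustration, the small frame stands in for whatever `pd.read_sql()` returns, and each distinct name is given its own code so that rows for the same client still group together.

```python
import random
import string

import pandas as pd


def anonymize_column(df, column, length=5):
    """Replace each distinct value in `column` with a random uppercase code, in place.

    Hypothetical helper for illustration. Returns the name -> code mapping,
    which should be discarded if the result must not be traceable.
    """
    mapping = {}
    used = set()
    for name in df[column].dropna().unique():
        code = "".join(random.choices(string.ascii_uppercase, k=length))
        # Redraw on the (rare) collision so two clients never share a code.
        while code in used:
            code = "".join(random.choices(string.ascii_uppercase, k=length))
        used.add(code)
        mapping[name] = code
    # Values that were already null stay null after the mapping.
    df[column] = df[column].map(mapping)
    return mapping


# Stand-in for the frame returned by pd.read_sql(); the rows are invented.
df = pd.DataFrame({"client_name": ["Acme", "Globex", "Acme"], "amount": [10, 20, 30]})
anonymize_column(df, "client_name")
print(df)
```

The codes come from `random`, so they carry no information about the original names, unlike the salted hash suggested above, which would give the same output for the same name on every run. If every row should get a different code regardless of the client, the mapping step can be replaced by generating one string per row.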