
I have an SQL table that needs to be anonymized before I display the output to a partner. I use `pd.read_sql()` to query the table into a DataFrame. How do I replace each `client_name` value (in place) with a string of 5 randomly generated letters? Thank you
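A minimal sketch of one way to do this. It assumes `df` is the DataFrame returned by `pd.read_sql()` (query and connection not shown) and that the column is named `client_name`. The helper names `random_letters` and `anonymize_column` are hypothetical. Each distinct name gets one random 5-letter alias, so repeated rows for the same client stay consistent. Aliases are redrawn on collision so that two clients never share one. Assigning back to `df[column]` updates the DataFrame you pass in.

```python
import random
import string

import pandas as pd


def random_letters(rng: random.Random, k: int = 5) -> str:
    """Return a string of k random ASCII letters (upper and lower case)."""
    return "".join(rng.choices(string.ascii_letters, k=k))


def anonymize_column(df: pd.DataFrame, column: str = "client_name",
                     k: int = 5, seed=None) -> None:
    """Replace each distinct non-null value in df[column] with a unique
    random k-letter alias, modifying df itself (hypothetical helper)."""
    rng = random.Random(seed)
    aliases = {}
    used = set()
    for name in df[column].dropna().unique():
        alias = random_letters(rng, k)
        while alias in used:  # redraw on the rare collision
            alias = random_letters(rng, k)
        used.add(alias)
        aliases[name] = alias
    # Values missing from the mapping (the original nulls) stay null.
    df[column] = df[column].map(aliases)
```

For example, `anonymize_column(df, seed=0)` rewrites the column of an existing `df`. Drop the `seed` argument to get different aliases on each run. The name-to-alias mapping exists only inside the function, so the alias itself carries no information about the original name.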

user8834780
  • Does it have to be 5 random letters? Have you considered using a hash function instead (plus salt)? That way you can ensure that the `client_name` will always hash to the same seemingly random value. – pault Jan 29 '18 at 16:37
  • If you don't need to link back to the original name (if you hashed them), why not just set all the names to `'anon'`? – EdChum Jan 29 '18 at 16:38
  • I basically use sample data from a current partner and will need to display it to another partner. As long as the output shows random names, contains no PII, and can't be tied back, it doesn't matter how it is displayed (it can't be blank/null etc.; it needs to look as if it is their data) – user8834780 Jan 29 '18 at 16:39
  • See related: https://stackoverflow.com/questions/18319101/whats-the-best-way-to-generate-random-strings-of-a-specific-length-in-python However, it depends on how many records we're talking about here – EdChum Jan 29 '18 at 16:41
  • or perhaps this one: https://stackoverflow.com/questions/2257441/random-string-generation-with-upper-case-letters-and-digits-in-python or https://stackoverflow.com/a/23728630/2213647 – EdChum Jan 29 '18 at 16:44
  • @EdChum Ah this should work. Thank you! – user8834780 Jan 29 '18 at 16:51
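If the same client should get the same alias every time (the hash-plus-salt idea in the first comment), a keyed hash is one option. This is a sketch using Python's standard `hmac` and `hashlib` modules. It assumes a secret salt that you generate once (for example with `secrets.token_bytes(32)`), store, and never share with the partner. `hashed_alias` and `pseudonymize_column` are hypothetical names. The digest is reduced to 5 uppercase letters, which gives only 26**5 (about 11.9 million) possible aliases. With many records, two clients can therefore collide, which is one reason the number of records matters.

```python
import hashlib
import hmac
import string

import pandas as pd

LETTERS = string.ascii_uppercase  # 26 symbols


def hashed_alias(name: str, secret_salt: bytes, k: int = 5) -> str:
    """Deterministically map name to k uppercase letters via HMAC-SHA256.
    secret_salt is an assumed secret key kept out of the shared output."""
    digest = hmac.new(secret_salt, name.encode("utf-8"), hashlib.sha256).digest()
    n = int.from_bytes(digest, "big")
    letters = []
    for _ in range(k):
        # Peel off base-26 digits of the digest to pick letters.
        n, r = divmod(n, len(LETTERS))
        letters.append(LETTERS[r])
    return "".join(letters)


def pseudonymize_column(df: pd.DataFrame, secret_salt: bytes,
                        column: str = "client_name", k: int = 5) -> None:
    """Replace df[column] with keyed-hash aliases in place; nulls stay null."""
    df[column] = df[column].map(
        lambda v: v if pd.isna(v) else hashed_alias(str(v), secret_salt, k)
    )
```

Unlike the purely random approach, nothing has to be stored per name, and re-running on new extracts with the same salt yields the same aliases. The trade-off is that anyone holding the salt can recompute the mapping, so the salt must be kept secret.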
