3

I am working on a small project in R and I need to mask or encrypt the names in a variable in a data frame. My data frame has the following structure:

Name                Value.R
Bank of Italy         200
Josh Peters           300
Fist Bank of Americas 500
Neil Rodes            520
Oil Team World        700

I am looking for a way to protect the names in the Name variable, something like this:

Name                Value.R
BXXk of IXXXy         200
JXXh PXXXrs           300
FXXt BXXk of AmXXXcas 500
NXXl RXXes            520
OXl TXXm WXXld        700

I don't know if this is possible in R. Thanks for your help.

Duck
  • 1
    The digest package may be of interest. Also, a highly related question: http://stackoverflow.com/questions/5806308/how-do-i-encrypt-data-in-r – sckott Nov 04 '13 at 21:07
  • Is that example of encryption really good enough for your standards? – Dason Nov 04 '13 at 21:11
  • 1
    In particular, following up on @ScottChamberlain: `library(digest); digest("Bank of Italy","crc32")` gives `"8e7332c5"` (other hashes are cryptographically superior but longer) – Ben Bolker Nov 04 '13 at 21:13
  • Well, which is it: encrypt or mask? They're radically different. Your "something like this" names are trivial to decode as written, regardless of the length or content of the "xxx" you show. – Carl Witthoft Nov 04 '13 at 21:15
  • Thanks for your answers. I made a complex analysis and I want to hide the names; digest is working fine, @BenBolker – Duck Nov 04 '13 at 21:17
  • 4
    +1 for Fist Bank of Americas – Thomas Nov 04 '13 at 21:20
  • 1
    @Duck : once a few hours have elapsed you can (and are encouraged to) post your own solution to this question. – Ben Bolker Nov 04 '13 at 21:24
  • Dear @BenBolker, I used the `digest` function to create a new variable in my data frame, but I got the same value for all elements. I simply wrote `DF$Name.Encrypt=digest(DF$Name,"crc32")`, and all the names, which were different, got the same code. How could I solve it? – Duck Nov 04 '13 at 21:32
  • 3
    `DF$Name.Encrypt=sapply(DF$Name,digest,"crc32")` or `DF=transform(DF,Name.Encrypt=sapply(Name,digest,"crc32"))` – Ben Bolker Nov 04 '13 at 21:38
  • Awesome man @BenBolker – Duck Nov 04 '13 at 22:01
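
A minimal sketch of the fix discussed in the comments above, assuming the `digest` package from CRAN is installed. `digest()` hashes its whole argument as a single object, so passing the full column returns one string that R then recycles across every row. Hashing element by element gives one code per name. The data frame `DF` is rebuilt here from the question, `vapply()` is used in place of `sapply()` only to pin the return type, and the variable `whole` is a name made up for illustration. Note that `crc32` is a checksum, not encryption; as noted above, other hashes are cryptographically superior but longer.

```r
library(digest)  # third-party package from CRAN (assumed installed)

# Data frame rebuilt from the question
DF <- data.frame(
  Name = c("Bank of Italy", "Josh Peters", "Fist Bank of Americas",
           "Neil Rodes", "Oil Team World"),
  Value.R = c(200, 300, 500, 520, 700),
  stringsAsFactors = FALSE
)

# digest() treats the whole vector as one object: a single hash comes back,
# which would be recycled to every row if assigned to a column
whole <- digest(DF$Name, algo = "crc32")

# Hash each name on its own; vapply() enforces one string per element
DF$Name.Encrypt <- vapply(DF$Name, digest, character(1),
                          algo = "crc32", USE.NAMES = FALSE)
```

Identical names will always map to the same code, which keeps grouping intact but also means anyone with a list of candidate names can hash them and match the results.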

1 Answer

2

This is one option that gets close to what you show:

x <- c('Bank of Italy', 'First Bank of Americas')
gsub('([A-Z])([a-z]+)([a-z])', '\\1X\\3', x)
# [1] "BXk of IXy"     "FXt BXk of AXs"
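
If the masked string should keep the original word lengths, closer to the layout in the question, one possible variation (a sketch, not the pattern from the question, whose number of X's is not fully consistent) is to replace every lowercase letter that has a letter before it and a lowercase letter after it. This relies on PCRE lookarounds, so `perl = TRUE` is required. The variable name `masked` is made up for illustration.

```r
x <- c("Bank of Italy", "Josh Peters", "Fist Bank of Americas",
       "Neil Rodes", "Oil Team World")

# Replace each interior lowercase letter with "X": the lookbehind requires a
# letter immediately before, the lookahead a lowercase letter immediately after,
# so the first and last letter of every word survive and "of" is left alone
masked <- gsub("(?<=[A-Za-z])[a-z](?=[a-z])", "X", x, perl = TRUE)
masked
# [1] "BXXk of IXXXy"          "JXXh PXXXXs"            "FXXt BXXk of AXXXXXXs"
# [4] "NXXl RXXXs"             "OXl TXXm WXXXd"
```

Like the one-X version, this is masking only: word lengths and the first and last letters remain visible, so the names are easy to guess.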

If your requirements for obfuscation aren't too high (which they don't seem to be), you could also use `abbreviate`:

x <- c("Bank of Italy",
"Josh Peters",
"Fist Bank of Americas",
"Neil Rodes",
"Oil Team World")
abbreviate(x)
# [1] "BnoI" "JshP" "FBoA" "NlRd" "OlTW"
Matthew Plourde