
I have a bunch of R scripts that I run on a Windows machine, and I want to ensure that the code cannot be read by those not intended to see it. On a Linux box, I could wrap the R code in a bash script with a #! (shebang) line and make an encrypted (and perhaps even limited-life) executable shell script. What are my options for doing something along similar lines under Windows?

Asked by Vishal Belsare; edited by JasonMArcher.

  • We cherish R as an open source system giving everybody the opportunity to study the source code. – Dirk Eddelbuettel Jan 16 '11 at 18:49
  • Dirk, I was anticipating something exactly along these lines, and quite likely from you :-) I'd quite gladly keep my code open, except that in my immediate situation I do need to keep prying eyes out of the loop. Your point, however, is much appreciated, and indeed open source is cherished. – Vishal Belsare Jan 16 '11 at 18:52
  • @Vishal You are unlikely to get much help for this request. What's more, any solution you come up with won't ever stop prying eyes. – David Heffernan Jan 16 '11 at 19:13
  • @Vishal Even your so-called Linux solution sounds pretty easy to defeat. – David Heffernan Jan 16 '11 at 19:17
  • @David: I am not looking for an airtight seal around the code. Also, in a way, I am curious about whether this is doable under Windows. And yes, I don't think that an encrypted hash-bang script is a lot of work to open, but I am not close to facing an assault from an army of determined crackers. Far from it. – Vishal Belsare Jan 16 '11 at 19:33
  • @Vishal It would be trivial to replicate what you have in Linux with Windows. – David Heffernan Jan 16 '11 at 19:38
  • @Vishal I would even show you how to do it, but I'd have to encrypt my answer... – David Heffernan Jan 16 '11 at 20:12
  • @David, thanks. I see some Smullyan-esque humor there. – Vishal Belsare Jan 16 '11 at 20:19
  • I suggest you rot-13 encode your R and read it into a modified R interpreter that decodes all its input. For extra security, rot-13 encode everything twice. – Spacedman Jan 16 '11 at 20:21
  • @Fcnprqzna, gunax lbh. Guvf vf oevyyvnag! – Vishal Belsare Jan 16 '11 at 20:24
  • @Spacedman: But then, wouldn't the GPL require you to make the source for your modified R interpreter available to your clients? :P – Sharpie Jan 16 '11 at 20:28
  • @Sharpie I think Spacedman was trying not to let the mundane details of licensing get in the way of his humour. Kind of spoils the punchline: "For extra security, rot-13 encode everything twice, but remember to make available the source code of your modified interpreter to comply with the GPL". – David Heffernan Jan 16 '11 at 20:30
  • I really like the "rot-13 twice for extra security" suggestion. – Vishal Belsare Jan 16 '11 at 20:31
  • @David Heffernan: Yes, yes, hence the grin on my face as I typed that :P – Sharpie Jan 16 '11 at 20:33
  • @Sharpie I'm too old to understand all these different smileys! Once you get beyond :-) and ;-) I'm lost. – David Heffernan Jan 16 '11 at 20:37
  • Are a virtual machine or Cygwin options? – Richard Herron Jan 16 '11 at 20:46
  • @richardh I believe Cygwin might help, but I have never worked with R installed under Cygwin. – Vishal Belsare Jan 16 '11 at 20:59
  • This sounds like a file-permission issue more than an R issue. If this must be run by the user without allowing the user to see what is being run, it should probably not be run. – Andrew Redd Jan 17 '11 at 03:34
  • This would be very useful in an education/teaching context (e.g. to produce individualized simulations that students have to analyze, or automatic self-grading), and in many situations where blinding is required (e.g. single-blind or double-blind analysis of data, where the script would be used for labelling)... – Etienne Low-Décarie Apr 21 '12 at 14:42

3 Answers


My answer is a bit late, but I believe this is a good question. Unfortunately, I don't believe that there is a solution, or at least an easy one, at the present time.

The difficulty is common to most interpreted languages, including R: it is usually possible to turn on logging and inspect every command being run, which defeats many tricks for obfuscating the code.
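To make the point concrete, here is a minimal sketch in Python rather than R (it assumes Python 3.8+ for `sys.addaudithook`). A script is shipped only in rot13-encoded form, standing in for any obfuscation or encryption scheme, and a hypothetical loader `run_protected` decodes and runs it. Meanwhile, an audit hook registered by whoever controls the interpreter records the decoded source as it is compiled. The names `PLAINTEXT`, `run_protected`, and the `<secret>` filename are made up for the illustration.

```python
import codecs
import sys

# Source the vendor wants to hide; it is shipped only in rot13-encoded form
# (rot13 stands in here for any obfuscation or encryption scheme).
PLAINTEXT = "result = sum(x * x for x in range(10))\n"
ENCODED = codecs.encode(PLAINTEXT, "rot13")

captured = []


def spy(event, args):
    """Audit hook: record every source compiled under the '<secret>' filename."""
    if event == "compile":
        source, filename = args
        if isinstance(source, bytes):  # depending on the code path, source may arrive as bytes
            source = source.decode("utf-8")
        if filename == "<secret>" and isinstance(source, str):
            captured.append(source)


# Anyone who controls the interpreter (or the process it runs in) can install this.
sys.addaudithook(spy)


def run_protected(encoded):
    """Hypothetical loader: decode the script, then compile and execute it."""
    source = codecs.decode(encoded, "rot13")
    namespace = {}
    exec(compile(source, "<secret>", "exec"), namespace)
    return namespace["result"]


print(run_protected(ENCODED))  # 285
print(captured[-1])            # the decoded source, seen by the hook
```

However strong the encoding, the loader has to hand plaintext to the compiler, and that hand-off is exactly what the hook observes.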

For those who prefer to think of code being open == good, one should know that a common reason to obfuscate the code is if one is consulting with a client that hires multiple vendors. It is not uncommon for a client to take scripts from vendor A and ask vendor B why it doesn't work with their system. (This may be done by a low-level IT flunkie, rather than someone responsible for the NDA contracts.) If A & B are competitors, A's code has just been handed to B. When scripts == serious programs, then serious code has been given away.

The ways I've seen this addressed are:

  1. Make a call to a compiled language, and use standard protections available there.
  2. Host the executable on a different server, and use calls to the server to execute the calculations. (In R, there are multiple server-side options.)
  3. Use compiled (preprocessed / bytecode) code within the language.

Option 2 is actually easier and better when the code may be widely distributed, not just for IP reasons. A major advantage is that it lets you upgrade the code without having to go through the pain of a site-wide release process. If new libraries are needed, no problem - update the server.
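A minimal sketch of option 2 follows. It is written in Python with only the standard library rather than with any of the R server-side packages the answer alludes to. The function `proprietary_model`, the JSON payload shape (`{"values": [...]}`), and the helper names are hypothetical; a real deployment would also need authentication and TLS. The point is that only data crosses the wire, while the code stays on the server.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def proprietary_model(values):
    """Hypothetical stand-in for the code you want to keep on the server."""
    return sum(values) / len(values)


class ComputeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the client's JSON payload: data comes in, no code goes out.
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"result": proprietary_model(payload["values"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server():
    """Start the compute server on a free local port; return (server, url)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), ComputeHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/"


def call_server(url, values):
    """Client side: send data, receive the result."""
    data = json.dumps({"values": values}).encode()
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]


if __name__ == "__main__":
    server, url = start_server()
    print(call_server(url, [1, 2, 3, 4]))  # 2.5
    server.shutdown()
    server.server_close()
```

Upgrading the model means changing `proprietary_model` on one machine; clients keep calling the same URL.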

Option 3 is done in Matlab with .p files, and can be done with py2exe for Python on Windows. In R, the new bytecode compilation may be analogous, but I am not familiar enough with it to address any differences between .Rc files in the R context and .p files in the Matlab context. For more info on the compiler, see: http://www.inside-r.org/r-doc/compiler/compile
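For a feel of what option 3 buys you, here is a small sketch using Python's own bytecode (`compile` plus `marshal`), which plays the role that Matlab .p files or, to my knowledge, R's `compiler::cmpfile()` (writing a .Rc file that `compiler::loadcmp()` loads) play in those languages. The `price` function and its `secret_margin` are invented for the example, and `marshal` output is tied to the Python version that produced it. The recipient can run the code without the source text, but names and constants survive in the bytecode and can be disassembled.

```python
import dis
import marshal

# Hypothetical "proprietary" source; only the compiled blob is shipped.
SOURCE = """
def price(spot, rate):
    secret_margin = 0.0375
    return spot * (1 + rate + secret_margin)
"""

# Compile to a code object and serialize it (roughly what a .pyc contains).
blob = marshal.dumps(compile(SOURCE, "<model>", "exec"))

# Recipient side: load and run the bytecode without ever seeing SOURCE.
namespace = {}
exec(marshal.loads(blob), namespace)
print(namespace["price"](100.0, 0.01))

# ...but variable names and constants are still in the bytecode.
code = namespace["price"].__code__
print(code.co_varnames)  # ('spot', 'rate', 'secret_margin')
print(code.co_consts)
dis.dis(code)            # human-readable instruction listing
```

Compiled bytecode raises the bar against casual reading, which matches the asker's stated goal, but it is not protection against a determined reader.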

Hosting computations on the server is great for working with unsophisticated users, because it is easier to iterate quickly in response to bugs or feature requests. The IP protection is simply a benefit.

Answered by Iterator

  • +1, we find ourselves in this position quite frequently; this is where key documentation to theoretical references and appropriately labeled datasets come in handy. – Chase Apr 20 '12 at 23:34

This is not a specifically R-oriented strategy. (And it's a bit unclear what your constraints or goals really are anyway.) If you want a cross-platform encryption method, you should look into the open-source program TrueCrypt. It supports creating encrypted files that can be mounted as volumes on any machine that supports the volume's file system. I have tested this across the Mac-PC divide, since the Mac can read FAT file systems, but have no experience with how it might work across the Linux-PC chasm.

(Their TODO list for Windows includes: "Command line options for volume creation (already implemented in Linux and Mac OS X versions)". So I don't see any clear way to drive this from within R; you would have to run the program from the OS yourself.)

Answered by IRTFM

  • This is the solution I've gone for as well. Install R and your packages in an encrypted volume. Alternatively, use your normal R install and put the sensitive package in your local library directory, which is on an encrypted volume. – Dr G Jan 17 '11 at 09:23
  • @DrG, could you explain your "use your normal R install and put the sensitive package in your local library directory which is on an encrypted volume" solution in a little more detail? – Erdogan CEVHER Apr 08 '15 at 22:38

I don't think this is possible because the R interpreter has to be able to decrypt and read the code in order to execute it which means that whoever is using that interpreter will also be able to decrypt and read the code.

I am by no means an expert, so I reserve the right to be 100% wrong about that statement.

I believe the best solution is to ensure that value comes from the expertise and services provided by your company and its employees, not from keeping secrets.

Failing that, you could try separating the code into a client/server model. That way the client just sends data and receives results---they never have access to the code that runs on the server.

However, the scientist in me just said "that solution sucks and I would never trust results provided under such conditions".

Answered by Sharpie