Can I effectively use transpilation, code tokenization/regeneration, a VM or a similar approach to guaranteeably sandbox/control code within a browser?

Question

It looks like I'm asking about a tricky problem that has been explored for decades without a clear solution. I've seen "Is It Possible to Sandbox JavaScript Running In the Browser?" along with a few smaller questions, but all of them seem to be mislabeled: they focus on sandboxing cookies and DOM access rather than JavaScript itself, which is what I'm trying to do. Iframes and web workers don't sound like exactly what I'm looking for.

Architecturally, I'm exploring the pathological extreme. I want full control over which functions get executed, so I can disallow access to arbitrary functions, DOM elements, the network, and so forth. I also want control over execution scheduling, so I can prevent evil or poorly written scripts from consuming 100% CPU.

Here are two approaches I've come up with as I've thought about this. I realize I'm only going to perfectly nail two out of fast, introspected and safe, but I want to get as close to all three as I can.

Idea 1: Put everything inside a VM

While it wouldn't present a JS "front", perhaps the simplest and most architecturally elegant solution to my problem could be a tiny, lightweight virtual machine. Actual performance wouldn't be great, but I'd have full introspection into what's being executed, and I'd be able to run eval inside the VM and not at the JS level, preventing potentially malicious code from ever encountering the browser.
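
As a rough sketch of the VM idea, the interpreter below runs a tiny stack machine with a hard step budget and a whitelist of host functions. The opcode names (`push`, `add`, `jmp`, `call`, `halt`), the `runVM` function, and the `double` host function are all hypothetical and exist only for illustration. It is not a complete or hardened design.

```javascript
// Minimal stack-machine sketch. Opcodes and the host whitelist are
// hypothetical, chosen only to illustrate introspection and CPU limits.
function runVM(program, hostFunctions, maxSteps) {
  const stack = [];
  let pc = 0;    // program counter
  let steps = 0; // instructions executed so far

  while (pc < program.length) {
    // Every instruction passes through here, so the host can count,
    // log, or refuse each one.
    if (++steps > maxSteps) {
      throw new Error("step budget exceeded");
    }
    const [op, arg] = program[pc++];
    switch (op) {
      case "push":
        stack.push(arg);
        break;
      case "add": {
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a + b);
        break;
      }
      case "jmp":
        pc = arg;
        break;
      case "call":
        // Guest code can only reach functions the host lists explicitly.
        // The own-property check keeps inherited names such as
        // "constructor" out of reach.
        if (!Object.prototype.hasOwnProperty.call(hostFunctions, arg)) {
          throw new Error("host function not allowed: " + arg);
        }
        stack.push(hostFunctions[arg](stack.pop()));
        break;
      case "halt":
        return stack.pop();
      default:
        throw new Error("unknown opcode: " + op);
    }
  }
  return stack.pop();
}

// Usage: compute (2 + 3) and pass it to the whitelisted "double".
const demoHost = { double: (x) => x * 2 };
const demoProgram = [["push", 2], ["push", 3], ["add"], ["call", "double"], ["halt"]];
console.log(runVM(demoProgram, demoHost, 100)); // 10
```

Because the guest never runs as native JS, an infinite loop such as `[["jmp", 0]]` simply exhausts the budget and throws, and a `call` to anything outside the whitelist is rejected before it touches the browser.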

Idea 2: Transpilation

First of all, I've had a look at Google Caja, but I'm looking for a solution itself written in JS so that the compilation/processing stage can happen in the browser without me needing to download/run anything else.

I'm very curious about the various transpilers (TypeScript, CoffeeScript, this gigantic list, etc.). If these languages perform full tokenization -> AST -> code generation, they could make excellent "code firewalls" that filter function/DOM/etc. accesses at compile time, meaning I get my performance back!

My main concern with transpilation is whether there are any attacks that could be used to generate the kind of code I'm trying to block. These languages' parsers weren't written with security in mind, after all. This was my motivation behind using a VM.
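
To make the concern concrete, here is a deliberately naive compile-time check and two inputs that slip past it. The `naiveFilter` function and its `BLOCKED` list are hypothetical. A real filter would work on an AST rather than a regex, but it would face the same problem whenever a name is computed at run time.

```javascript
// Hypothetical compile-time check: reject any source that mentions a
// blocked identifier. Intentionally simplistic.
const BLOCKED = ["eval", "Function", "document", "XMLHttpRequest"];

function naiveFilter(source) {
  const identifiers = source.match(/[A-Za-z_$][\w$]*/g) || [];
  return !identifiers.some((name) => BLOCKED.includes(name));
}

// The direct call is caught.
console.log(naiveFilter("eval('1 + 1')")); // false

// The blocked name is assembled at run time, so the filter never sees it.
const sneaky = "globalThis['ev' + 'al']('1 + 1')";
console.log(naiveFilter(sneaky)); // true

// The Function constructor is reached without ever being named.
const sneakier = "(() => {}).constructor('return 1 + 1')()";
console.log(naiveFilter(sneakier)); // true

// The computed lookup really does resolve to eval.
console.log(globalThis["ev" + "al"] === eval); // true
```

Closing these holes statically means either forbidding computed property access altogether or rewriting every such access into a run-time check, which gives back some of the performance the transpilation approach was meant to save.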

This technique would also mean I lose execution introspection. Perhaps I could run the code inside one or more web workers and "ping" the workers every second or so, killing off workers that [have presumably gotten stuck in an infinite loop and] don't respond. That could work.
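
The ping idea could be sketched roughly as follows. The `createWatchdog` helper, the `"ping"`/`"pong"` message protocol, and the `sandboxed.js` script name are assumptions made for illustration. Only `postMessage`, `onmessage`, and `terminate()` are part of the standard Worker interface.

```javascript
// Heartbeat watchdog sketch. `worker` is anything with the Web Worker
// interface (postMessage, onmessage, terminate). The worker script is
// assumed to answer every "ping" message with "pong".
function createWatchdog(worker, onKilled) {
  let awaitingPong = false;
  let killed = false;

  // Note: this replaces any existing onmessage handler. Code that also
  // needs the worker's other messages would use addEventListener instead.
  worker.onmessage = (event) => {
    if (event.data === "pong") awaitingPong = false;
  };

  // Call once per interval. Returns false once the worker has been killed.
  return function tick() {
    if (killed) return false;
    if (awaitingPong) {
      // The previous ping was never answered: assume the worker is stuck.
      worker.terminate();
      killed = true;
      onKilled();
      return false;
    }
    awaitingPong = true;
    worker.postMessage("ping");
    return true;
  };
}

// Browser usage (sandboxed.js is a hypothetical worker script):
//   const worker = new Worker("sandboxed.js");
//   const tick = createWatchdog(worker, () => clearInterval(timer));
//   const timer = setInterval(tick, 1000);
```

This relies on the fact that a worker stuck in a synchronous loop cannot process incoming messages, so it never gets the chance to reply. The granularity is coarse, though: a script can still burn a full interval of CPU before it is killed, and one that answers pings between bursts of heavy work is never caught.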

JavaScript debuggers are integrated in browsers. If you have to **guarantee** that the client state isn't tampered with, you'll have to put ***everything*** on the server (and hire guards, dogs and/or ninjas as the budget allows to protect the server). — Elliott Frisch, Mar 14 '17 at 02:34
I so wish I could fire up the debugger within HTML5! However, then I need to sandbox the code that accesses the debugger, and... I'm back where I started. I'm interested in allowing users to add interactivity to webpages in such a way that a predictable set of unsafe/time-wasting functionality (DOM access, cookie-stealing, network spamming, etc) is restricted. Sorry for not clarifying that (although now I'm sad I can't implement the techniques you suggested to protect the server :P) — i336_, Mar 14 '17 at 02:46
No, a transpiler won't help you here (unless it transpiles an esoteric safe language into safe javascript). JavaScript is too dynamic to statically prove a script secure without sacrificing usability. — Bergi, Mar 14 '17 at 02:49
@Bergi: Right, I was afraid the JS-flavor transpilers didn't have (for lack of a better way to say it) "full code coverage". Now I'm wondering what kind of safe language (which doesn't have to be *too* esoteric) I could compile from. — i336_, Mar 14 '17 at 03:01
I was thinking of purely functional languages (like Haskell), but it really depends on what you need this dynamically generated code to do. — Bergi, Mar 14 '17 at 03:19
@Bergi: Huh, that sounds kind of interesting. I wanted to essentially create a type of generic sandbox environment that would allow untrusted users to add small amounts of interactivity or intelligence to, for example, messages in a forum thread. I haven't pinned down exactly what the context would be just yet (I know there are 1000 ways I could design something like this incorrectly); I'm still at the "that sounds like a cool idea" stage and I want to prove that this wouldn't be a technical headache before I go ahead. (cont) — i336_, Mar 14 '17 at 05:40
@Bergi: (cont'd) I can definitely see how a functional language would provide code security guarantees, but that might be difficult for casual/hobbyist coders to wrap their heads around (speaking from my own experience; I'm not scared of Haskell myself per se, but that's kind of because I haven't fully tackled it head-on yet!). So being able to integrate a "dumber" language - one in the class of Java or PHP - would provide for much more general accessibility. I was initially wondering whether TypeScript's static analysis could come in handy here. — i336_, Mar 14 '17 at 05:44
You should have a look at [elm](http://elm-lang.org/) then. You can embed multiple widgets in your page and be sure that they won't affect anything else than their own area. You might need to whitelist the I/O features however (to exclude ajax etc). — Bergi, Mar 14 '17 at 15:34

