How to compile lexical environments into objects in a JavaScript compiler?

Question

I am working on a custom language and want to now support lexical environments (closures). I am asking for help in how this would look in JavaScript to keep it easier and more applicable to others.

Basically, say you have this in a file.

let a = 10
let b = 20

function doFoo() {
  let x = 300
  let y = 400
  let z = doBar(x, y)
  return a * z + b * z

  function doBar(m, n) {
    return m * a + n * b
  }
}

Keeping it simple (not thinking about how compilers can optimize away or remove simple expressions), what do the objects look like for lexical environments (like in JSON), and how do the lexical environments get used at runtime generally?

It seems to me that you would end up like this:

let a = {
  vars: [
    {
      varname: 'a',
      value: 10
    },
    {
      varname: 'b',
      value: 20
    }
  ],
  funcs: ['doFoo']
}
let b = {
  parent: a,
  vars: [
    {
      varname: 'x',
      value: 300
    },
    {
      varname: 'y',
      value: 400
    },
    {
      varname: 'z',
      value: undefined
    }
  ],
  funcs: ['doBar']
}
let c = {
  parent: b,
  vars: [
    {
      varname: 'm',
      value: undefined
    },
    {
      varname: 'n',
      value: undefined
    }
  ]
}
let environments = [a, b, c, /* and tons more */]

Then when you actually invoke the function doFoo, it would serialize the data at that instant or something like that.

run(script)

// 1. link a = 10, b = 20
// 2. call doFoo
//   3. create a new object, link it to the lexical environment.
//   4. repeat

This "create a new object" part is where I'm lost. We must create a new "scope" or "context" for every invocation of the function correct? And how does it relate to that environments array I created?

It seems that a scope is an instance of an environment, where the variables of the environment are filled in right before they are used. Then an environment just says what variables there are.

But this seems like a performance problem, creating a new scope with potentially dozens of variables to serialize all at once. So they must update the scopes as the variables change. How does this generally look or work? What objects do we have, how are they linked together, and when/how do they get updated?

Did you read https://en.wikipedia.org/wiki/Closure_(computer_programming)#Implementation_and_theory? — Bergi, Jan 16 '21 at 02:28
"*We must create a new "scope" or "context" for every invocation of the function correct?*" - yes. "*it would serialize the data at that instant*" - no. As you wrote, it is only *referecened* (linked). "*how does it relate to that environments array I created?*" - not at all. Through the `parent` references, you already have a linked list, no array needed for anything. — Bergi, Jan 16 '21 at 02:31
Possible duplicate of [lightweight javascript to javascript parser](https://stackoverflow.com/q/6851869/1048572) (for title question) — Bergi, Jan 16 '21 at 02:32
Possible duplicate of [How do JavaScript closures work at a low level?](http://stackoverflow.com/q/31735129/1048572) (for body questions). Also relevant: [How JavaScript closures are garbage collected?](http://stackoverflow.com/q/19798803/1048572), [Where is the variable in the closure stored, stack or heap?](http://stackoverflow.com/q/29225834/1048572), [Javascript closures on heap or stack?](http://stackoverflow.com/q/16959342/1048572) and [Where does a JavaScript closure live?](https://stackoverflow.com/q/37491626/1048572) — Bergi, Jan 16 '21 at 02:35

PossiblyAShrub · Answer 1 · 2021-01-23T23:48:13.743

First we want to understand the difference between scopes and environments in theory.

The bindings that associate variables with values need to be stored somewhere, and that place is the environment. You can think of it as a map where the keys are variable names and the values are what those variables currently hold. Updating a variable is as simple as changing the value that its key points to.

A scope defines a region in code where a name maps to a value. Multiple scopes enable the same name to refer to different things in different contexts.
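As a minimal sketch of the "environment as a map" idea (the name `env` is only for illustration and not part of any real engine), a single environment could be modelled with a plain `Map`:

```javascript
// Hypothetical sketch: one environment is a map from variable names to values.
const env = new Map();

env.set("a", 10); // let a = 10
env.set("b", 20); // let b = 20
env.set("a", 11); // a = 11 (only the value behind the key changes)

const sum = env.get("a") + env.get("b");
console.log(sum); // 31
```

A real implementation would usually resolve names to slots or indices at compile time instead of hashing strings at runtime, but the map is the easiest model to reason about.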

So as you mentioned, scopes are essentially environments. But what happens when we have the following:

function defineVariable() {
    let a = "some variable";
}

defineVariable();
console.log(a); // ReferenceError: a is not defined

Here we have a function, and it contains its own scope. This means that any variables defined after the opening brace { get deleted after the closing brace }. So for the above snippet we'd do the following:

call function defineVariable
create scope
create and assign variable a
close scope & delete all variables in previous scope: so delete a
log a to console (a was deleted along with its scope; real JavaScript throws a ReferenceError here rather than logging undefined)

So now we know that scopes fence variables between their braces {}. However we also have the case of shadowing. For example:

let shadowed = 0;

function printShadowed() {
  let shadowed = 1;
  console.log(shadowed); // 1
}

printShadowed(); // prints 1
console.log(shadowed); // 0

Here we do something interesting: we declare a new variable with the same name as an existing variable. This is called shadowing. It lets us create new variables that are entirely separate from those in the parent environment. To better understand this we can talk about the difference between local and global variables:

let global = 10;

function doMath() {
  let local = 15;
  console.log(global + local); // 25
}

doMath(); // prints 25
console.log(local); // ReferenceError: `local` is local to the `doMath` scope

The idea is that a scope has access to all of its parent scopes, but doesn't have access to any child scopes.
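One way to sketch that rule is to give every environment a link to its parent and walk outward on lookup. The helpers `makeEnv` and `lookup` below are hypothetical names made up for this illustration:

```javascript
// Hypothetical helper: an environment holds its own variables plus a parent link.
function makeEnv(parent = null) {
  return { vars: new Map(), parent };
}

// Walk outward from the innermost environment until the name is found.
function lookup(env, name) {
  for (let e = env; e !== null; e = e.parent) {
    if (e.vars.has(name)) return e.vars.get(name);
  }
  throw new ReferenceError(`${name} is not defined`);
}

const globalEnv = makeEnv();
globalEnv.vars.set("global", 10);

const doMathEnv = makeEnv(globalEnv); // child of the global environment
doMathEnv.vars.set("local", 15);

// The child can see its own variables and its parent's.
const fromChild = lookup(doMathEnv, "global") + lookup(doMathEnv, "local");

// The parent cannot see into the child.
let parentSeesLocal = true;
try {
  lookup(globalEnv, "local");
} catch (err) {
  parentSeesLocal = false;
}
console.log(fromChild, parentSeesLocal); // 25 false
```

Because lookup stops at the first match, the same walk also gives you shadowing for free: an inner `x` is found before an outer `x`.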

To recap: an environment is a map of variable names to their respective values. A program is a tree of different environments. A scope is the branch of those environments which is accessible from a given point in the code. When we declare a variable with the same name as a variable in a parent environment, it's called shadowing. To picture this, take the following code:

let global = 10;

function doMath() {
  let local = 15;
  console.log(global + local); // 25
  console.log(a); // ReferenceError, a is not in scope
}

function other() {
   let a = "...";
}

doMath(); // prints 25

Here the scope of the doMath function consists of its own environment and the global environment. The environment of other is not part of it.

Now in practice this can be implemented in a variety of ways. The simplest is the idea of the stack.

let global = 10;
// environments = [ { "global": 10 } ]

function doMath() {
   let local  = 15;
   // environments = [ { "global": 10 }, { "local": 15 } ]

   console.log(global + local); // 25
   // we walk the list from back to front until we find
   // a map that contains the requested variable. So:
   //
   // for `local` we first check the last map, we find `local`
   // in that map, so our value is 15
   //
   // for `global` we first check the last map, we don't find
   // `global`, so we move up the list, we check the second last
   // map (here it's the first) and we find `global` in that map,
   // our value is 10
}
// exiting function scope
// environments = [ { "global": 10 } ]

doMath();
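The comments above can be turned into a small runnable sketch. The `environments` array and the `lookup` helper are assumptions made for this illustration; a real interpreter would do the pushing and popping itself instead of having it written inside `doMath`:

```javascript
// Hypothetical sketch of the stack approach: a list of maps, innermost last.
const environments = [new Map([["global", 10]])];

// Walk the list from back to front until a map contains the name.
function lookup(name) {
  for (let i = environments.length - 1; i >= 0; i--) {
    if (environments[i].has(name)) return environments[i].get(name);
  }
  throw new ReferenceError(`${name} is not defined`);
}

function doMath() {
  environments.push(new Map());                          // entering function scope
  environments[environments.length - 1].set("local", 15); // let local = 15
  const result = lookup("global") + lookup("local");
  environments.pop();                                    // exiting function scope
  return result;
}

const result = doMath();
console.log(result); // 25
```

After `doMath` returns, the stack holds only the global map again, so `local` is no longer reachable.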

EDIT: I realized that I completely forgot to go over the closures part. So take the following example:

let x = "global";
function outer() {
  let x = "local";
  function inner() {
    console.log(x);
  }
  return inner;
}
outer()();

In a stack-based scope implementation this will print "global", when by the rules of lexical scoping we would expect it to print "local". This is why I mentioned that scopes form a tree-like structure. With closures, we need to capture that scope in some way: the inner function should capture its scope, so that from its point of view the x with the value "local" takes precedence.
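To see the problem concretely, here is a hedged sketch of what a purely stack-based interpreter would do with that example. The `stack` array and `lookup` helper are invented for the illustration:

```javascript
// Hypothetical stack-based interpreter state: innermost environment last.
const stack = [new Map([["x", "global"]])];

function lookup(name) {
  for (let i = stack.length - 1; i >= 0; i--) {
    if (stack[i].has(name)) return stack[i].get(name);
  }
  throw new ReferenceError(`${name} is not defined`);
}

function outer() {
  stack.push(new Map([["x", "local"]])); // enter outer, let x = "local"
  const inner = () => lookup("x");       // body of inner: read x when called
  stack.pop();                           // outer returns, its environment is gone
  return inner;
}

const seen = outer()();
console.log(seen); // "global", because the "local" environment was already popped
```

The lookup happens against whatever is on the stack at call time, which is dynamic scoping rather than lexical scoping.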

Let's go over what I mean by capturing and storing scope. Instead of representing our environments as a stack, we represent them as a tree, and keep track of the branch we are currently on with a pointer to a node in that tree. Climbing the tree from that node gives us all of the information the stack implementation above had. When the user defines a closure, that closure holds two things: the code which makes up the closure's body, and a pointer to the scope in which the closure was created. Applying that logic to the above example:

let x = "global";
// { "x": "global", children: [] }

function outer() {
  let x = "local";
  // { "x": "global", children: [ { "x": "local", children: [] } ] }
  function inner() {
    console.log(x);
  }
  // inner is a closure, it stores the code within it
  // and a pointer to the scope that it was
  // instantiated in
  return inner;
}
// we are now pointing to the scope on the top of the tree, but notice how we
// don't delete the "local" x scope
// { "x": "global", children: [ { "x": "local", children: [] } ] }
outer()();
// good thing we didn't, because this function returned a function that
// needed it. Deciding when that local x can finally be deleted is a job
// for our garbage collector
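Here is a rough sketch of how "code plus a pointer to its scope" might look as plain objects. All names (`makeEnv`, `lookup`, `makeClosure`, `callClosure`) are hypothetical, and the function bodies are hand-written callbacks standing in for compiled code:

```javascript
function makeEnv(parent = null) {
  return { vars: new Map(), parent };
}

function lookup(env, name) {
  for (let e = env; e !== null; e = e.parent) {
    if (e.vars.has(name)) return e.vars.get(name);
  }
  throw new ReferenceError(`${name} is not defined`);
}

// A closure pairs code with the environment it was created in.
function makeClosure(code, env) {
  return { code, env };
}

// Every call gets a fresh environment whose parent is the captured
// environment, not the caller's environment.
function callClosure(closure) {
  return closure.code(makeEnv(closure.env));
}

const globalEnv = makeEnv();
globalEnv.vars.set("x", "global");

const outer = makeClosure((env) => {
  env.vars.set("x", "local");                              // let x = "local"
  return makeClosure((innerEnv) => lookup(innerEnv, "x"), env); // function inner
}, globalEnv);

const inner = callClosure(outer);   // outer()
const printed = callClosure(inner); // inner()
console.log(printed); // "local"
```

Note that nothing is copied or serialized on a call: each invocation only allocates one new environment object and links it to an existing one through `parent`.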

Another case that you mentioned: what if we change the value of the local x? That is why the closure holds a pointer to the scope of the local x rather than a copy of it. Any change to the local x will be visible when we call the closure. This also has the added bonus of avoiding unnecessary copying.
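A tiny sketch of that point, using a bare object as a stand-in for the environment (the `scope` and `inner` shapes are assumptions for illustration only):

```javascript
// Hypothetical: the environment of one call to outer.
const scope = { x: "local" };

// The closure stores a reference to that environment, not a snapshot of it.
const inner = { env: scope, code: (env) => env.x };

scope.x = "changed"; // outer reassigns x after inner was created

const seen = inner.code(inner.env);
console.log(seen); // "changed"
```

Had the closure copied the values at creation time, it would still see "local" here.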

About memory, that's where things get a little hairy. Ideally we don't want our GC to work harder than it needs to. Many language implementations, like Lua, will only keep a scope alive if a closure refers to it. This means that instead of storing a perfect tree of environments, the runtime only stores a branch if it's captured by a closure or is currently in use.

This was a lot, and it may be more helpful to look at other resources as well. The book that you linked is great as it goes into a lot of detail about language implementation in general. The links that @Bergi posted in his comments on this question could be helpful too. If I've missed something myself, or am wrong in any part of this answer, constructive criticism is always helpful :)

What's the difference between steps 4 ("close scope") and 5 ("delete variables")? — Bergi, Jan 23 '21 at 01:41
Scopes/environments do not form a stack - if they were following the call stack, we'd have dynamic scoping. For closures and lexical scoping to work, you need something different. — Bergi, Jan 23 '21 at 01:45
You are correct. I have added onto the question to address closures, and how we account for them. You can provide an answer too ;) I originally separated the steps as generally the act of saying that a variable isn't accessible and deleting it are two very different things. — PossiblyAShrub, Jan 23 '21 at 23:50
