
I am trying to learn how the shebang works, and I am wondering what happens when you type node at the terminal prompt, and whether it calls main in one of the C functions somewhere. I understand only the basics, having used node for a while. The shebang #!/usr/bin/env node somehow reads the node executable, but I'm not sure what or where that executable is, or where it starts executing. Then there is the code that actually evaluates the expression node and directs it to the shebang, but that's probably too complicated to ask about.

Lance
  • It depends heavily on the OS and type of executable. Is this GNU/Linux? No shebang is generally involved when you type `node` at the terminal. Is your question "How does the shell run an executable like `node`?" or "How does the shell run a script with a shebang like `#!/usr/bin/env node`?" – that other guy Jul 30 '18 at 21:29
  • OK then, yeah, just knowing what happens when I type `node`, and where it first hits a line of code in the C library, if at all. – Lance Jul 30 '18 at 21:45

1 Answer


To be excruciatingly precise, "the shebang" is just the two characters #!. When a file on a Unix system is invoked as an executable (ultimately via the system call execve), the kernel looks at its first few bytes to determine what kind of executable it is. If those bytes identify it as containing machine code, then the kernel will load the machine code into memory and cause the CPU to begin executing it. If the machine code was compiled from a C program, its main function will eventually get called. (If you want to know how this process works, read the book "Linkers and Loaders" by John Levine.)
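As a rough illustration of that first check, here is a sketch that classifies a file by its leading bytes. It is written in Python rather than kernel C, the name classify_executable is made up for this example, and it only knows about two formats: ELF (the machine-code format used on Linux, whose files begin with the byte 0x7F followed by the letters ELF) and #! scripts. A real kernel recognizes more formats and performs many more checks.

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"   # first four bytes of an ELF machine-code executable
SHEBANG = b"#!"          # first two bytes of an interpreter script


def classify_executable(path):
    """Return 'elf', 'script', or 'unknown' based on the file's first bytes.

    Hypothetical helper for illustration; not how any kernel is structured.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(ELF_MAGIC):
        return "elf"
    if head.startswith(SHEBANG):
        return "script"
    return "unknown"


if __name__ == "__main__":
    # Write a tiny #! script to a temporary file and classify it.
    with tempfile.NamedTemporaryFile("wb", suffix=".js", delete=False) as tmp:
        tmp.write(b"#!/usr/bin/env node\nconsole.log('hi');\n")
    try:
        print(classify_executable(tmp.name))
    finally:
        os.unlink(tmp.name)
```

On a Linux system, pointing the same function at the node binary itself would be expected to report an ELF file, assuming node is installed as a native executable.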

But if the first two bytes are # and ! (ASCII values 35 and 33) then the kernel will instead scan the first line of the file for the name of an interpreter, and then it will run the interpreter instead, supplying the name of the #! program as a command-line argument. (See this answer for the exact details of how the first line is parsed by the kernel.) If you do

./foo.js a b c d

and foo.js begins with #! /usr/bin/node, then the kernel will behave as if execve had been called with the argument vector

/usr/bin/node ./foo.js a b c d

and it will open up the file /usr/bin/node, discover that it is a machine-code executable, and proceed to load the machine code, which is the Node interpreter, and run it. Node's main function will then notice that its first argument is ./foo.js, and it will open that file and execute it as a JavaScript program, rather than going into its interactive read-evaluate-print loop.
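The rewriting of the argument vector can be sketched in a few lines. This is Python, not what the kernel actually runs, and shebang_argv is a name invented for the example. It assumes Linux-style parsing, where everything after the interpreter path is handed over as a single optional argument; other systems differ on that detail.

```python
def shebang_argv(first_line, script_path, script_args):
    """Build the argument vector that results from running a #! script.

    Assumption: Linux-style parsing, where the text after the interpreter
    path is passed as ONE optional argument and is not split on spaces.
    """
    if not first_line.startswith("#!"):
        raise ValueError("not a #! script")
    rest = first_line[2:].strip(" \t\r\n")
    if not rest:
        raise ValueError("no interpreter named after #!")
    parts = rest.split(None, 1)      # interpreter, then the optional argument
    interpreter = parts[0]
    optional = [parts[1].strip()] if len(parts) > 1 else []
    # The script's own path is inserted before the user's arguments.
    return [interpreter] + optional + [script_path] + list(script_args)


if __name__ == "__main__":
    print(shebang_argv("#! /usr/bin/node\n", "./foo.js", ["a", "b", "c", "d"]))
```

Run directly, this prints the list ['/usr/bin/node', './foo.js', 'a', 'b', 'c', 'd'], matching the argument vector shown above.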

The Node interpreter itself ignores the #! line, but it has to have code in its parser to ignore it; the kernel doesn't filter it out. With many of the interpreted languages commonly used on Unix (sh, awk, perl, python, ruby, ...) comments run from a # to the end of the line, so this happens automatically; in fact, the #! notation was chosen, back in the day, because sh comments were from # to the end of the line. JavaScript comments don't work like that, so Node has to have a special case for #! at the beginning of the file.
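To give a feel for what such a special case might look like, here is a sketch in Python. It is not Node's actual code, and strip_shebang is a hypothetical name; it just shows one way an interpreter for a language without # comments could discard a leading #! line while keeping line numbers intact for error messages.

```python
def strip_shebang(source):
    """Drop a leading #! line but keep its newline, so line 2 stays line 2.

    Illustrative only; assumes the program text is already a str.
    """
    if not source.startswith("#!"):
        return source             # "#!" anywhere else is left alone
    newline_at = source.find("\n")
    if newline_at == -1:
        return ""                 # the whole file was just the #! line
    return source[newline_at:]    # starts with "\n", i.e. an empty first line


if __name__ == "__main__":
    program = "#!/usr/bin/env node\nconsole.log('hi');\n"
    print(repr(strip_shebang(program)))
```

Only a #! at the very start of the text is treated specially, which mirrors the fact that the kernel also looks only at the first two bytes of the file.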


The #! line you showed has an additional level of indirection: #! /usr/bin/env node makes the kernel run the program /usr/bin/env (which is again made of machine code) with the argument vector

/usr/bin/env node ./foo.js a b c d

env then sees that its first argument is node and it looks along the search path for executables for a program named node. The search path is defined by an environment variable: type

echo $PATH

at your prompt to learn what it is. It's a colon-separated list of directories. For instance, a common value for PATH is

/usr/local/bin:/usr/bin:/bin

which means to look in the directories /usr/local/bin, /usr/bin, and /bin, in that order, for programs; in other words, with that value of PATH, and the arguments above, env would first attempt to run

/usr/local/bin/node ./foo.js a b c d

and if that didn't work it would try /usr/bin/node and so on. This extra indirection is necessary if you don't know where the Node interpreter (or whatever) has been installed, because the kernel's #! processing will not do a PATH search for you; it expects a pathname it can open directly, which in practice means an absolute one. If you do know where node has been installed, it's better to write that pathname directly, so that your program's behavior doesn't depend on the invoking user's PATH. For example, some Linux distributions used to use the name /usr/bin/node for a completely unrelated program, so if you had #! /usr/bin/env node and the user didn't have /opt/node-1.9/bin before /usr/bin on their PATH, hilarity would ensue.
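The PATH search can be approximated with a short Python sketch. The name find_on_path is invented for this example, and there is one deliberate simplification: a real env simply attempts to execute each candidate in turn, whereas this version checks whether an executable file exists at each location and returns the first hit.

```python
import os


def find_on_path(program, path_value):
    """Return the first executable called `program` found in the
    colon-separated directory list `path_value`, or None.

    Hypothetical helper; it checks for the file instead of trying to run it.
    """
    for directory in path_value.split(":"):
        if not directory:
            directory = "."   # an empty PATH entry traditionally means the current directory
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # Prints the node that "#! /usr/bin/env node" would pick, or None.
    print(find_on_path("node", os.environ.get("PATH", "")))
```

Python's standard library offers shutil.which for the same job; the loop is spelled out here only to make the left-to-right order of the search visible.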


The behavior I have described for execve and files beginning with #! is not specified by POSIX (it is mentioned in the POSIX description of the exec functions, but only in the non-normative RATIONALE section). However, it is consistent across all Unix-like operating systems you are likely to encounter nowadays. I don't know off the top of my head exactly how old it is.

zwol