69

In a script you must include a #! on the first line followed by the path to the program that will execute the script (e.g.: sh, perl).

As far as I know, the # character denotes the start of a comment, and that line is supposed to be ignored by the program executing the script. Yet it seems that something must read this first line at some point in order for the script to be executed by the proper program.

Could somebody please shed more light on the workings of the #!?

I'm really curious about this, so the more in-depth the answer the better.

Jonathan Leffler
mocybin
  • 1
    I got a good schooling on this topic in the comp.unix.shell thread [executable postscript programs](http://groups.google.com/group/comp.unix.shell/browse_thread/thread/e7a3306342c01847/ec5741ed3278408a?q=executable+postscript+programs#ec5741ed3278408a). By writing a simple C program to manipulate the command line, I was able to make executable scripts for a language that normally doesn't support them. – luser droog Oct 06 '11 at 22:36

3 Answers

72

Recommended reading:

The Unix kernel's program loader is responsible for this. Calling exec() asks the kernel to load the program from the file named in its argument. The kernel checks the first 16 bits of the file to determine its executable format. If those bits are #!, it uses the rest of the first line to find the program to launch, and passes the name of the file it was asked to launch (the script) as the last argument to that interpreter.

The interpreter then runs as normal, and treats the #! as a comment line.
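To make the steps above concrete, here is a minimal Python sketch of the decision the loader makes. The function name `shebang_argv` is made up for illustration, and the argument splitting follows my understanding of Linux (everything after the interpreter path is passed as a single argument); other kernels split the line differently.

```python
def shebang_argv(head: bytes, script_path: str):
    """Return the argv a loader would build, or None if `head` has no shebang.

    `head` is the beginning of the file; `script_path` is the path given to exec().
    """
    # The loader compares the first two bytes (16 bits) against "#!".
    if head[:2] != b"#!":
        return None
    # The rest of the first line names the interpreter.
    first_line = head[2:].split(b"\n", 1)[0].decode("utf-8", "replace").strip()
    if not first_line:
        return None
    # Assumption (Linux-like behaviour): the interpreter path, then at most
    # one argument holding everything else on the line.
    argv = first_line.split(None, 1)
    # The script's own path is appended as the last argument.
    return argv + [script_path]


if __name__ == "__main__":
    print(shebang_argv(b"#!/bin/sh\necho hi\n", "./foo.sh"))
```

The interpreter later opens the path it received as its last argument and, because # starts a comment in its language, skips the first line.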

Kevin Panko
  • The best thing is the file can be anything, not necessarily a program - as long as the invoked program can tolerate the shebang. – ivan_pozdeev Jul 15 '14 at 13:14
  • 2
    @KevinPanko: Is it always exactly 16 bits that are checked by the kernel's program loader? What would then happen if `#!` were preceded by a UTF-8 or UTF-16 BOM? – stakx - no longer contributing Jan 22 '17 at 17:17
  • 4
    @stakx Yes, this was invented before Unicode and has not been changed since then. http://unicode.org/faq/utf_bom.html#bom5 – Kevin Panko Jan 23 '17 at 21:07
31

The Linux kernel exec system call uses the initial bytes #! to identify file type

When you run the following in Bash:

./something

on Linux, this calls the exec system call with the path ./something.

This line runs in the kernel on the file passed to exec: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_script.c#L25

if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))

It reads the first two bytes of the file and compares them to #!.

If the file does start with #!, the Linux kernel parses the rest of the line and makes another exec call. For a script whose first line is #!/usr/bin/env python, the path is /usr/bin/env, with python as the first argument and the script's path as the second:

/usr/bin/env python /path/to/script.py

and this works for any scripting language that uses # as a comment character.
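One way to check that the kernel, and not a shell, does this work is to exec a script from a program that never involves a shell. The sketch below assumes a POSIX system with /bin/sh and a temporary directory that allows execution; the helper name `run_script_directly` and the file name `demo_script` are made up for illustration.

```python
import os
import stat
import subprocess
import tempfile


def run_script_directly(body: str):
    """Write `body` to an executable temp file, exec it with no shell, return (path, stdout)."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "demo_script")  # hypothetical file name
    try:
        with open(path, "w") as handle:
            handle.write(body)
        # Set the executable bit for the owner, like `chmod u+x`.
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
        # A list argument with the default shell=False means subprocess hands
        # the path straight to the exec system call; no shell reads the file first.
        result = subprocess.run([path], capture_output=True, text=True, check=True)
        return path, result.stdout
    finally:
        if os.path.exists(path):
            os.remove(path)
        os.rmdir(workdir)


if __name__ == "__main__":
    script_path, output = run_script_directly('#!/bin/sh\necho "argument seen by sh: $0"\n')
    print(output, end="")
```

The script prints `$0`, which is the script's own path: that is the argument the kernel appended when it launched /bin/sh.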

And yes, you can make an infinite loop with:

printf '#!/a\n' | sudo tee /a
sudo chmod +x /a
/a

The kernel gives up after a few levels of interpreter lookup, and Bash reports the resulting error:

-bash: /a: /a: bad interpreter: Too many levels of symbolic links
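The message is the standard text for the ELOOP error code. A rough Python model of why the loop terminates, using an in-memory dictionary in place of a real filesystem, might look like this. `MAX_DEPTH` is a made-up value for illustration, not the kernel's actual limit, and `resolve_interpreter` is a hypothetical helper.

```python
import errno
import os

MAX_DEPTH = 4  # hypothetical limit, chosen only for this illustration


def resolve_interpreter(files: dict, path: str, max_depth: int = MAX_DEPTH) -> str:
    """Follow shebang lines through `files` (path -> content bytes).

    Returns the first path whose content is not a script, or raises
    OSError(ELOOP) when the chain is longer than `max_depth`.
    """
    for _ in range(max_depth + 1):
        content = files[path]
        if content[:2] != b"#!":
            return path  # not a script, so this is the program that finally runs
        words = content[2:].split(b"\n", 1)[0].decode().split()
        if not words:
            raise OSError(errno.ENOEXEC, os.strerror(errno.ENOEXEC), path)
        path = words[0]  # the interpreter may itself be a script
    raise OSError(errno.ELOOP, os.strerror(errno.ELOOP), path)


if __name__ == "__main__":
    try:
        resolve_interpreter({"/a": b"#!/a\n"}, "/a")
    except OSError as error:
        print(error.strerror)
```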

The magic bytes #! happen to be human readable, but that is not a requirement.

If the file started with different bytes, then the exec system call would use a different handler. The other most important built-in handler is for ELF executable files: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_elf.c#L1305 which checks for the bytes 7f 45 4c 46 (which are also human readable, as .ELF). Let's confirm that by reading the first 4 bytes of /bin/ls, which is an ELF executable:

head -c 4 "$(which ls)" | hd 

output:

00000000  7f 45 4c 46                                       |.ELF|
00000004                                                                 

So when the kernel sees those bytes, it takes the ELF file, puts it into memory correctly, and starts a new process with it. See also: How does kernel get an executable binary file running under linux?
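The same dispatch on leading bytes can be sketched in a few lines of Python. This only covers the two formats discussed here; the function names are made up for illustration, and a real kernel walks a list of registered handlers instead of a fixed if-chain.

```python
ELF_MAGIC = b"\x7fELF"  # the bytes 7f 45 4c 46
SHEBANG_MAGIC = b"#!"


def executable_format(head: bytes) -> str:
    """Classify a file by its leading bytes: 'script', 'elf' or 'unknown'."""
    if head.startswith(SHEBANG_MAGIC):
        return "script"
    if head.startswith(ELF_MAGIC):
        return "elf"
    return "unknown"


def executable_format_of(path: str) -> str:
    """Read the first 4 bytes of the file at `path` and classify them."""
    with open(path, "rb") as handle:
        return executable_format(handle.read(4))


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, executable_format_of(name))
```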

Finally, you can add your own shebang handlers with the binfmt_misc mechanism. For example, you can add a custom handler for .jar files. This mechanism even supports handlers by file extension. Another application is to transparently run executables of a different architecture with QEMU.
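As a sketch of what such a registration looks like: a binfmt_misc rule is a single colon-separated string of the form `:name:type:offset:magic:mask:interpreter:flags`, written as root to /proc/sys/fs/binfmt_misc/register. The Python helper below only builds that string. The helper name and the wrapper path /usr/local/bin/jarwrapper are assumptions for illustration; you would have to provide a real wrapper program yourself.

```python
def binfmt_misc_rule(name: str, kind: str, magic: str, interpreter: str,
                     offset: str = "", mask: str = "", flags: str = "") -> str:
    """Build a binfmt_misc registration string.

    kind "M" matches magic bytes at `offset`; kind "E" matches a file
    extension, in which case `magic` holds the extension without the dot.
    """
    if kind not in ("M", "E"):
        raise ValueError("kind must be 'M' (magic) or 'E' (extension)")
    fields = [name, kind, offset, magic, mask, interpreter, flags]
    # The rule starts with the delimiter, then the seven fields joined by it.
    return ":" + ":".join(fields)


if __name__ == "__main__":
    # Hypothetical rule: run *.jar files through a wrapper script.
    print(binfmt_misc_rule("ExecutableJAR", "E", "jar", "/usr/local/bin/jarwrapper"))
```

Writing the printed string to the register file requires root and a mounted binfmt_misc filesystem, which is why the sketch stops at building it.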

I don't think POSIX specifies shebangs, however: https://unix.stackexchange.com/a/346214/32558 . It does mention them in rationale sections, in the form "if executable scripts are supported by the system something may happen". macOS and FreeBSD also seem to implement them.

Ciro Santilli OurBigBook.com
  • Why do people write the whole path to, for example, python? Shouldn't `#!python3` be just as good as `#!/usr/bin/env python3`? –  Aug 14 '20 at 17:41
  • 2
    @Daniel there isn't a fundamental reason why the kernel can't do `PATH` parsing and directory search like `env` does AFAIK. One possible rationale is that "if something can be done from userland, let it be done from userland" to reduce the kernel's surface of attack: https://unix.stackexchange.com/questions/11907/shebang-and-path – Ciro Santilli OurBigBook.com Aug 14 '20 at 17:48
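For reference, the userland `PATH` search that `env` performs can be approximated in Python as follows. This is a simplified sketch with a made-up function name; the standard library's `shutil.which` does the same job more carefully.

```python
import os


def search_path(name: str, path_var: str):
    """Return the first executable called `name` in the directories of `path_var`, or None."""
    # A name containing a slash is used as-is; no directory search happens.
    if "/" in name:
        return name if os.access(name, os.X_OK) else None
    for directory in path_var.split(os.pathsep):
        # An empty PATH entry traditionally means the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(search_path("python3", os.environ.get("PATH", "")))
```

The kernel does none of this for a shebang line: it takes the interpreter path literally, so `#!python3` only works if a file named python3 exists relative to the current directory.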
12

Short story: The shebang (#!) line is read by the operating system's program loader, not by the shell (e.g. sh, bash, etc.). While it formally looks like a comment, the fact that these are the very first two bytes of the file marks the whole file as a text file and as a script. The script will be passed to the executable named on the first line after the shebang. Voilà!


Slightly longer story: Imagine you have your script, foo.sh, with the executable bit (x) set. This file contains e.g. the following:

#!/bin/sh

# some script commands follow...:
# *snip*

Now, on your shell, you type:

> ./foo.sh

Edit: Please also read the comments below after or before you read the following! As it turns out, I was mistaken. It's apparently not the shell that passes the script to the target interpreter, but the operating system (kernel) itself.

Remember that you type this inside the shell process (let's assume this is the program /bin/sh). Therefore, that input will have to be processed by that program. It interprets this line as a command, since it discovers that the very first thing entered on the line is the name of a file that actually exists and which has the executable bit(s) set.

/bin/sh then starts reading the file's contents and discovers the shebang (#!) right at the very beginning of the file. To the shell, this is a token ("magic number") by which it knows that the file contains a script.

Now, how does it know which programming language the script is written in? After all, you can execute Bash scripts, Perl scripts, Python scripts, ... All the shell knows so far is that it is looking at a script file (which is not a binary file, but a text file). Thus it reads the next input up to the first line break (which will result in /bin/sh, compare with the above). This is the interpreter to which the script will be passed for execution. (In this particular case, the target interpreter is the shell itself, so it doesn't have to invoke a new shell for the script; it simply processes the rest of the script file itself.)

If the script were destined for e.g. /bin/perl, all that the Perl interpreter would (optionally) have to do is check whether the shebang line really mentions the Perl interpreter. If not, the Perl interpreter would know that it cannot execute this script. If the Perl interpreter is indeed mentioned in the shebang line, it reads the rest of the script file and executes it.

stakx - no longer contributing
  • 4
    The first two bytes of an executable are the magic number that indicates how it should be executed; for interpreted scripts, the first two bytes conveniently correspond to the ASCII chars `#!` – friedo Jun 09 '10 at 19:44
  • 7
    It's not the shell that's looking at those two bytes, it's the system (program loader), yes? The same thing happens whether you're running the script from within a shell or not. – Cascabel Jun 09 '10 at 19:45
  • 4
    The shebang is not handled by the shell, it's handled by the OS itself. – R Samuel Klatchko Jun 09 '10 at 19:46
  • 1
    Thanks for the corrections, I actually didn't know that. I have edited my answer accordingly. I decided to not delete my answer because I feel it can still help to understand what has to go on until a script ends up with the right interpreter; whether the necessary steps are taken by the shell or by the kernel itself appears to be only secondary to understanding. – stakx - no longer contributing Jun 09 '10 at 19:52
  • 1
    AFAIK The first os to adopt this was 4BSD – mathk Jun 10 '10 at 08:06