138

I'd like to execute a gawk script with --re-interval using a shebang. The "naive" approach of

#!/usr/bin/gawk --re-interval -f
... awk script goes here

does not work, since gawk is called with the single first argument "--re-interval -f" (not split at the whitespace), which it does not understand. Is there a workaround for that?

Of course you can either not call gawk directly but wrap it into a shell script that splits the first argument, or make a shell script that then calls gawk and put the script into another file, but I was wondering if there was some way to do this within one file.

The behaviour of shebang lines differs from system to system; at least in Cygwin, the arguments are not split at whitespace. I only care about how to do it on a system that behaves like that; the script is not meant to be portable.

sanmai
Dr. Hans-Peter Störr
  • 2
    A silly experiment I just did was with one script using another script on the shebang line, which did split the arguments correctly. – Hasturkun Nov 29 '10 at 12:40
  • 1
    @Hasturkun, that raises another issue, that the behavior of shebang lines also differs from system to system wrt whether the invoked program can itself be a script. – dubiousjim Apr 19 '12 at 15:05
  • 2
    http://stackoverflow.com/questions/17458528/why-does-this-snippet-work – Josh Lee Oct 12 '13 at 23:06
  • With recent versions of gawk (>= 4.0), `--re-interval` is not needed anymore (see [https://www.gnu.org/software/gawk/manual/gawk.html#index-_002d_002dre_002dinterval-option]). –  Mar 03 '18 at 20:56

10 Answers

163

The shebang line has never been specified as part of POSIX, SUS, LSB or any other specification. AFAIK, it hasn't even been properly documented.

There is a rough consensus about what it does: take everything between the ! and the \n and exec it. The assumption is that everything between the ! and the \n is a full absolute path to the interpreter. There is no consensus about what happens if it contains whitespace.

  1. Some operating systems simply treat the entire thing as the path. After all, in most operating systems, whitespace or dashes are legal in a path.
  2. Some operating systems split at whitespace and treat the first part as the path to the interpreter and the rest as individual arguments.
  3. Some operating systems split at the first whitespace and treat the front part as the path to the interpreter and the rest as a single argument (which is what you are seeing).
  4. Some don't even support shebang lines at all.

Thankfully, 1. and 4. seem to have died out, but 3. is pretty widespread, so you simply cannot rely on being able to pass more than one argument.
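To find out which of these behaviours your own system implements, a small probe along these lines may help. It is only a sketch: it assumes an external printf binary exists at /usr/bin/printf or /bin/printf and that files in the temporary directory may be executed. The names probe_dir, probe and shebang_mode are made up for illustration.

```shell
#!/bin/sh
# Probe how this system splits the arguments on a shebang line.
# Assumption: an external printf binary lives in /usr/bin or /bin.
probe_dir=$(mktemp -d)
printf_bin=
for candidate in /usr/bin/printf /bin/printf; do
    if [ -x "$candidate" ]; then
        printf_bin=$candidate
        break
    fi
done

probe=$probe_dir/probe
# The probe's shebang line is: #!<printf> [%s]\n one two
printf '#!%s [%%s]\\n one two\n' "$printf_bin" > "$probe"
chmod +x "$probe"

# If the arguments are split, printf receives the format '[%s]\n' and prints
# '[one]' first. If they arrive as one argument, the whole string is the
# format and the first line printed is the probe's own path in brackets.
first_line=$("$probe" 2>/dev/null | head -n 1)
case $first_line in
    "[one]")    shebang_mode="split at every whitespace (behaviour 2)" ;;
    "[$probe]") shebang_mode="one argument after the path (behaviour 3)" ;;
    *)          shebang_mode="unknown" ;;
esac
echo "shebang arguments: $shebang_mode"
rm -rf "$probe_dir"
```

An "unknown" result only means the probe could not tell; it does not prove that shebang lines are unsupported.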

And since the location of commands is also not specified in POSIX or SUS, you generally use up that single argument by passing the executable's name to env so that it can determine the executable's location; e.g.:

#!/usr/bin/env gawk

[Obviously, this still assumes a particular path for env, but there are only very few systems where it lives in /bin, so this is generally safe. The location of env is a lot more standardized than the location of gawk or even worse something like python or ruby or spidermonkey.]

Which means that you cannot actually use any arguments at all.

mklement0
Jörg W Mittag
  • Thank you very much for the insightful comment! But in this case I don't care about portability, so I clarified my question about that. – Dr. Hans-Peter Störr Nov 30 '10 at 09:46
  • 3
    FreeBSD's env has a `-S` switch which helps here, but it's not present on my Linux `env`, and I suspect it isn't available on Cygwin, either. @hstoerr, other users with different situations may be reading your questions later, so in general portable answers are preferable, even if you don't now require portability. – dubiousjim Apr 19 '12 at 15:11
  • 4
    So we can’t portably use arguments in a shebang. But what if we need arguments by any means necessary? I’m guessing that the solution is to write a wrapper shell script containing `#!/bin/sh` and `/usr/bin/env gawk --re-interval -f my-script.awk`. Is that correct? – Rory O'Kane Jul 05 '12 at 14:24
  • 1
    Sounds like that `-S` option would be useful to have added to the GNU coreutils version of `env`?? And/or the linux kernel shebang behavior modified to be like option 2... though I presume there would be compatibility concerns with the latter change. – lindes Dec 24 '13 at 05:42
  • 1
    I do not agree. You can quite portably use one argument. Any system where you cannot use any arguments fails miserably to implement this traditional Unixism, which is what hash-bang is. If non-implementations are fair game, then we can safely say that `#!` itself is not portable. For instance, Windows doesn't recognize this convention "natively" at all. A one-argument hash-bang is needed on Unix traditionally to be able to do `#!/usr/bin/awk -f`. – Kaz Jun 09 '14 at 18:08
  • 7
    @Kaz: Yes, but since the paths of many binaries are not standardized, you use up your one argument for `#!/usr/bin/env ruby` or the likes. – Jörg W Mittag Jun 09 '14 at 19:35
  • 1
    I would just like to add, for completeness, this quote: [If the shebang line is `#!/bin/bash -ex`, it is equivalent to executing `/bin/bash -ex /path/too/foo arg1 arg2`. This feature is managed by the kernel.](http://superuser.com/questions/195826/bash-shebang-for-dummies/195834#195834) – sdaau Jul 03 '14 at 11:39
  • @JörgWMittag, Re your last para, so **What's the solution?** – Pacerier Aug 15 '17 at 23:15
  • 4
    @Pacerier: Change the POSIX specification and wait 20-30 years until all systems have been updated to be compliant with the spec. – Jörg W Mittag Aug 16 '17 at 06:32
  • Can you point out an example of an operating system that picked option 2? I've only ever seen option 3 in live systems. I've seen a number of things document option 3 in ways that suggest that they thought it was standardized, but I've never found a formal reference, so. – Seebs Jan 24 '20 at 21:48
  • @Seebs: I read somewhere but can't remember where that you should neither rely on the OS splitting at whitespace and passing multiple args nor rely on the OS *not* splitting at whitespace and passing a single arg. The problem with writing portable code based on assumptions of non-existence of a certain interpretation of some spec or custom is that there are an awful lot of OSs out there with an awful lot of different interpretations, and there is no guarantee that somebody will not write an OS that picks the most ridiculous interpretations out of spite. – Jörg W Mittag Jan 25 '20 at 08:17
  • [GNU coreutils v8.30+](https://savannah.gnu.org/forum/forum.php?forum_id=9187) - included with Ubuntu 19.04+, for instance, now come with an `env` version that supports `-S`, so something like `#!/usr/bin/env -S pwsh -noprofile` should then work both on such Linux distros as well as macOS. – mklement0 Mar 25 '21 at 20:50
50

Although not exactly portable, starting with coreutils 8.30 and according to its documentation you will be able to use:

#!/usr/bin/env -S command arg1 arg2 ...

So given:

$ cat test.sh
#!/usr/bin/env -S showargs here 'is another' long arg -e "this and that " too

you will get:

% ./test.sh 
$0 is '/usr/local/bin/showargs'
$1 is 'here'
$2 is 'is another'
$3 is 'long'
$4 is 'arg'
$5 is '-e'
$6 is 'this and that '
$7 is 'too'
$8 is './test.sh'

and in case you are curious showargs is:

#!/usr/bin/env sh
echo "\$0 is '$0'"

i=1
for arg in "$@"; do
    echo "\$$i is '$arg'"
    i=$((i+1))
done

Original answer here.
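Applied to the gawk case from the question, the same idea might look like the sketch below. It assumes an env that understands -S (coreutils 8.30+ or a BSD env) and a gawk on the PATH; the file name dupes.awk and the pattern are invented for the demo. The run is skipped when either assumption does not hold.

```shell
#!/bin/sh
# Write a single-file gawk script whose shebang relies on `env -S`.
demo_dir=$(mktemp -d)
cat > "$demo_dir/dupes.awk" <<'EOF'
#!/usr/bin/env -S gawk --re-interval -f
# Print lines containing two consecutive "a" characters (interval expression).
/a{2}/ { print }
EOF
chmod +x "$demo_dir/dupes.awk"

# Only run it when env accepts -S and gawk is installed.
if env -S 'true' >/dev/null 2>&1 && command -v gawk >/dev/null 2>&1; then
    printf 'baab\nabab\n' | "$demo_dir/dupes.awk"
else
    echo "env -S or gawk not available; script written but not run" >&2
fi
```

The kernel still hands env the single argument "-S gawk --re-interval -f"; it is env that splits it into separate words before running gawk.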

unode
  • 1
    FYI, FreeBSD has had -S for years (since 6.0). This is a welcome portability addition to coreutils. – Juan Jan 18 '19 at 21:14
27

This seems to work for me with (g)awk.

#!/bin/sh
arbitrary_long_name==0 "exec" "/usr/bin/gawk" "--re-interval" "-f" "$0" "$@"


# The real awk program starts here
{ print $0 }

Note the #! runs /bin/sh, so this script is first interpreted as a shell script.

At first, I simply tried "exec" "/usr/bin/gawk" "--re-interval" "-f" "$0" "$@", but awk treated that as a command and printed out every line of input unconditionally. That is why I put in the arbitrary_long_name==0 - it's supposed to fail all the time. You could replace it with some gibberish string. Basically, I was looking for a false-condition in awk that would not adversely affect the shell script.

In the shell script, the arbitrary_long_name==0 defines a variable called arbitrary_long_name and sets it equal to =0.
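The two readings of that line can be checked separately. The following is a small sketch using a throwaway name, demo_name, and plain awk instead of gawk (an assumption made so that it runs on systems without gawk); it never executes the exec itself.

```shell
#!/bin/sh
# Shell view: NAME==0 in front of a command assigns the string "=0" to NAME
# in that command's environment.
shell_view=$(demo_name==0 sh -c 'printf "%s" "$demo_name"')
echo "shell sees demo_name as: $shell_view"

# awk view: the same text is a pattern without an action. The adjacent
# strings are concatenated first and then compared with the unset variable;
# the comparison is false, so no input line is printed.
awk_view=$(printf 'line1\nline2\n' |
    awk 'demo_name==0 "exec" "/usr/bin/gawk" "-f" "$0" "$@"')
echo "awk printed: '$awk_view'"
```

Inside the awk program, "$0" and "$@" are ordinary string literals, which is why awk does not object to them.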

Aaron McDaid
  • 1
    This is my answer, but I wonder if it's sufficiently portable and robust. Does it depend specifically on the `bash`, or will it work with any POSIX `sh`? And I don't use `awk` often, so I'm not sure my trick on the second line is a good way to force `awk` to ignore the line. – Aaron McDaid Sep 26 '14 at 09:49
  • 1
    Just what I was wondering, +1, but probably inadvisable (hence the relative votes). – Russia Must Remove Putin Apr 06 '16 at 12:42
  • Can you explain what problems this might have, @AaronHall ? As long as the variable `arbitrary_long_name` doesn't clash with a variable used in the real awk program, I can't see any issue. Is there something I'm missing? – Aaron McDaid Apr 07 '16 at 09:07
  • Use `#!/bin/sh -` instead of `#!/bin/sh` to protect the script from possibly misbehaving in a dangerous way if invoked with a zeroth argument that has `-` as the first character. This can happen accidentally in programming languages like C, where it is easy to accidentally mess up by forgetting to pass the invoked program name as part of the argument array to `execve` and similar functions, and if people habitually forget to protect against it, it can also end up being the last step in a maliciously exploitable vulnerability that lets an attacker get an interactive shell. – mtraceur May 11 '20 at 19:41
13

I came across the same issue, with no apparent solution because of the way whitespace is handled in a shebang (at least on Linux).

However, you can pass several options in a shebang, as long as they are short options and they can be concatenated (the GNU way).

For example, you cannot have

#!/usr/bin/foo -i -f

but you can have

#!/usr/bin/foo -if

Obviously, that only works when the options have short equivalents and take no arguments.
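As a concrete illustration with a real interpreter, here is a sketch that uses /bin/sh with the standard -e and -u options bundled into one word. The file name strict.sh is made up, and the example assumes /bin/sh exists and that temporary files may be executed.

```shell
#!/bin/sh
# Build a script whose shebang bundles two short options into one argument.
demo_dir=$(mktemp -d)
cat > "$demo_dir/strict.sh" <<'EOF'
#!/bin/sh -eu
echo "before"
echo "$undefined_variable"
echo "never reached"
EOF
chmod +x "$demo_dir/strict.sh"

# With -u in effect, expanding the unset variable aborts the script.
output=$("$demo_dir/strict.sh" 2>/dev/null) && status=0 || status=$?
echo "output: $output"
echo "exit status: $status"
rm -rf "$demo_dir"
```

Because "-eu" is one word, it survives as-is even on systems that pass everything after the interpreter path as a single argument.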

raphink
11

Under Cygwin and Linux everything after the path of the shebang gets passed to the program as one argument.

It's possible to hack around this by using another awk script inside the shebang:

#!/usr/bin/gawk {system("/usr/bin/gawk --re-interval -f " FILENAME); exit}

This will execute {system("/usr/bin/gawk --re-interval -f " FILENAME); exit} in awk.
And this will execute /usr/bin/gawk --re-interval -f path/to/your/script.awk in your system's shell.

Moritz
4
#!/bin/sh
''':'
exec YourProg -some_options "$0" "$@"
'''

The above shell shebang trick is more portable than /usr/bin/env.

bfontaine
user3123730
  • The ''':' is a hold-over because my original solution was for a python script so the ''':' tells the python interpreter to ignore the exec part. – user3123730 Jan 10 '14 at 19:13
  • 4
    I think you're being downvoted because your solution is for `python`, but this question is about `awk`. – Aaron McDaid Sep 26 '14 at 09:42
  • 1
    Great hack for python. – Zaar Hai Aug 15 '17 at 10:50
  • what was the purpose of the single quotes? `''':' and '''` – Walid Jan 08 '21 at 07:59
  • 1
    @Walid – In python (but not awk and not posix shell!), triple apostrophes can be used for a string without having to worry about quotes or apostrophes inside it (unless there are 3+ consecutive apostrophes). – Adam Katz Sep 02 '21 at 18:59
3

In the gawk manual (http://www.gnu.org/manual/gawk/gawk.html), the end of section 1.14 notes that you should only use a single argument when running gawk from a shebang line. It says that the OS will treat everything after the path to gawk as a single argument. Perhaps there is another way to specify the --re-interval option? Perhaps your script can reference your shell in the shebang line, run gawk as a command, and include the text of your script as a "here document".

bta
  • It seems there is no other way to specify the option. You are right: gawk -f - < – Dr. Hans-Peter Störr Nov 30 '10 at 09:42
  • The here document eats up the standard input stream for `gawk`, but you may still be able to pipe something in over stderr (that is, redirect stdout to stderr before piping into this script). I've never actually tried that but as long as the first process doesn't emit anything on stderr, it might work. You can also create a named pipe (http://www.linuxjournal.com/content/using-named-pipes-fifos-bash) if you want to make sure that nothing else is using it. – bta Dec 01 '10 at 01:52
3

Why not use bash and gawk itself, to skip past shebang, read the script, and pass it as a file to a second instance of gawk [--with-whatever-number-of-params-you-need]?

#!/bin/bash
gawk --re-interval -f <(gawk 'NR>3' "$0") "$@"
exit
{
  print "Program body goes here"
  print $1
}

(The same could naturally also be accomplished with e.g. sed or tail, but I think there's some kind of beauty in depending only on bash and gawk itself.)

conny
0

Just for fun: there is the following quite weird solution that reroutes stdin and the program through file descriptors 3 and 4. You could also create a temporary file for the script.

#!/bin/bash
exec 3>&0
exec <<-EOF 4>&0
BEGIN {print "HALLO"}
{print \$1}
EOF
gawk --re-interval -f <(cat 0>&4) 0>&3

One thing is annoying about this: the shell does variable expansion on the script, so you have to escape every $ with a backslash (as done in the second line of the script) and probably more than that.
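The expansion can probably be avoided by quoting the here-document delimiter, which is standard shell behaviour. The sketch below only demonstrates that effect in isolation; it has not been combined with the file descriptor juggling above, and the variable names are illustrative.

```shell
#!/bin/sh
# Give $1 a known value so that the difference is visible.
set -- first

# Unquoted delimiter: the shell expands $1 inside the here document.
expanded=$(cat <<EOF
{ print $1 }
EOF
)

# Quoted delimiter: the body is passed through literally, no escaping needed.
literal=$(cat <<'EOF'
{ print $1 }
EOF
)

echo "unquoted delimiter: $expanded"
echo "quoted delimiter:   $literal"
```

The same quoting is accepted with the tab-stripping form, i.e. <<-'EOF'.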

Dr. Hans-Peter Störr
-1

For a portable wrapper, invoke the standard Bourne shell (/bin/sh) with your shebang and call gawk from it, passing the program on the command line as a single-quoted argument rather than via -f or stdin:

#!/bin/sh
gawk --re-interval '
PROGRAM HERE
' "$@"

Note: there is no -f argument to gawk. That leaves stdin available for gawk to read input from, and any file names given to the script are passed through with "$@". Assuming you have gawk installed and on your PATH, that achieves everything I think you were trying to do with your original example (assuming you wanted the file content to be the awk script and not the input). The one restriction is that a single quote inside the program has to be written as '\''.

lharper71