
I would like to define a new command that wraps an existing awk command, such as `print`. However, I do not want to use a function:

#wrap command with function
function warn(text) { print text > "/dev/stderr" }
NR%1e6 == 0 {
  warn("processed rows: "NR)
}

Instead, I would like to define a new command that can be invoked without parentheses:

#wrap command with new command ???
define warn rest... { print rest... > "/dev/stderr" }
NR%1e6 == 0 {
  warn "processed rows: "NR
}

One solution I can imagine is to use a preprocessor, perhaps with the script's shebang set up to run the preprocessor and then awk. However, I was hoping for a pure awk solution.
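A minimal sketch of that preprocessor route, assuming only POSIX `sh`, `sed` and `awk`. The wrapper name `run_warn_awk` is hypothetical. It rewrites any line whose first word is `warn` into a `print ... > "/dev/stderr"` statement and passes the result to awk as the program text. This is a naive line-based substitution, not an awk parser.

```sh
# Hypothetical preprocessor wrapper (POSIX sh + sed + awk).
# It rewrites every line of the form
#     <indent>warn EXPR
# into
#     <indent>print EXPR > "/dev/stderr"
# and then runs the rewritten program with awk.
# Naive sketch: "warn" is only recognised as the first word on a line.
run_warn_awk() {
    script=$1
    shift
    # \1 keeps the indentation, \2 is everything after "warn ".
    prog=$(sed 's|^\([[:space:]]*\)warn[[:space:]]\{1,\}\(.*\)$|\1print \2 > "/dev/stderr"|' "$script") || return
    awk "$prog" "$@"
}

# Usage: run_warn_awk myscript.awk input.csv
```

Anything outside that simple form breaks it, for example `warn` after a `;`, inside a string, or with an expression containing `>`.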

Note: The solution should also work in mawk, which I use, because it is much faster than vanilla GNU/awk.

Update: The discussion revealed that gawk (GNU/awk) can be quite fast and mawk is not required.

Juve
  • Given the simplicity of the function route and the likely relative complexity of the preprocessor (etc.) routes, is this really worth it? – Etan Reisner Dec 17 '14 at 16:50
  • Maybe you are right, but I am always looking for (simple) ways to write more readable code. I am not aiming for a very hacky solution here. – Juve Dec 18 '14 at 08:36

3 Answers


You cannot do this within any awk, and you cannot do it robustly outside of awk without writing an awk language parser. By that point you may as well write your own awk-like command, which would no longer really be awk, since it would not behave the same as any other command by that name.
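One way to see the problem from inside awk: an identifier that is neither a keyword nor a defined function is parsed as a variable, so a statement such as `warn "hello"` is not a syntax error. It is a string concatenation whose value is simply discarded. A small demonstration (the function name `show_silent_warn` is only for illustration):

```sh
# In awk an unknown bare word is just an uninitialised (empty) variable.
# 'warn "hello"' is therefore a concatenation expression used as a
# statement: it is evaluated and thrown away, and nothing is printed.
show_silent_warn() {
    awk 'BEGIN { warn "hello"; print "done" }'
}

show_silent_warn   # stdout shows only: done
```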

It is odd that you refer to GNU awk as "vanilla" when it has many more useful features than any other currently available awk, while mawk is simply a stripped-down awk optimized for speed, which is only necessary in very rare circumstances.

Ed Morton
  • Sorry about the "vanilla". I did not know that "vanilla" could be too negative a term. However, I use AWK to process really huge CSV files, and `mawk` usually feels about 10 times faster than `gawk`. – Juve Dec 18 '14 at 08:32
  • @Juve *Feels* or *is*? – Etan Reisner Dec 18 '14 at 12:04
  • @Etan Processing 5M lines on my quad-core MacBook (2.5 GHz) takes 1.47 seconds with `mawk` and 9.6 seconds using `awk` (i.e., `mawk` is 6.5 times faster) – Juve Dec 18 '14 at 13:27
  • @Juve Ok. To be clear I wasn't doubting you. I was doubting "feels". – Etan Reisner Dec 18 '14 at 13:28
  • `vanilla` isn't particularly negative, but it means `basic`, which is the complete opposite of what gawk is in the family of awks, and that's why I questioned it. If you're on a MacBook, are you SURE you're using gawk and not BSD awk or similar? You might also simply be running a poorly designed script that is slow on some awks as a result. – Ed Morton Dec 18 '14 at 14:26
  • Thanks for the hint, Ed! I was using some poor Mac awk from 2007. I just installed `gawk`, which processes my sample file in 2.1 seconds. – Juve Dec 18 '14 at 15:59
  • Update: when processing a 17M line file, using another awk script, `mawk` is about 3x faster than `gawk`. – Juve Dec 19 '14 at 14:24
  • That MAY be a fair comparison, but the result could depend on the constructs you're using in your script. For example, maybe you have a chain of `sub()`s, which runs faster in mawk, but in a gawk script you'd really use one `gensub()`, and that'd run faster than a chain of `sub()`s, etc. Maybe you're using a bunch of match/substr calls which could be written in gawk with the array arg for match. Without seeing the script we can't tell whether a gawk script written for the job would run faster or slower than the mawk script currently being executed by gawk. – Ed Morton Dec 19 '14 at 14:37

Looking at mawk's source, I see that commands such as `print` are built-in keywords and cannot be added at runtime. From kw.c:

keywords[] =
{
    { "print",    PRINT },
    { "printf",   PRINTF },
    { "do",       DO },
    { "while",    WHILE },
    { "for",      FOR },
    { "break",    BREAK },
    { "continue", CONTINUE },
    { "if",       IF },
    { "else",     ELSE },
    { "in",       IN },
    { "delete",   DELETE },
    { "split",    SPLIT },
    { "match",    MATCH_FUNC },
    { "BEGIN",    BEGIN },
    { "END",      END },
    { "exit",     EXIT },
    { "next",     NEXT },
    { "nextfile", NEXTFILE },
    { "return",   RETURN },
    { "getline",  GETLINE },
    { "sub",      SUB },
    { "gsub",     GSUB },
    { "function", FUNCTION },
    { (char *) 0, 0 }
};

You could add a new command by patching Mawk's C code.

nwk

I created a shell wrapper script called cppawk which combines the C preprocessor (from GCC) with Awk.

It is BSD-licensed and comes with a man page, regression tests and simple install instructions.

Normally, the C preprocessor creates macros that look like functions, but by using certain control-flow tricks, which work in Awk much as they do in C, we can pull off minor miracles of syntactic sugar:

function __warn(x)
{
   print x
   return 0
}

#define warn for (__w = 1; __w; __w = __warn(__x)) __x =

NR % 5 == 0 {
  warn "processed rows: "NR
}

Run it, with the letters a to k typed as input lines:

$ cppawk -f warn.cwk 
a
b
c
d
e
processed rows: 5
f
g
h
i
j
processed rows: 10
k
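Because the macro only rewrites text, the trick itself can be checked without cppawk. The sketch below runs the hand-expanded form of `warn "processed rows: "NR` in plain awk, with `printf` supplying the input lines a to k. The shell function name `expanded_warn_demo` is just for illustration.

```sh
# Hand expansion of:  warn "processed rows: "NR
# The loop body assigns the message to __x; the increment clause then
# calls __warn(__x), which prints it and returns 0, so the condition
# __w becomes false and the loop runs exactly once.
expanded_warn_demo() {
    printf '%s\n' a b c d e f g h i j k | awk '
function __warn(x)
{
   print x
   return 0
}

NR % 5 == 0 {
  for (__w = 1; __w; __w = __warn(__x)) __x = "processed rows: "NR
}'
}

expanded_warn_demo
```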

Because the entire `for` trick expands onto a single line of code, we can use the `__LINE__` symbol to make the hidden variables quasi-unique:

function __warn(x)
{
   print x
   return 0
}

#define xcat(a, b, c) a ## b ## c
#define cat(a, b, c) xcat(a, b, c)
#define uq(sym) cat(__, __LINE__, sym)
#define warn for (uq(w) = 1; uq(w); uq(w) = __warn(uq(x))) uq(x) =

NR % 5 == 0 {
  warn "processed rows: "NR
}

The expansion is:

$ cppawk --prepro-only -f warn.cwk 
# 1 "<stdin>"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "<stdin>"
function __warn(x)
{
   print x
   return 0
}
NR % 5 == 0 {
  for (__13w = 1; __13w; __13w = __warn(__13x)) __13x = "processed rows: "NR
}

The uq() macro interpolated 13 into the variable names because warn is used on line 13 of the source file.

Hope you like it.

P.S. Maybe don't do this; instead, find some less hacky way of using cppawk.

You can use C99/GNUC variadic macros, for instance:

#define warn(...) print __VA_ARGS__ >> "/dev/stderr"

NR % 5 == 0 {
  warn("processed rows:", NR)
}

We made a humble print wrapper which redirects to standard error. It seems like nothing, yet you can't do that with an Awk function, at least not without making it a one-argument function and passing it the value of an expression that concatenates everything.
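Hand-expanding the variadic macro shows what Awk actually receives. This sketch runs that expansion directly in plain awk, without cppawk (the shell function name is illustrative). Because the macro arguments become a comma-separated `print` list, awk joins them with `OFS`, a single space by default.

```sh
# Hand expansion of:  warn("processed rows:", NR)
# i.e.                print "processed rows:", NR >> "/dev/stderr"
# The comma makes print insert OFS (default " ") between the items.
expanded_variadic_demo() {
    printf '%s\n' a b c d e f g h i j k | awk '
NR % 5 == 0 {
  print "processed rows:", NR >> "/dev/stderr"
}'
}

# Keep only standard error, to show where the messages go.
expanded_variadic_demo 2>&1 >/dev/null
```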

Kaz