Is there a way to make awk interactive when it is processing /dev/stdin via a pipe?

Imagine I have a program which continuously generates data. Example :

$ od -vAn -tu2 -w2 < /dev/urandom
 2357
60431
19223
...    

This data is being processed by a very advanced awk script by means of a pipe :

$ od -vAn -tu2 -w2 < /dev/urandom | awk '{print}'

Question: Is it possible to make this awk program interactive such that :

  • The program continuously prints output
  • When a single key is pressed (e.g. z), it starts to output only 0 for each line it reads from the pipe.
  • When the key is pressed again, it continues to output the original data, obviously skipping the already processed records it printed as 0.

Problems:

  • /dev/stdin (also referenced as -) is already in use, so the keyboard interaction needs to be picked up with /dev/tty or is there another way?

  • getline key < "/dev/tty" waits until RS is encountered, so in the default case you need to press two keys (z and Enter) :

    $ awk 'BEGIN{ getline key < "/dev/tty"; print key}'
    

    This is acceptable, but I would prefer a single key-press.

    So, is it possible to set RS locally such that getline reads a single character? This way we could locally modify RS and reset it after the getline. Another way might be using the shell builtin read, but its options are incompatible between bash and zsh.

  • getline blocks until input arrives, so it essentially stops the processing of the pipe. There is a gawk extension which allows you to set a timeout, but this is only available since gawk 4.2. So I believe this could potentially work :

    awk '{print p ? 0 : $0 }
         { PROCINFO["/dev/tty", "READ_TIMEOUT"]=1;
           while (getline key < "/dev/tty") p=key=="z"?!p:p
         }'
    

    However, I do not have access to gawk 4.2 (update: this does not work)

Requests:

  1. I would prefer a fully POSIX-compliant version, which is either entirely awk or uses POSIX-compliant system calls.
  2. If this is not possible, gawk extensions prior to 3.1.7 and shell-independent system calls can be used.
  3. As a last resort, I would accept any shell-awk construct which would make this possible under the single condition that the data is only read continuously by awk (so I'm thinking multiple pipes here).
kvantour
  • Why do you wanna use `awk` for that? – hek2mgl Jun 15 '18 at 14:11
  • @hek2mgl As you might imagine, this is an incredibly dumbed down version of some data-monitoring-system. The idea would be to use the key-press to change some processing options on the fly without interrupting the data stream and restarting the monitoring system. Why `awk`? Why not! – kvantour Jun 15 '18 at 14:21
  • You can run an external script in your while loop to replace the getline stuff with the underlying bash. – LMC Jun 15 '18 at 15:01
  • It is going to be at minimum extremely tricky, and verges on the 'not possible using awk'. Your timeout solution using Gawk 4.2 would probably rate-limit the normal flow to one output number per timeout cycle. I've not checked the timing units of the timeout, but you didn't mention milliseconds (or microseconds or nanoseconds) so it is probably 'integer seconds'. In a C program, you'd arrange to monitor both `/dev/stdin` and `/dev/tty` for 'characters available', and print the standard input (or zeros) as it's available, but on those occasions when `/dev/tty` has data available, you'd read it. – Jonathan Leffler Jun 17 '18 at 22:21
  • You'll also have to consider whether you want raw I/O on the `/dev/tty` input so that single keystrokes are detectable. Putting the terminal into raw means all programs using it (I trust it's only this program) will have raw I/O. No signals; no interrupts from the keyboard. Be cautious. (The C program would probably use [`select()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/select.html) or [`poll()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html), or on Linux perhaps `epoll()`.) – Jonathan Leffler Jun 17 '18 at 22:23
  • @JonathanLeffler thanks a lot. The `PROCINFO` sets the time in milliseconds, but you are absolutely right about rate-limiting the normal flow. Also interesting to read about the polling. I have currently constructed an alternative approach (see answers), but with polling I might be able to write a [`gawk` extension](https://www.gnu.org/software/gawk/manual/html_node/Dynamic-Extensions.html) that allows this. – kvantour Jun 22 '18 at 13:32
  • @LuisMuñoz I agree with you, but this would make the tool Bash dependent (zsh has a different read). If this could be shell independent, that would be great. Currently I wrote something as a full shell script which would solve this problem, and this would also be a way out for your suggestion (shell script vs `awk -f`). But as mentioned by Jonathan, it would still limit the rate ... although. I will look into a viable solution with this approach also. – kvantour Jun 22 '18 at 13:39
  • A shell agnostic script, that's a though :). – LMC Jun 22 '18 at 13:49
  • @EdMorton, do you have any insight in this matter? – kvantour Jun 28 '18 at 17:32
  • @LuisMuñoz An alternative to `read` could be this nifty trick : `$ stty cbreak -echo; KEY=$(dd if=/dev/tty bs=1 count=1 2>/dev/null); stty -cbreak echo` (source [here](https://www.commandlinefu.com/commands/view/2391/read-a-keypress-without-echoing-it)) – kvantour Jun 28 '18 at 17:47
  • Sorry, I just saw this. I don't think there's any good way to do it within awk. With gawk you could probably do something like set `PROCINFO["-", "READ_TIMEOUT"]` to a small value and then getline from stdin after processing each real line of input but I suspect that's not as easy to do as it was to say and the possible granularity of the timeout may be too high to really be useful in this case. – Ed Morton Apr 08 '19 at 19:25
  • @EdMorton Thanks for having a look. I've tried this method already but not with `-` as an input. I'll have a look at it. Also, great news for the synonyms. Finally only one single awk to look at ;-) – kvantour Apr 09 '19 at 07:27
  • Sorry, I didn't notice in your question you'd already tried that READ_TIMEOUT approach. I tried playing with it myself for a few mins and couldn't get it to behave as I expected either. I considered asking for help on usenet at comp.unix.shell or comp.lang.awk where all the shell and awk gurus and GNU awk providers hang out but didn't care enough to pursue it. – Ed Morton Apr 09 '19 at 13:40

1 Answer


After some searching, I came up with a Bash script that allows doing this. The idea is to inject a uniquely identifiable string into the pipe that awk is processing. Both the original program od and the bash script write to the pipe. In order not to mangle that data, I used stdbuf to run the program od line-buffered. Furthermore, since it is the bash script that handles the key-press, both the original program and the awk script have to run in the background. Therefore a clean exit strategy needs to be in place. Awk will exit when the key q is pressed, while od will terminate automatically when awk is terminated.
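
The awk side of this idea can be tried in isolation, without a FIFO or a keyboard. The following is a minimal sketch, assuming the injected marker is a line of the form `key <char>`; the function name `toggle_filter` is hypothetical, and the stream is simulated with `printf`. It skips the marker line with `next` so that the marker itself never reaches the output.

```shell
#!/bin/sh
# toggle_filter (hypothetical name): reads a data stream on stdin in which
# control lines of the form "key <char>" have been injected.
#   key q      -> stop processing
#   key <any>  -> toggle between printing the data and printing 0
toggle_filter() {
    awk '
        $1 == "key" {            # injected control line, not data
            if ($2 == "q") exit
            p = !p               # flip the "print zeros" flag
            next                 # never print the marker itself
        }
        { print (p ? 0 : $0) }
    '
}

# Simulated stream: two data lines, a toggle, one masked line,
# a second toggle, one data line, then quit.
printf '%s\n' 11 22 'key z' 33 'key z' 44 'key q' 55 | toggle_filter
```

Matching on `$1 == "key"` rather than the regular expression `/key/` keeps a data line that merely contains the text "key" from being mistaken for a control line.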

In the end, it looks like this :

#!/usr/bin/env bash

# make a fifo which we use to inject the output of data-stream
# and the key press
mkfifo foo

# start the program in line-buffer mode, writing to FIFO
# and run it in the background
stdbuf -o L  od -vAn -tu2 -w2 < /dev/urandom > foo &

# run the awk program that processes the identified key-press
# also run it in the background and insert a clear EXIT strategy
awk '$1=="key"{if ($2=="q") exit; p=!p; next}   # control line: toggle, never print it
     !p{print}
      p{print 0}' foo &

# handle the key pressing
# if a key is pressed inject the string "key <key>" into the FIFO
# use "q" to exit
while true; do
    read -rsn1 key
    echo "key $key" > foo 
    [[ $key == "q" ]] && exit
done

note: I ignored the concept that the key has to be z
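
The `read -rsn1` call above is a bash feature. A more portable way to pick up single keys, based on the `stty`/`dd` trick mentioned in the comments, might look like the sketch below. The helper names `read_key` and `key_loop` are hypothetical, and the terminal setup is shown only as commented usage because it needs a real terminal. One known limitation of this sketch: a bare newline (Enter) or end of input yields an empty key and ends the loop.

```shell
#!/bin/sh
# read_key (hypothetical helper): print the next single byte from stdin.
# dd with bs=1 count=1 is used instead of the non-portable "read -n1".
read_key() {
    dd bs=1 count=1 2>/dev/null
}

# key_loop (hypothetical helper): turn each byte read from stdin into a
# "key <char>" line appended to the file or FIFO named by $1.
# The loop stops after "q", on end of input, or on a bare newline.
key_loop() {
    while key=$(read_key) && [ -n "$key" ]; do
        printf 'key %s\n' "$key" >>"$1"
        [ "$key" = q ] && break
    done
}

# Usage sketch on a real terminal (assumes the FIFO "foo" from the script):
#   saved=$(stty -g </dev/tty)               # remember current settings
#   trap 'stty "$saved" </dev/tty' EXIT      # restore them on exit
#   stty -icanon -echo min 1 time 0 </dev/tty
#   key_loop foo </dev/tty
```

Using `-icanon` rather than raw mode leaves signal handling enabled, so Ctrl-C still interrupts the script, and `stty -g` allows the original terminal settings to be restored afterwards.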

kvantour