How do I print output to a directory in awk using a shell argument or command parameter?

Shell program invokes and passes arguments to awk program:

testshell.sh

shelloutputdir="./ouputdir/"
./testawk inputfile.txt ./outputdir/

Awk program:

testawk

#!/usr/bin/awk -f
{
    print FILENAME > "./outputdir/outputfile1.txt"
    fn2="outputfile2.txt"
    fn3="outputfile3.txt"
    fn4="outputfile4.txt"
    print FILENAME > ARGV[2]"/"fn2
    print FILENAME > ARGV[2]"subdir/"fn3
    print FILENAME > $shelloutputdir"subdir/"fn4
}

Note:

inputfile.txt

is only an example, as the shell and awk programs will process other arguments.

The output directories already exist.

./outputdir/
./outputdir/subdir/

The outputs:

./outputdir/outputfile1.txt
./outputdir/outputfile2.txt
./outputdir/subdir/outputfile3.txt

outputfile4.txt is not created

The error:

awk: ./testawk:9: (FILENAME=inputfile.txt FNR=1) fatal: can't redirect to `input text filesubdir/outputfile4.txt' (No such file or directory)

Summary of questions:

  1. How do I explicitly set the output directory in awk?

  2. How do I use a command line parameter to set the output directory in awk?

  3. How do I create a directory if it does not exist in awk?

  4. How do I pass a shell variable to an awk variable to set the output directory?

Appreciate help and any example approaches

Gabe
  • So to summarize, your questions are "How do I get command line parameters in an awk script?" and "How do I create a directory from within an `awk` script?" – that other guy Jun 16 '17 at 00:11
  • Fairly close, "how do I get command line parameters in an awk script?" and both cases for the second question "output to an existing directory, by explicitly stating in the path in the awk script, and by the command line parameter", it would be good to have an example for creating the directory if it does not exist as well. – Gabe Jun 16 '17 at 00:21
  • [Here's](https://stackoverflow.com/a/15970886/1899640) a duplicate for getting command line arguments. You can already do `print "foo" > "dir/subdir/file"` to write to a file in subdirs that exist – that other guy Jun 16 '17 at 00:31
  • You need to change `ARG` to `ARGV` – komar Jun 16 '17 at 00:57
  • It is also better to write it as `ARGV[2]"/"fn`. A duplicate `//` in the result is not a problem. – komar Jun 16 '17 at 01:00
  • Oh really, I saw that and thought if it was passed as a parameter `./outputdir/` and specified in awk `print FILENAME > ARGV[2]"/"fn` the resulting `//` would be a problem – Gabe Jun 16 '17 at 01:12
  • No, it's not a problem. Multiple `/` in a path are ignored by the system. – komar Jun 16 '17 at 01:15
  • Thank you everyone, they are all now creating in the right locations, but still receiving the warning `awk: ./testawk:7: warning: command line argument ./outputdir/ is a directory: skipped` – Gabe Jun 16 '17 at 01:21

2 Answers

Using a shebang to execute the awk script just makes your life harder, don't do it. If you get rid of the shebang and write "testawk" as:

odir="$1"
shift
/usr/bin/awk -v odir="$odir" '
{
    print FILENAME > (odir "outputfile1.txt")
    fn2="outputfile2.txt"
    fn3="outputfile3.txt"
    fn4="outputfile4.txt"
    print FILENAME > (odir fn2)
    print FILENAME > (odir "subdir/" fn3)
    print FILENAME > (odir "subdir/" fn4)
}
' "$@"

then you can call it as:

shelloutputdir="./outputdir/"
./testawk "$shelloutputdir" inputfile.txt

or do whatever else you like. The point is that not using the shebang lets you separate awk from shell args and awk file names from awk variable initial values.
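
If more than one value has to reach awk, each one needs its own `-v` option; a bare `name=value` placed before the script text would be taken as the awk program itself. A minimal sketch, assuming a second, hypothetical variable `idtd` (a DTD file name), hypothetical file names, and a throwaway directory created with `mktemp -d`:

```shell
#!/bin/sh
# Sketch: pass two shell values to awk, each through its own -v option.
# "idtd", "report.txt" and the other file names are hypothetical.
workdir=$(mktemp -d)
mkdir -p "$workdir/outputdir"
printf 'line1\nline2\n' > "$workdir/input.txt"

odir="$workdir/outputdir/"
idtd="inputfile.dtd"

awk -v odir="$odir" -v idtd="$idtd" '
FNR == 1 {
    # Record, once per input file, which DTD name it was given.
    print FILENAME, idtd > (odir "report.txt")
}
' "$workdir/input.txt"

cat "${odir}report.txt"
```

In a wrapper like the one above, the two values would come from `$1` and `$2`, followed by `shift 2` so that `"$@"` holds only the input files.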

You can create a directory whose name is stored in the variable foo with

system("mkdir -p \047" foo "\047")
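
Combined with the wrapper above, that call could run once in a `BEGIN` block before any output is written. A minimal sketch, assuming the directory name contains no single quotes and using a temporary directory from `mktemp -d` with hypothetical file names:

```shell
#!/bin/sh
# Sketch: create the output directory from inside awk before writing to it.
# Assumes odir contains no single quotes.
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/input.txt"
odir="$workdir/outputdir/subdir/"   # does not exist yet

awk -v odir="$odir" '
BEGIN {
    # \047 is a single quote; mkdir -p also creates missing parents
    # and does not fail if the directory already exists.
    if (system("mkdir -p \047" odir "\047") != 0) {
        print "cannot create " odir > "/dev/stderr"
        exit 1
    }
}
{ print FILENAME > (odir "outputfile3.txt") }
' "$workdir/input.txt"

wc -l < "${odir}outputfile3.txt"
```

Checking the return value of `system()` is optional, but it stops the script before awk reports a less specific redirection error.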
Ed Morton
  • Thank you, this looks elegant. Implemented as above, but receiving an error "`awk: cmd. line:3: (FILENAME=inputfile.txt FNR=1) fatal: can't redirect to odir=./ouputdir/outputfile1.txt (No such file or directory)`" – Gabe Jun 16 '17 at 06:10
  • Just for clarification, the `"$@"` all positional parameters except `$0`, how is that treated by awk (and or bash) ? Also, if I want to pass another arg to awk such as `inputdtd="inputfile.dtd"` calling as `./testawk "$shelloutputdir" "$inputdtd" input.txt` , what would the corresponding awk look like? `odir="$1" ; idtd="$2" ; shift 2; /usr/bin/awk -v odir="$odir" idtd="$idtd"` or something similar ? – Gabe Jun 16 '17 at 06:42
  • @Gabe sounds like you used `tstawk odir=./outputdir/` instead of `tstawk ./outputdir/`. `"$@"` is expanded by the shell to `"$1" "$2"` etc. Yes, that's exactly what you'd write. – Ed Morton Jun 16 '17 at 12:42
  • yes it was `./testawk odir=./outputdir/`, silly mistake, and had a couple of others which debugged using echos. Thanks again, delving in to implement and test. – Gabe Jun 16 '17 at 19:40
  • minor note in the example above `shelloutputdir="./ouputdir/" `in my case is `="./outputdir/" ` can see the advantage of using `ouputdir` over `outputdir` – Gabe Jun 16 '17 at 19:49
  • Nicely done; worth noting that `system("mkdir -p \047" foo "\047")` would break if `foo` contained a single quote and that a generic quoting-for-the-shell mechanism requires more work. – mklement0 Jun 16 '17 at 21:25
  • Thank you everyone who has helped with this. I have now implemented this framework in my [xml extraction and validation pipeline](https://stackoverflow.com/questions/44388628/awk-pipeline-to-extract-and-validate-xml-files) and am very happy with the relative simplicity of extending this code base to scale for processing large data sets. Next is a performance measurement module. – Gabe Jun 17 '17 at 07:19

Note:
* This answer addresses the question as asked, based on a stand-alone awk script that uses a shebang line (#!/usr/bin/awk -f).
* Ed Morton's helpful answer shows how to call awk from a shell script as an alternative, which has its advantages.

All operands passed to awk that come after the script operand (which is implicitly the stand-alone script itself, in this case) are by default interpreted as input files.

Given that ./outputdir/ is by definition a directory, it can't act as an input file, which is why you're getting the warning.

However, Awk offers the pseudo-filename operand syntax <var>=<value>, which, instead of passing a filename, defines an Awk variable, analogous to the pre-script -v <var>=<value> option syntax (and given that your invocation is by shebang line, -v-based variable assignment is not an option).

Note that these assignments happen as they're being encountered in the list of post-script operands, so you need to place them before actual input files whose processing relies on them:

shelloutputdir="./outputdir/"
./testawk odir="$shelloutputdir" inputfile.txt # Note the definition of variable `odir`

There is no limit on the number of variables you can define this way, but, at least hypothetically, you're limited by the maximum overall length of the command line, which is a value close to, but less than, what getconf ARG_MAX reports.

The above defines Awk variable odir, so your script needs to reference that:

#!/usr/bin/awk -f
{
    fn3="outputfile3.txt"
    print FILENAME > (odir "subdir/" fn3)
}
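
The ordering rule for `<var>=<value>` operands can be checked directly. A minimal sketch, using plain `awk` with inline script text instead of the shebang script, and a hypothetical temporary input file: the assignment has not happened yet when `BEGIN` runs, and it is in effect by the time the first input file is read.

```shell
#!/bin/sh
# Sketch: var=value operands are assigned when awk reaches them in the
# operand list, so the value is not yet visible in BEGIN.
workdir=$(mktemp -d)
printf 'x\n' > "$workdir/input.txt"

result=$(awk '
BEGIN { print "BEGIN: [" odir "]" }
      { print "main:  [" odir "]" }
' odir="./outputdir/" "$workdir/input.txt")

printf '%s\n' "$result"
# prints:
# BEGIN: []
# main:  [./outputdir/]
```

This is also why anything that needs `odir` in a `BEGIN` block, such as creating the directory, requires `-v` or a different place in the script.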

As Ed Morton points out, if the output filename is calculated from an expression, that expression should be enclosed in (...) for robustness; while it may also work without the parentheses in some Awk implementations (e.g., GNU Awk and Mawk), it will break in others (e.g., BSD/macOS Awk).
The Awk POSIX spec does not regulate the behavior in this situation.


  1. How do I explicitly set the output directory in awk?

There is no Awk-internal mechanism, but you can use the shell to cd to the output directory beforehand.
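
A minimal sketch of that approach, assuming a temporary directory from `mktemp -d` and hypothetical file names. The `cd` runs in a subshell so the caller's working directory is unchanged, and the input file is given by absolute path because a relative path would be resolved against the output directory:

```shell
#!/bin/sh
# Sketch: let the shell pick the output directory by changing into it.
workdir=$(mktemp -d)
mkdir -p "$workdir/outputdir"
printf 'a\n' > "$workdir/input.txt"
input="$workdir/input.txt"   # absolute path, still valid after cd

(
    cd "$workdir/outputdir" || exit 1
    # Relative output names now land in outputdir.
    awk '{ print FILENAME > "outputfile1.txt" }' "$input"
)

ls "$workdir/outputdir"
```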

  2. How do I use a command line parameter to set the output directory in awk?

See solution above. There is no special output-directory parameter in Awk, but you can pass the output-directory path as an Awk variable.

  3. How do I create a directory if it does not exist in awk?

There is no Awk-internal mechanism, but - if creating the dir. ahead of time in the shell is not an option - you can use the system() function to invoke mkdir; e.g.:

# If the dir. name never contains ' (single quotes):
awk -v odir="out-dir" 'BEGIN { system("mkdir \047" odir "\047") }'

# *From inside your stand-alone Awk script only*, you don't need \047 to represent
# ' chars - see below.
system("mkdir '" odir "'")

# Otherwise, more work is needed:
awk -v odir="out'dir" '
   function shellQuote(s) { gsub("\047", "\047\\\047\047", s); return "\047" s "\047" }
   BEGIN { system("mkdir " shellQuote(odir)) }
'

\047 is an octal escape sequence representing ', which must be used when calling awk explicitly from the shell: '...' already encloses the script as a whole, and single-quoted shell strings cannot contain embedded ' characters.

This is one aspect in which a stand-alone awk script has an advantage over explicit awk invocation from the shell: you're free to use literal ' instances in the stand-alone script - no need for \047.
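
A minimal sketch of the difference, assuming a hypothetical script file `mk.awk` in a temporary directory. The quoted here-document only serves to create the file for this example, and the script is run with `awk -f` rather than through a shebang line so that `-v` can set `odir` before `BEGIN` runs:

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Stand-alone awk script: literal single quotes are fine in the file.
cat > "$workdir/mk.awk" <<'EOF'
BEGIN {
    # Assumes odir contains no single quotes.
    system("mkdir -p '" odir "'")
}
EOF

# The directory name contains a space to show the quoting at work.
awk -v odir="$workdir/out dir" -f "$workdir/mk.awk"
ls -d "$workdir/out dir"
```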

  4. How do I pass a shell variable to an awk variable to set the output directory?

See the answer to question #2.

mklement0