
While trying to solve this question (append wc...), I could not work out how to capture the file names passed as arguments to an awk command.

awk 'BEGIN {for ( i=1;i<ARGC;i++ )print "ARGV " i ": [" ARGV[i] "]" }
     FNR==1 {print "FILENAME " ++a ": [" FILENAME "]" }
    ' $( ls )

This works fine for ordinary file names such as file1.txt, but a problem arises with file names that contain spaces, such as file with space (in fact, presumably whenever a file name contains an $IFS character, and IFS must not be changed). FILENAME is correct, but ARGV is split on spaces (whether quoted or not), as if awk parsed all the parameters as a single string after the shell passed it.

I use this to count the lines of each file even when a file is empty (in which case FNR == 1 is never reached for it), but that is not the question here.

So:

  1. How should I escape file names that contain spaces? (I tried wrapping each name in quotes with sed, as in $( ls | sed "s/'/'\"'\"'/g;s/.*/'&'/"), but that didn't help.)
  2. How can I capture values that contain spaces via ARGV?

I use awk on Linux and AIX (and not gawk in this case :-( ).

Some sample output:

# ls -1 file*
file
file and space
file'qu .txt
file"qu .txt

# awk '...' "file and space"
ARGV 1: [file and space]
FILENAME 1: [file and space]

# awk '...' $( ls file* | sed -e 's/ /?/g' )
ARGV 1: [file]
ARGV 2: [file and space]
ARGV 3: [file'qu .txt]
ARGV 4: [file"qu .txt]
FILENAME 1: [file]
FILENAME 2: [file and space]
FILENAME 3: [file'qu .txt]

The last command shows that awk CAN tell the names apart (file"qu .txt is an empty file, so FNR==1 is never reached for it).

I now see that this happens at the level where the shell passes the arguments, not in awk.

NeronLeVelu

2 Answers


The problem is not related to awk, but to the shell (how you pass the filenames):

Unquoted command substitution $( ls ) will expand to a list of filenames, but the filenames are subject to word-splitting, so that filenames with embedded spaces are each broken into multiple arguments passed to awk.

This results in awk seeing either nonexistent filenames (at which point a fatal error occurs) or accidentally processing different files (multiple times); e.g., if files file one, file and one all exist in the current directory, awk will not process file one, and instead process both file and one twice.

A simple glob (*) will do in this case: its expansion results are not subject to word-splitting, and globbing is generally preferable to parsing ls output:

awk 'BEGIN {for ( i=1;i<ARGC;i++ )print "ARGV " i ": [" ARGV[i] "]" }
     FNR==1 {print "FILENAME " ++a ": [" FILENAME "]" }
    ' *

Using an unquoted command substitution to expand to multiple arguments passed to a command (command $(...)) is an anti-pattern in general, because the resulting output is subject not only to word-splitting, but also to globbing (filename expansion), as part of the so-called shell expansions.
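The globbing half of that claim is easy to overlook, so here is a minimal sketch of it. The scratch directory, the file names, and the count_args helper are made up for the demonstration and are not part of the original commands:

```shell
# Sketch only: work in a throwaway directory with three hypothetical files.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch 'a b' 'x.txt' 'y.txt'

# Hypothetical helper: reports how many arguments it received.
count_args() { echo "$#"; }

# The substitution outputs the text "*.txt and more".
# Unquoted, it is word-split into 3 words, and the word *.txt is then
# glob-expanded to x.txt and y.txt, giving 4 arguments in total.
unquoted=$(count_args $(printf '%s\n' '*.txt and more'))

# A plain glob yields exactly one argument per file, spaces intact.
globbed=$(count_args *)

echo "unquoted=$unquoted globbed=$globbed"
```

The command received four arguments even though the substitution printed only three words, because one of those words happened to be a valid glob pattern.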


Diagnosing the problem:

$ touch file 'file 1'
$ bash -s - $(ls file 'file 1') <<<'echo "$# args passed: [$1] [$2] [$3]"'
3 args passed: [file] [file] [1]

Note how, even though file 1 was passed with quotes, the target command (an ad-hoc bash script) sees 3 arguments, as a result of the shell having broken file 1 into separate arguments file and 1 (word-splitting), due to unquoted use of $(...) (command substitution).
(Note that "$(...)" wouldn't have helped, because the command output is then invariably passed as a single argument.)
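To illustrate the parenthetical note, the following sketch double-quotes the substitution and prints what the target command receives. The show_args helper and the scratch directory are assumptions added for the demonstration:

```shell
# Sketch only: recreate the two files in a throwaway directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch file 'file 1'

# Hypothetical helper: prints the argument count, then each argument in brackets.
show_args() {
  printf '%d arg(s):' "$#"
  for a in "$@"; do printf ' [%s]' "$a"; done
  echo
}

# Quoting suppresses word-splitting, so the whole ls output (both names,
# separated by a newline) arrives as a single argument.
quoted_out=$(show_args "$(ls file 'file 1')")
echo "$quoted_out"
```

The single argument contains an embedded newline, so awk would treat it as the name of one file that does not exist.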

The following simplified command causes awk to fail fundamentally, because instead of seeing the single filename File One, it sees the filenames File and One, neither of which exists:

$ rm -f File One; echo 'hi from File One' > 'File One'
$ awk '{ print FILENAME }' $(ls 'File One')
awk: fatal: cannot open file `File' for reading (No such file or directory)

The above is GNU awk's error message; BSD Awk and Mawk fundamentally behave the same, except for variations in the wording of the error message. All these implementations set the exit code to 2 in this scenario.
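For the use case mentioned in the question (counting lines per file, including empty files), a sketch along the following lines should work with any POSIX awk once the names arrive via a glob. The file names and the count array are illustrative assumptions, not the asker's actual code:

```shell
# Sketch only: three hypothetical files in a throwaway directory, one of them empty.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'one\ntwo\n' > 'file and space'
: > 'empty file'
printf 'x\n' > file

result=$(awk '
  # Seed a zero count for every name, so empty files are reported too.
  BEGIN { for (i = 1; i < ARGC; i++) count[ARGV[i]] = 0 }
  # Tally one line for the file currently being read.
  { count[FILENAME]++ }
  # Report in command-line order.
  END { for (i = 1; i < ARGC; i++) print count[ARGV[i]] ": [" ARGV[i] "]" }
' *)
echo "$result"
```

One caveat: awk treats an operand of the form name=value as a variable assignment rather than a file, so names containing = are safer when passed as ./* instead of *.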

mklement0
  • if you want to print a space in between words, just say `print a, b` --> `print "FILENAME", ++a, ...` – fedorqui Jan 25 '17 at 14:05
  • Understood, but why is FILENAME correct in this case (with a name containing spaces)? If this were purely shell expansion, wouldn't FILENAME receive several *bad* file names? – NeronLeVelu Jan 25 '17 at 14:06
  • @fedorqui problem is not the print, I use it to show the problem when inside the code. I use something like `Files[ ARGV[i]]++` in BEGIN section for catching empty files. this is where my problem appear – NeronLeVelu Jan 25 '17 at 14:09
  • @NeronLeVelu yes I know, it was just a side note since I did see a hardcoded space `print "a ", b` and I think `print "a", b` looks better. Nothing worth worrying about. – fedorqui Jan 25 '17 at 14:10
  • @NeronLeVelu It is simply impossible for awk or any other command to recognize an unquoted file name with spaces in it as a single argument. It sounds like you might have a question worth posting as a new question on the site. – Ed Morton Jan 25 '17 at 16:17
  • @EdMorton: I concur - a command would have to go out of its way to rebuild a single argument from individual arguments, which would seem misguided, and it's certainly not what `awk` should do - I'm mystified by the OP's claim. – mklement0 Jan 25 '17 at 16:20
  • @EdMorton and @mklement0 see the last info in the OP. awk DOES tell the difference if the file is passed directly as a parameter (like `"spa ce"` or `spa*`), but not after a `$()` that returns a single string, which awk then splits on the spaces inside – NeronLeVelu Jan 26 '17 at 07:44
  • Forget my last comment. I agree with your explanation; I was just trying to find a way to re-create shell interpretation after the $() occurs, to re-separate the arguments at the shell level before awk receives them – NeronLeVelu Jan 26 '17 at 07:56

Would that work in your specific shell?

declare -a files=(*)
awk 'BEGIN {for ( i=1;i<ARGC;i++ )print "ARGV " i ": [" ARGV[i] "]" }
     FNR==1 {print "FILENAME " ++a ": [" FILENAME "]" }
    ' "${files[@]}"

The array expansion should work too, hopefully sidestepping your issue.
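If the shell at hand has no arrays (a plain POSIX sh, for instance), the positional parameters can play the same role. This is a sketch under that assumption, with made-up file names in a scratch directory:

```shell
# Sketch only: hypothetical files, including names with spaces and a quote.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch file 'file and space' "file'qu .txt"

# "set --" replaces the positional parameters with the glob results;
# "$@" then expands to exactly one argument per file name.
set -- *
argv_out=$(awk 'BEGIN { for (i = 1; i < ARGC; i++) print "ARGV " i ": [" ARGV[i] "]" }' "$@")
echo "$argv_out"
```

Since `set --` overwrites the script's own arguments, this is best done inside a function or after the original arguments are no longer needed.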

Fred