
I thought it would be easy to define a string such as "1 2 3" and use it within AWK (GAWK) to extract the required fields. How wrong I was.

I have tried creating AWK arrays, BASH arrays, splitting, string substitution, etc., but could not find any method to use the resulting 'chunks' (i.e. the column/field numbers) in a print statement.

I believe Akshay Hegde has provided an excellent solution with the get_cols function, here

but it was posted over 8 years ago, and I am really struggling to work out how it works, namely what this line is doing: s = length(s) ? s OFS $(C[i]) : $(C[i])

I am unable to post a comment asking for clarification due to my lack of reputation (and it is an old post). Is someone able to explain how the solution works?

NB I don't think I need the sub, as I am using the following to clean up (replace all non-numeric characters with a comma, i.e. the separator, and sort numerically):

Columns=$(echo $Input_string | sed 's/[^0-9]\+/,/g')
Columns=$(echo $Columns | xargs -n1 | sort -n | xargs)

(using this string, the awk would be executed with awk -v cols=$Columns -f test.awk infile in the given solution)
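A note on the cleanup step above: as far as I can tell, xargs -n1 splits on whitespace, so once sed has joined the numbers with commas the whole string reaches sort -n as a single token and is not reordered. Below is a minimal sketch of one alternative, assuming a grep that supports -o (GNU and BSD grep do). The value of Input_string is made up for illustration.

```shell
# Hypothetical input: column numbers mixed with arbitrary separators.
Input_string="3; 1 and 2"

# Pull out each run of digits on its own line, sort numerically,
# then join the lines back together with commas.
Columns=$(printf '%s\n' "$Input_string" | grep -oE '[0-9]+' | sort -n | paste -sd, -)

echo "$Columns"   # prints 1,2,3
```

When passing the result on, quoting it (awk -v cols="$Columns" ...) keeps the shell from word-splitting or globbing the value.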


Given the informative answer from @Ed Morton, with a nice worked example, I have attempted to remove the need for a function (and also an additional awk program file). The intention is to have this within a shell script, which I would rather keep self-contained, and also to investigate further 'how it works'.

Fields="1 2 3"
echo $Fields | awk -F "," '{n=split($0,Column," "); for(i=1;i<=n;i++) s = length(s) ? s OFS $(Column[i]) : $(Column[i])}END{print "s="s " arr1="Column[1]" arr2="Column[2]" arr3="Column[3]}'

The results have surprised me (taking note of my Comment to Ed)

s=1 2 3 arr1=1 arr2=2 arr3=3

The above clearly shows that the split into the array has worked, but I thought s would include a $ for each ternary operator concatenation, i.e. "$1 $2 $3"
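For what it is worth, $(Column[i]) is a field reference: awk evaluates Column[i] to a number and substitutes the content of that field, so the text "$1" never ends up in s. In the command above the input line is "1 2 3" and the separator is a comma, so the line has a single field, $1, which holds the whole line, while $2 and $3 are empty. That would explain why s looks like "1 2 3". A small sketch that prints what each reference yields (the variable name out is only for illustration):

```shell
# Same input and -F "," as the command above.
Fields="1 2 3"

out=$(echo "$Fields" | awk -F "," '{
    n = split($0, Column, " ")
    for (i = 1; i <= n; i++)
        # $(Column[i]) looks up field number Column[i] of the current line
        printf "Column[%d]=%s  field=[%s]\n", i, Column[i], $(Column[i])
}')

echo "$out"
# Column[1]=1  field=[1 2 3]
# Column[2]=2  field=[]
# Column[3]=3  field=[]
```

So the array holds the numbers, but only field 1 exists on a line without commas, and it contains the entire line.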

Moreover, I was hoping to append the actual file to the above command, which I have found I can do with echo $string | awk '{program}' file.name

NB it is a little insulting that my question has been marked as -1 indicating little research effort, as I have spent days trying to work this out.

Taking all the information above, I think s results in "1 2 3", but print doesn't treat this the same way as when it is called from the function; it simply tries to 'print 1 2 3' against the file, which seems to be how all my efforts have ended up. This really confuses me, as Ed's 'diagonal' example works from the command line, indicating that the concept of 'print s' is absolutely fine when used with a file name as input. Can anyone suggest how this (example below) can work?

I don't know if using echo pipe and appending the file name is strictly allowed, but it appears to work (?!?!?!)

(failed result)

echo $Fields | awk -F "," '{n=split($0,Column," "); for(i=1;i<=n;i++) s = length(s) ? s OFS $(Column[i]) : $(Column[i])}END{print s}' myfile.txt

This appears to go through myfile.txt and output all lines containing many comma-separated values, i.e. the whole file (I haven't included the values; this is for illustration only):

,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
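A likely explanation for the failed result: when a file name is given, awk reads only that file and does not read standard input, so the echoed list is never seen. split($0, Column, " ") then runs against each line of myfile.txt instead, and a line with no spaces lands whole in Column[1]. Used as a field number, a string that does not look like a number counts as 0, and $0 is the entire line. Below is a minimal sketch of one way to keep everything in a single command, passing the list in with -v and splitting it once in BEGIN. The temporary file and its contents are hypothetical stand-ins for myfile.txt.

```shell
# Hypothetical sample data standing in for myfile.txt.
tmpfile=$(mktemp)
printf 'a,b,c,d\ne,f,g,h\n' > "$tmpfile"

Fields="1 3"

result=$(awk -F ',' -v cols="$Fields" '
BEGIN { n = split(cols, Column, " ") }   # parse the list once, before any input is read
{
    s = ""                               # start a fresh string for every input line
    for (i = 1; i <= n; i++)
        s = (i == 1 ? "" : s OFS) $(Column[i])
    print s
}' "$tmpfile")

echo "$result"
# a c
# e g

rm -f "$tmpfile"
```

Resetting s on each line and printing inside the main block, rather than in END, gives one line of output per line of input.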

Chrizk
  • Ask a self-contained, complete question with a [mcve] so we can help you. Don't use the code you have in the 2nd-last paragraph, it contains bugs/anti-patterns. – Ed Morton Jan 10 '23 at 14:22
  • Fair enough, although I don't understand why it doesn't work (I guess the mentioned 'anti-patterns'), I accept that I am trying something that cannot be achieved in a single line. I have transposed the awk program in the linked article into a function in my bash script, and marked Ed's response as the answer. Thank you to everyone who has contributed. – Chrizk Jan 11 '23 at 13:32
  • Feel free to post a new question if you have a followup. I didn't post an answer to your question here btw, that was @Daweo, I was waiting for a [mcve] to get a better understanding of what you were trying to do before trying to answer. – Ed Morton Jan 11 '23 at 14:21

1 Answer

what this is doing; s = length(s) ? s OFS $(C[i]) : $(C[i])

You have encountered the ternary operator, which has the following syntax:

condition ? valueiftrue : valueiffalse

The length function, when given a single argument, returns the number of characters in it. In GNU AWK the integer 0 is considered false and all other integers are considered true, so in this case length(s) is a not-empty check. When s is not empty, it is concatenated with the output field separator (OFS, a space by default) and the value of the C[i]-th field, and the result is assigned to s. When s is empty (which includes not yet initialized, as GNU AWK treats an uninitialized variable as an empty string), s is assigned the value of the C[i]-th field alone. Used repeatedly, this builds a string of values separated by OFS. Consider the following simple example: say you want to get the diagonal of a 2D matrix stored in file.txt with the following content

1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25

then you might do

awk '{s = length(s) ? s OFS $(NR) : $(NR)}END{print s}' file.txt

which will output

1 7 13 19 25

Explanation: NR is the number of the current row (record), so for the 1st row $(NR) is the 1st field, for the 2nd row it is the 2nd field, for the 3rd row it is the 3rd field, and so on.

(tested in GNU Awk 5.0.1)
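To see the string being built step by step, here is a small sketch that prints s after every row of the matrix above. It also runs the shorter form suggested in the comments for comparison. The temporary file stands in for file.txt, and the variable names matrix, trace and alt are only for illustration.

```shell
# Recreate the 5x5 matrix from above in a temporary file.
matrix=$(mktemp)
printf '%s\n' '1 2 3 4 5' '6 7 8 9 10' '11 12 13 14 15' \
              '16 17 18 19 20' '21 22 23 24 25' > "$matrix"

# Print s after each row to watch the ternary append one value at a time.
trace=$(awk '{ s = length(s) ? s OFS $(NR) : $(NR); print "NR=" NR ": s=" s }' "$matrix")
echo "$trace"
# NR=1: s=1
# NR=2: s=1 7
# NR=3: s=1 7 13
# NR=4: s=1 7 13 19
# NR=5: s=1 7 13 19 25

# The shorter form gives the same final string.
alt=$(awk '{ s = (s=="" ? "" : s OFS) $NR } END { print s }' "$matrix")
echo "$alt"   # prints 1 7 13 19 25

rm -f "$matrix"
```

On the first row s is empty, so the false branch assigns the field alone; on every later row the true branch appends OFS and the next field.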

Ed Morton
Daweo
  • Fabulous, and very quick answer. However, it has led me to question how the 'final' print _get_cols() works ... is it correct that the $ in $(C[i]) is integral in printing the column/field (eg print $1)? Furthermore, is the point of this 'check and replacement' to ensure the loop results in a single line of output per line of input (eg file)? I need to get my head around your diagonal example ;) – Chrizk Jan 10 '23 at 13:03
  • Perhaps that is more easily asked as 'what does the function return?' I am getting the impression it returns OFS$cols1OFS$cols2OFS$cols3 ...etc, resulting in print $cols1 $cols2 $cols3? NB I attempted to create a string='$1 $2 $3', but my expansion failed, echo $(awk -F "," '{print $string}' my.file). It seems this is close to the result of the function, but perhaps this is the difference between expressions and statements (as mentioned in 'Ternary operator and print' question on https://unix.stackexchange.com/) – Chrizk Jan 10 '23 at 13:46
  • `s = length(s) ? s OFS $(NR) : $(NR)` would be more concise and efficient written as `s = (s=="" ? "" : s OFS) $NR`. – Ed Morton Jan 10 '23 at 14:26