A multi-line variable LOG_BUF is set in a bash script parse.sh. Then the variable is parsed with awk, printing all rows containing pat:

#!/bin/bash

LOG_BUF=$(cat <<-END
    pat TEST_a
    pat TEST_b
    TEST_c
    pat TEST_d
END
)
echo ${LOG_BUF} | awk 'BEGIN{}; /pat/{printf("%d %s", NR, $0); printf("\n")}; END{printf("\n")}'

The expected output is:

$ ./parse.sh
1 pat TEST_a
2 pat TEST_b
4 pat TEST_d

But instead, it prints:

$ ./parse.sh
1 pat TEST_a pat TEST_b TEST_c pat TEST_d

Apparently, awk treats the whole string as a single record. How can I get awk to parse the string as multiple lines?

ysap
  • Checking with `$ echo ${LOG_BUF}` shows that there are no newlines in the `LOG_BUF` itself. – Pedram Mar 26 '23 at 09:18
  • @Pedram - that's right! So this is not an `awk` problem, but rather a string assignment problem. How do I add the newlines? – ysap Mar 26 '23 at 09:28
  • You must quote the variable expansion `${LOG_BUF}`. Replace `echo ${LOG_BUF}` with `echo "${LOG_BUF}"` – M. Nejat Aydin Mar 26 '23 at 09:28
  • @M.NejatAydin - yes! Please make this an answer so I can accept. – ysap Mar 26 '23 at 09:29
  • @ysap as M. Nejat Aydin has commented, you should enclose the variable with double quotes. Meanwhile I was checking other SO answers about assigning heredocs to variables and just found out about his comment when I submitted the answer :) – Pedram Mar 26 '23 at 09:37
  • As the [bash](https://stackoverflow.com/questions/tagged/bash) tag you used instructs - "For shell scripts with syntax or other errors, please check them at https://shellcheck.net before posting them here.". If you had done that, shellcheck would have told you what the problem is in your script and told you the fix for it. – Ed Morton Mar 26 '23 at 12:37
  • You also don't need to do `$(cat <<-END...`, just `LOG_BUF='pat TEST_a ... pat TEST_d'`. And don't use an all upper case variable name for this, see [correct-bash-and-shell-script-variable-capitalization](https://stackoverflow.com/questions/673055/correct-bash-and-shell-script-variable-capitalization) – Ed Morton Mar 26 '23 at 12:42
  • @EdMorton-SOstopbullying - thanks for the pointer for the shellcheck. Yes, I missed that in the tag info. I checked it now and it did point out the error. – ysap Mar 26 '23 at 15:06
  • @EdMorton-SOstopbullying - now, reading the suggested post on variable case, I am not convinced. There actually seem to be more arguments against it than for it. Especially interesting are the ones from Brian Wilson. Thanks anyway for the pointer. – ysap Mar 26 '23 at 15:11
  • I'm sorry but whoever Brian Wilson is they are totally wrong if they think it's a good idea to use all upper case variable names for non environment variables. I cannot stress enough how wrong they are - if you continue to do this it WILL bite you in the ass some day (I can't count how many times in the past 40+ years people have asked for help with their script and it's been them overwriting HOME, PATH, USER, ENV, or some other environment variable) and at best makes your code hard to read as, to the rest of us, it looks like you're using environment variables. – Ed Morton Mar 26 '23 at 15:16
  • @EdMorton-SOstopbullying - OK, I will consider this in the future. Thanks for the insight. – ysap Mar 26 '23 at 15:18
  • fwiw, if this is the entire script then the `BEGIN{}` and `END{}` blocks aren't needed (unless you *do* want the `END{}` block to print an 'extra' blank line at the end), and the remaining 2x `printf` calls can be combined into a single equivalent call: `printf("%d %s\n", NR, $0)` – markp-fuso Mar 26 '23 at 18:32
  • @markp-fuso - thanks, this is known. The posted script is just a stripped down version of a more sophisticated one. – ysap Mar 27 '23 at 16:06
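
The suggestions from the comments above (quote the expansion, assign the string directly, use a single `printf`) can be combined into a minimal sketch. The lowercase name `log_buf` is a hypothetical rename of `LOG_BUF`, and the lines are written without leading indentation for simplicity:

```shell
#!/bin/bash

# Hypothetical lowercase rename of LOG_BUF, assigned directly
# as a quoted multi-line string instead of $(cat <<-END ...).
log_buf='pat TEST_a
pat TEST_b
TEST_c
pat TEST_d'

# The quoted expansion keeps the newlines, so awk sees four records.
# One printf replaces the two original calls; the empty BEGIN block is dropped.
echo "${log_buf}" | awk '/pat/{printf("%d %s\n", NR, $0)}'
```

This should print the three matching rows numbered 1, 2 and 4, without the extra blank line that the original `END` block emitted.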

1 Answer

Use double quotes to preserve the newlines

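You must enclose the ${LOG_BUF} expansion in double quotes to preserve the newlines. Without the quotes, the shell splits the value into words and echo joins them with single spaces, so awk receives only one line. Change the last line to this: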

$ echo "${LOG_BUF}" | awk '/pat/{printf("%d %s", NR, $0); print ""}; END{print ""}'
1 pat TEST_a
2     pat TEST_b
4     pat TEST_d
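
To see the effect of the quotes in isolation, here is a small sketch that counts how many records awk receives in each case. The variable names `log_buf`, `unquoted_lines` and `quoted_lines` are made up for this illustration:

```shell
#!/bin/bash

# Illustrative data mirroring the question (hypothetical variable name).
log_buf='pat TEST_a
pat TEST_b
TEST_c
pat TEST_d'

# Unquoted on purpose: word splitting turns each newline into an
# argument separator, and echo joins its arguments with single spaces.
unquoted_lines=$(echo ${log_buf} | awk 'END{print NR}')

# Quoted: the value is passed as a single argument, newlines intact.
quoted_lines=$(echo "${log_buf}" | awk 'END{print NR}')

printf 'unquoted: %d record(s), quoted: %d record(s)\n' \
    "$unquoted_lines" "$quoted_lines"
# prints: unquoted: 1 record(s), quoted: 4 record(s)
```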

Using shellcheck

As Ed Morton - SO stop bullying pointed out in the comments, it is advisable to run shellcheck on a shell script before posting a question about it on SO. Doing so here would show the following result:

$ shellcheck script.sh

In script.sh line 10:
echo ${LOG_BUF} | awk 'BEGIN{}; /pat/{printf("%d %s", NR, $0); printf("\n")}; END{printf("\n")}'
     ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean: 
echo "${LOG_BUF}" | awk 'BEGIN{}; /pat/{printf("%d %s", NR, $0); printf("\n")}; END{printf("\n")}'

For more information:
  https://www.shellcheck.net/wiki/SC2086 -- Double quote to prevent globbing ...

Using read instead of cat

On another note, you may want to use read instead of cat to assign a heredoc value to a variable in bash. The following snippet is inspired by another SO answer, which is worth checking out for more details:

$ read -r -d '' LOG_BUF <<-'EOF'
    pat TEST_a
    pat TEST_b
    TEST_c
    pat TEST_d
EOF

$ echo "$LOG_BUF"
pat TEST_a
    pat TEST_b
    TEST_c
    pat TEST_d
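
Two details of the `read` approach are worth knowing; the sketch below illustrates them under the assumption that the script runs in bash. First, `read -d ''` reads until a NUL byte, which a heredoc never contains, so `read` reaches end-of-input and returns a non-zero status even though the variable is filled (this matters under `set -e`). Second, with the default `IFS`, `read` trims leading and trailing whitespace, which is why the first line above lost its indentation; prefixing `IFS=` keeps it. The name `log_buf` is hypothetical:

```shell
#!/bin/bash

# IFS= stops read from trimming leading/trailing whitespace.
# "|| true" absorbs the non-zero status that read returns when it
# reaches end-of-input without finding the NUL delimiter.
IFS= read -r -d '' log_buf <<'EOF' || true
    pat TEST_a
    pat TEST_b
    TEST_c
    pat TEST_d
EOF

# printf '%s' emits the value as-is; it already ends with a newline.
printf '%s' "$log_buf" | awk '/pat/{printf("%d %s\n", NR, $0)}'
```

With `IFS=` in place, every line, including the first, keeps its four leading spaces in awk's `$0`.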
Pedram
  • Thanks, this is what @M.NejatAydin suggested in the comments. Could you please give a TL;DR of why `read` is better than `cat` for this? – ysap Mar 26 '23 at 09:36
  • Hey there @ysap :) Bash might introduce bugs since it parses escapes and other sequences inside `$( ... )` - take a look [here](https://unix.stackexchange.com/a/340729/359712) - and using `read`, you don't need to use `cat` in a subshell. But I know that using `cat` is a well-known heredoc idiom. So feel free to use the one that suits better :) – Pedram Mar 26 '23 at 09:53