Can someone please explain how I can combine these piped awk commands into a single awk command?

awk 'match($0, /(,|^)[^,]*shalvar[^,]*(,|$)/) {
  print substr($0, RSTART, RLENGTH)}' file.txt |
awk 'gsub(",","")' | awk '{$1=$1};1'

I tried this, but it doesn't work:

awk 'match($0, /(,|^)[^,]*shalvar[^,]*(,|$)/) {
  gsub(",","");$1=$1;print substr($0, RSTART, RLENGTH)}' file.txt

I understand why it doesn't work: gsub removes characters from the line, but RSTART and RLENGTH still point at positions in the original line. How can I fix it?

tgwtdt
  • `awk 'match(...) { ... } { gsub(",",""); $1=$1; print }' file.txt`? – PesaThe Jan 17 '18 at 18:10
  • I tried this but it didn't work: `awk 'match($0, /(,|^)[^,]*shalvar[^,]*(,|$)/) {print substr($0, RSTART, RLENGTH)} { gsub(",",""); $1=$1; print }' file.txt` – tgwtdt Jan 19 '18 at 07:32
  • I rolled back your latest edit -- please don't heap on additional requirements after you have received answers to what you originally asked. – tripleee Jan 19 '18 at 11:17
  • @tripleee I didn't add it as a requirement. It was just a favor:D but you have already provided an answer for that so, why not keep it there? – tgwtdt Jan 19 '18 at 20:07

1 Answer

You need to wrap things the other way around. Collect the string you want to extract, then do the manipulations on the extracted value, just like your original script with multiple Awk scripts in a pipeline did.

awk 'match($0, /(,|^)[^,]*shalvar[^,]*(,|$)/) {
  g=substr($0, RSTART, RLENGTH);
  gsub(",","",g);
  # $1=$1 is nice but we cannot use that here; here is a workaround
  gsub(/^ *| *$/, "", g);
  print g}' file.txt
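
As a rough sketch of how this behaves, the same script can be run against a made-up line. The sample input and the function name `extract_item` are assumptions for illustration only, since the question does not show the actual contents of file.txt.

```shell
# Hypothetical wrapper around the script above; reads lines on stdin.
extract_item() {
  awk 'match($0, /(,|^)[^,]*shalvar[^,]*(,|$)/) {
    g = substr($0, RSTART, RLENGTH)   # copy the match while RSTART/RLENGTH are still valid
    gsub(",", "", g)                  # drop the delimiting commas from the copy only
    gsub(/^ *| *$/, "", g)            # trim leading and trailing spaces
    print g
  }'
}

# Made-up sample line with three comma-separated items.
printf '%s\n' 'kolah, shalvar jean , kafsh' | extract_item
# prints: shalvar jean
```

Because the substitutions are applied to the copy in `g` rather than to `$0`, the offsets returned by `match()` are never invalidated.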

The $1=$1 shortcut for trimming whitespace around a value works in an isolated Awk script if you are confident that there is only one field. Here, we don't necessarily have a single field (or do we?), so I explicitly trim whitespace around the extracted string instead. That is more general, and it avoids relying on a well-known but still obscure side effect.
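
To make the difference concrete, here is a small comparison under assumed input. The function names `rebuild_trim` and `explicit_trim` are hypothetical. Assigning to a field makes Awk rebuild `$0` from its fields joined by `OFS`, so it only ever acts on the current record, not on a variable like `g`, and it also squeezes interior whitespace.

```shell
# $1=$1 rebuilds $0 from its fields, joined by OFS (a single space by default).
rebuild_trim() {
  awk '{ $1 = $1 } 1'
}

# Explicit trim works on any variable and leaves interior spacing alone.
explicit_trim() {
  awk '{ g = $0; gsub(/^ *| *$/, "", g); print g }'
}

printf '%s\n' '   shalvar    jean  ' | rebuild_trim
# prints: shalvar jean
printf '%s\n' '   shalvar    jean  ' | explicit_trim
# prints: shalvar    jean
```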

If shalvar is actually a value you want to pass in from a shell variable like $foo, try

awk -v field="$foo" 'match($0, "(^|,)[^,]*" field "[^,]*(,|$)") {
    ...

to interpolate the variable into a string which is then applied as a regular expression.
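
A complete version might look like the sketch below. It assumes the elided body is the same as in the earlier script; the function name `extract_by_term` and the sample line are hypothetical. Note that the value is used as part of a regular expression, so any regex metacharacters in it are interpreted, and `-v` also processes backslash escapes in the assigned value.

```shell
# Hypothetical helper: first argument is the search term, input comes from stdin.
extract_by_term() {
  awk -v field="$1" 'match($0, "(^|,)[^,]*" field "[^,]*(,|$)") {
    g = substr($0, RSTART, RLENGTH)   # extract the matching item first
    gsub(",", "", g)                  # then remove the commas from the copy
    gsub(/^ *| *$/, "", g)            # and trim surrounding spaces
    print g
  }'
}

foo='shalvar'   # assumed shell variable holding the search term
printf '%s\n' 'kolah, shalvar jean , kafsh' | extract_by_term "$foo"
# prints: shalvar jean
```

The double quotes around `"$foo"` (here `"$1"` inside the function) keep the shell from word-splitting the value before Awk receives it.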

tripleee
  • Can you please explain how your workaround for `$1=$1` works and why `$1=$1` can't be used in here? Also would you be so kind to teach me how I can use a shell variable instead of `shalvar`? – tgwtdt Jan 19 '18 at 11:04
  • Answer updated. I removed the remark about removing `match()` because indeed, I *was* missing something obvious (-: – tripleee Jan 19 '18 at 11:12
  • Unless the `field` variable contains a comma, I suppose we could trim commas *and* adjacent spaces from around the value in one go, actually. Without seeing your data or a statement of what the script is supposed to do, I won't speculate further. – tripleee Jan 19 '18 at 11:15
  • Many times, googling your precise problem statement will bring up an existing question on Stack Overflow with a good answer. Here's one: https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script – tripleee Jan 19 '18 at 11:16
  • No, there is no comma in the field. The comma is the separator between items in a text file. What I want is to get the items that have `shalvar` inside them. Each line has several items that are separated by commas. – tgwtdt Jan 19 '18 at 20:12
  • I had already seen the question you mentioned and several others, but none of them worked for me because I was missing the quotation marks; none of the answers had them, and no one had explained them. – tgwtdt Jan 19 '18 at 20:14