Is there a way to completely delete fields in awk, so that extra delimiters do not print?

Question

Consider the following command:

$ gawk -F"\t" "BEGIN{OFS=\"\t\"}{$2=$3=\"\"; print $0}" Input.tsv

When I set $2 = $3 = "", I expect the same result as writing:

print $1,$4,$5...$NF

What actually happens is that fields 2 and 3 become empty but stay in the record, so their field delimiters are still printed.

Is it possible to actually delete $2 and $3?

Note: If this were run on Linux in bash, the command above would be written as follows, but cmd.exe on Windows does not treat single quotes as quoting characters.

$ gawk -F'\t' 'BEGIN{OFS="\t"}{$2=$3=""; print $0}' Input.tsv
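
To make the symptom concrete, here is a minimal reproduction, assuming a POSIX shell and a made-up five-field row (the variable names are only for illustration). The tabs are converted to `|` so the leftover delimiters are visible:

```shell
# Made-up sample row with five tab-separated fields.
row=$(printf 'a\tb\tc\td\te')

# Blank fields 2 and 3; awk rebuilds $0 with OFS between all five fields,
# so the two emptied fields still contribute their delimiters.
blanked=$(printf '%s\n' "$row" | awk -F'\t' 'BEGIN{OFS="\t"} {$2=$3=""; print}')

# Show tabs as '|' to make the empty fields visible.
blanked_visible=$(printf '%s\n' "$blanked" | tr '\t' '|')
printf '%s\n' "$blanked_visible"
# prints: a|||d|e
```

The desired output would be `a|d|e`, which is what the answers below try to achieve.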

You should use single quotes for the outer set, then you don't have to escape the double quotes within the script. If you're using double quotes for the outer set so you can embed shell variables, use `-v` to do variable passing instead. – Dennis Williamson, May 21 '12 at 23:33
I'm using awk in Windows. Cmd.exe doesn't play well with single quotes for some reason. – merlin2011, May 21 '12 at 23:36
I did this 10+ years ago (I think). Try `$2=$3="";$0=$0`. Good luck. – shellter, May 22 '12 at 00:43
@shellter, Tried, no luck. Probably the version of awk has changed. Thanks for the suggestion though! – merlin2011, May 22 '12 at 00:48
OK, now just thinking outside the box ;->, `$2=$3="XYZ"; sub("\tXYZ\t", "", $0); $0=$0; print`. Not sure if you'd need both `\t` in the sub. Also, if you have the original awk book, check there; I thought that is where I learned `$0=$0`. Maybe I'm forgetting a step. Good luck. – shellter, May 22 '12 at 01:06
The command was run on Windows when I asked this question, so the outer quotes need to be `""`. – merlin2011, Apr 17 '14 at 21:45

score 8 · Accepted Answer · edited Aug 25 '20 at 17:11

This is an oldie but goodie.

As Jonathan points out, you can't delete fields in the middle, but you can replace their contents with the contents of other fields. And you can make a reusable function to handle the deletion for you.

$ cat test.awk
function rmcol(col,     i) {
  for (i=col; i<NF; i++) {
    $i = $(i+1)
  }
  NF--
}

{
  rmcol(3)
}

1

$ printf 'one two three four\ntest red green blue\n' | awk -f test.awk
one two four
test red blue
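
Applied to the question's case (dropping fields 2 and 3 from tab-separated input), the same function might be used as sketched below. The wrapper name `drop_2_and_3` and the sample row are made up for illustration, and the sketch relies on `NF--` shortening the record, which gawk and mawk do but POSIX does not guarantee:

```shell
# Hypothetical wrapper around the rmcol function from this answer.
drop_2_and_3() {
  awk -F'\t' '
    function rmcol(col,    i) {
      for (i = col; i < NF; i++) $i = $(i + 1)   # shift later fields left
      NF--                                       # drop the now-duplicated last field
    }
    BEGIN { OFS = "\t" }
    { rmcol(3); rmcol(2); print }   # remove the higher index first so 2 still means 2
  '
}

# The fourth input field is deliberately empty and should survive.
printf 'a\tb\tc\t\te\n' | drop_2_and_3 | tr '\t' '|'
# with gawk or mawk this prints: a||e
```

Removing the higher-numbered column first keeps the remaining column numbers valid for the second call.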

answered Jul 11 '16 at 15:58 by ghoti · edited by ib.

Decrementing NF is undefined behavior per POSIX. It will delete the last field in some awks, be ignored in other awks, and could do anything else and still be POSIX compliant. – Ed Morton Sep 08 '18 at 13:34
@EdMorton: I appreciate the issue, it's not specified by the standard and should be avoided in portable scripts. I tested this with gawk, mawk, nawk and busybox awk which all behave as expected, do you know any awk's that do not support this behavior? – Thor May 08 '19 at 06:02
@Thor MacOS/BSD awk for one. – Ed Morton May 08 '19 at 13:41
@EdMorton, actually I created this answer on FreeBSD, and this function works on macOS's awk as well (which came over from FreeBSD a few years ago). I suspect where you might see different behaviour would be in older SysV systems. Alas, I have none of those to test with at the moment. – ghoti May 08 '19 at 17:17
I'm on MacOS right now. `awk --version` outputs `awk version 20070501` and `echo 'a b c' | awk '{NF--}1'` outputs `a b c` while `echo 'a b c' | gawk '{NF--}1'` outputs `a b`. Your script does produce the output you say it does though - not sure why off the top of my head but it can't be the `NF--` doing it. – Ed Morton May 08 '19 at 17:19
Ah, the `NF--` DOES have an impact but only if you do something to modify a field as you're doing and as this does: `echo 'a b c' | awk '{NF--;$1=$1}1'` outputs `a b` – Ed Morton May 08 '19 at 17:26
Ah I see what you mean, that's because `$0` is already set by the time your first action is evaluated. You'll see different results with `echo 'a b c' | awk '{NF--;$1=$1}1'`. Note that `NF` is still being decremented, the difference is when `$0` is constructed. I see that the behaviour is also different if the awk script is loaded from a file (`-f`) rather than from the command line. Whee, quirks! Gotta love 'em. – ghoti May 08 '19 at 17:26
`echo 'a b c' | awk '{$1=$1;NF--}1'` produces the same output `a b`. **undefined** behavior indeed - inexplicable might be a better term! :-). – Ed Morton May 08 '19 at 17:27

score 7 · Answer 2 · answered Jun 26 '12 at 23:33

You can't delete fields in the middle, but you can delete fields at the end, by decrementing NF.

So you can shift all the later fields down to overwrite $2 and $3 then decrement NF by two, which erases the last two fields:

$ echo 1 2 3 4 5 6 7 | awk '{for(i=2; i<NF-1; ++i) $i=$(i+2); NF-=2; print $0}'
1 4 5 6 7

score 6 · Answer 3 · edited Aug 25 '20 at 17:08

If you are just looking to remove columns, you can use cut:

$ cut -f 1,4- file.txt
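
As a quick check that cut leaves legitimately empty fields alone, here is a small sketch with a made-up row whose fourth field is empty (tabs are shown as `|`; the variable name is only for illustration):

```shell
# cut uses tab as its default delimiter; keep field 1 and fields 4 onward.
kept=$(printf 'a\tb\tc\t\te\n' | cut -f 1,4- | tr '\t' '|')
printf '%s\n' "$kept"
# prints: a||e
```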

To emulate cut:

$ awk -F "\t" '{ for (i=1; i<=NF; i++) if (i != 2 && i != 3) { if (i == NF) printf "%s\n", $i; else printf "%s\t", $i } }' file.txt

Similarly:

$ awk -F "\t" '{ delim = ""; for (i=1; i<=NF; i++) if (i != 2 && i != 3) { printf "%s%s", delim, $i; delim = "\t" } printf "\n" }' file.txt

HTH

answered May 21 '12 at 23:06 by Steve · edited by ib.

The last example prints a trailing tab. `{for (...) {printf delim $i; delim = "\t"}; printf "\n"}` – Dennis Williamson May 21 '12 at 23:29
I'm concerned about the gsub because there are other fields that are legitimately empty and I DO want the multiple delimiters. – merlin2011 May 21 '12 at 23:30
@merlin2011 See my changes. HTH. – Steve May 22 '12 at 00:34

score 1 · Answer 4 · edited Aug 25 '20 at 17:09

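One way could be to blank the fields as you do and then collapse the resulting runs of whitespace with gsub. Note that `\s` is a gawk extension, and that this also collapses legitimately empty fields and converts spaces inside fields to tabs: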

$ awk 'BEGIN { FS = "\t" } { $2 = $3 = ""; gsub( /\s+/, "\t" ); print }' input-file

answered Jun 27 '12 at 20:48 by Birei · edited by ib.

score 1 · Answer 5 · edited Aug 25 '20 at 17:10

In addition to the answer by Suicidal Steve, I'd like to suggest one more solution, this time using sed instead of awk.

It looks more complicated than the cut approach Steve suggested, but it has the advantage that sed -i edits the file in place. This example assumes comma-separated fields and removes fields 2 and 3:

$ sed -i 's/^\([^,]*,\)[^,]*,[^,]*,/\1/' FILENAME

answered Sep 05 '13 at 22:46 by jsxt · edited by ib.

score 1 · Answer 6 · answered Apr 18 '14 at 02:46

The only way I can think to do it in Awk without using a loop is to use gsub on $0 to combine adjacent FS:

$ echo {1..10} | awk '{$2=$3=""; gsub(FS"+",FS); print}'
1 4 5 6 7 8 9 10
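
One caveat worth checking before using this on the question's data: with a tab separator, the same gsub also merges delimiters around fields that were empty to begin with. A small sketch with a made-up row whose fourth field is empty (tabs shown as `|`; the variable name is only for illustration):

```shell
# Same technique as above, but with tab as FS and OFS.
collapsed=$(printf 'a\tb\tc\t\te\n' |
  awk -F'\t' 'BEGIN{OFS="\t"} {$2=$3=""; gsub(FS"+", FS); print}' |
  tr '\t' '|')
printf '%s\n' "$collapsed"
# prints: a|e   (the empty fourth field is gone as well)
```

So this approach fits best when no field is ever empty in the input.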

technomage · Answer 7 · 2021-08-31T12:49:46.713

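To remove fields 2 and 3 from a given input file (assuming a tab field separator), you can strip them directly from $0 with gensub (a gawk extension). Assigning the result back to $0 makes awk re-split the record, and removing field 3 before field 2 keeps the match positions valid: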

awk -F '\t' 'BEGIN{OFS="\t"}\
             {$0=gensub(/[^\t]*\t/,"",3);\
              $0=gensub(/[^\t]*\t/,"",2);\
              print}' Input.tsv

answered Aug 26 '21 at 17:19 by technomage · edited Aug 31 '21 at 12:49

score 0 · Answer 8 · edited Aug 25 '20 at 17:11

Well, if the goal is to remove the extra delimiters, then you can use tr on Linux. Example:

$ echo "1,2,,,5" | tr -s ','

1,2,5

answered Jan 13 '17 at 19:16 by Estorm · edited by ib.

score 0 · Answer 9 · answered Feb 27 '20 at 14:53

The method presented in ghoti's answer has some problems:

- Every assignment `$i = $(i+1)` forces awk to rebuild the record $0. If you have 100 fields and want to delete field 10, you rebuild the record 90 times.
- Changing the value of NF manually is not POSIX compliant and leads to undefined behaviour (as mentioned in the comments).

A somewhat more cumbersome but more robust way to delete a set of columns would be:

a single column:

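awk -v del=3 '
    BEGIN{FS=fs;OFS=ofs}   # fs and ofs are placeholders; pass them in, e.g. with -v fs="\t" -v ofs="\t"
    { b=""; sep=""
      for(i=1;i<=NF;++i) if(i!=del) { b=b sep $i; sep=OFS }   # sep keeps an empty first field intact
      $0=b }
    # do whatever you want to do
   ' file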

multiple columns:

awk -v del=3,5,7 '
    BEGIN{FS=fs;OFS=ofs; del="," del ","}
    { b=""; sep=""
      for(i=1;i<=NF;++i) if (del !~ ","i",") { b=b sep $i; sep=OFS }
      $0=b }
    # do whatever you want to do
   ' file
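
For the question's case (fields 2 and 3 of a tab-separated file), the multi-column variant could be wrapped roughly as follows. This is only a sketch: the function name `delete_cols` and the sample rows are made up, and the separator is hard-coded to a tab instead of being passed in:

```shell
# delete_cols LIST: drop the comma-separated field numbers in LIST
# from tab-separated stdin (hypothetical helper name).
delete_cols() {
  awk -v del="$1" '
    BEGIN { FS = OFS = "\t"; del = "," del "," }
    {
      b = ""; sep = ""
      for (i = 1; i <= NF; ++i)
        if (del !~ ("," i ",")) { b = b sep $i; sep = OFS }
      $0 = b      # single rebuild of the record; NF is recomputed by awk
      print
    }
  '
}

# First and fourth fields are deliberately empty and should be preserved.
printf '\tb\tc\t\te\n' | delete_cols 2,3 | tr '\t' '|'
# prints: ||e
```

Because the record is rebuilt once from the kept fields, NF is never assigned to, and empty fields keep their delimiters.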

score -1 · Answer 10 · edited Dec 31 '16 at 07:11

echo one two three four five six|awk '{
print $0
is3=$3
$3=""
print $0
print is3
}'

one two three four five six
one two  four five six
three

answered Dec 31 '16 at 04:12 by mraix · edited by Vivek Mishra
