19

Consider the following command:

$ gawk -F"\t" "BEGIN{OFS=\"\t\"}{$2=$3=\"\"; print $0}" Input.tsv

When I set $2 = $3 = "", I intend to get the same result as writing:

print $1,$4,$5...$NF

However, what actually happens is that I get two empty fields, with the extra field delimiters still printing.

Is it possible to actually delete $2 and $3?

Note: On Linux in bash, the command above would be written as follows, but cmd.exe on Windows does not handle single quotes well.

$ gawk -F'\t' 'BEGIN{OFS="\t"}{$2=$3=""; print $0}' Input.tsv
merlin2011
  • 2
    You should use single quotes for the outer set, then you don't have to escape the double quotes within the script. If you're using double quotes for the outer set so you can embed shell variables, use `-v` to do variable passing instead. – Dennis Williamson May 21 '12 at 23:33
  • I'm using awk in Windows. Cmd.exe doesn't play well with single quotes for some reason. – merlin2011 May 21 '12 at 23:36
  • I did this 10+ years ago, (I think). try doing a `$2=$3="";$0=$0`. Good luck. – shellter May 22 '12 at 00:43
  • @shellter, Tried, no luck. Probably the version of awk has changed. Thanks for the suggestion though! – merlin2011 May 22 '12 at 00:48
  • OK, now just thinking outside the box ;->, `$2=$3="XYZ"; sub("\tXYZ\t", "", $0); $0=$0; print`. Not sure if you'd need both `\t` in the sub. Also, if you have the original awk book, check there; I thought that is where I learned $0=$0. Maybe I'm forgetting a step. Good luck. – shellter May 22 '12 at 01:06
  • This question was run on Windows when I asked this question, so it needs to be `""`. – merlin2011 Apr 17 '14 at 21:45
  • Rolled back and updated to absorb the edit. – merlin2011 Apr 17 '14 at 21:55

10 Answers

8

This is an oldie but goodie.

As Jonathan points out, you can't delete fields in the middle, but you can replace their contents with the contents of other fields. And you can make a reusable function to handle the deletion for you.

$ cat test.awk
function rmcol(col,     i) {
  for (i=col; i<NF; i++) {
    $i = $(i+1)
  }
  NF--
}

{
  rmcol(3)
}

1

$ printf 'one two three four\ntest red green blue\n' | awk -f test.awk
one two four
test red blue
ghoti
  • 3
    Decrementing NF is undefined behavior per POSIX. It will delete the last field in some awks, be ignored in other awks, and could do anything else and still be POSIX compliant. – Ed Morton Sep 08 '18 at 13:34
  • @EdMorton: I appreciate the issue, it's not specified by the standard and should be avoided in portable scripts. I tested this with gawk, mawk, nawk and busybox awk which all behave as expected, do you know any awk's that do not support this behavior? – Thor May 08 '19 at 06:02
  • @Thor MacOS/BSD awk for one. – Ed Morton May 08 '19 at 13:41
  • @EdMorton, actually I created this answer on FreeBSD, and this function works on macOS's awk as well (which came over from FreeBSD a few years ago). I suspect where you might see different behaviour would be in older SysV systems. Alas, I have none of those to test with at the moment. – ghoti May 08 '19 at 17:17
  • I'm on MacOS right now. `awk --version` outputs `awk version 20070501` and `echo 'a b c' | awk '{NF--}1'` outputs `a b c` while `echo 'a b c' | gawk '{NF--}1'` outputs `a b`. Your script does produce the output you say it does though - not sure why off the top of my head but it can't be the `NF--` doing it. – Ed Morton May 08 '19 at 17:19
  • Ah, the `NF--` DOES have an impact but only if you do something to modify a field as you're doing and as this does: `echo 'a b c' | awk '{NF--;$1=$1}1'` outputs `a b` – Ed Morton May 08 '19 at 17:26
  • 1
    Ah I see what you mean, that's because `$0` is already set by the time your first action is evaluated. You'll see different results with `echo 'a b c' | awk '{NF--;$1=$1}1'`. Note that `NF` is still being decremented, the difference is when `$0` is constructed. I see that the behaviour is also different if the awk script is loaded from a file (`-f`) rather than from the command line. Whee, quirks! Gotta love 'em. – ghoti May 08 '19 at 17:26
  • 2
    `echo 'a b c' | awk '{$1=$1;NF--}1'` produces the same output `a b`. **undefined** behavior indeed - inexplicable might be a better term! :-). – Ed Morton May 08 '19 at 17:27
7

You can't delete fields in the middle, but you can delete fields at the end, by decrementing NF.

So you can shift all the later fields down to overwrite $2 and $3 then decrement NF by two, which erases the last two fields:

$ echo 1 2 3 4 5 6 7 | awk '{for(i=2; i<NF-1; ++i) $i=$(i+2); NF-=2; print $0}'
1 4 5 6 7
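
Applied to the tab-separated input from the question, the same shift-and-truncate idea might look like the sketch below. The sample row is made up, and the sketch assumes every line has at least three fields and that your awk honours assignments to NF (see the comments on the other answer about this not being guaranteed everywhere).

```shell
# Made-up sample row with five tab-separated fields.
result=$(printf 'a\tb\tc\td\te\n' | awk 'BEGIN { FS = OFS = "\t" } {
    for (i = 2; i < NF - 1; ++i) $i = $(i + 2)   # shift fields 4..NF down by two
    NF -= 2                                      # drop the two stale trailing fields
    print
}')
printf '%s\n' "$result"
```

Setting OFS in BEGIN matters here: the record is rebuilt when fields are assigned, and without it the output would be joined with spaces instead of tabs.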
Jonathan Wakely
6

If you are just looking to remove columns, you can use cut:

$ cut -f 1,4- file.txt

To emulate cut:

$ awk -F "\t" '{ for (i=1; i<=NF; i++) if (i != 2 && i != 3) printf "%s%s", $i, (i == NF ? "\n" : "\t") }' file.txt

Similarly:

$ awk -F "\t" '{ delim = ""; for (i=1; i<=NF; i++) if (i != 2 && i != 3) { printf "%s%s", delim, $i; delim = "\t" } printf "\n" }' file.txt
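
As a quick sanity check, the sketch below runs cut and the delimiter-tracking awk loop on the same made-up row and stores both results so they can be compared. The variable names are only for illustration.

```shell
# Made-up sample row; five tab-separated fields.
input=$(printf 'a\tb\tc\td\te')

with_cut=$(printf '%s\n' "$input" | cut -f 1,4-)
with_awk=$(printf '%s\n' "$input" | awk -F '\t' '{
    delim = ""
    for (i = 1; i <= NF; i++)
        if (i != 2 && i != 3) { printf "%s%s", delim, $i; delim = "\t" }
    printf "\n"
}')

printf '%s\n' "$with_cut"
printf '%s\n' "$with_awk"
```

Both lines should contain a, d and e separated by tabs. Passing the field as an argument to a "%s" format keeps a literal % in the data from being treated as a format directive.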

HTH

Steve
1

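One way is to blank the fields as you do, then collapse the resulting runs of whitespace with gsub (note that \s is a gawk extension, and that this also rewrites any whitespace inside the remaining fields):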

$ awk 'BEGIN { FS = "\t" } { $2 = $3 = ""; gsub( /\s+/, "\t" ); print }' input-file
Birei
1

In addition to Steve's answer, I'd like to suggest one more solution, using sed instead of awk.

It looks more complicated than the cut approach Steve suggested, but it has the advantage that sed -i allows editing in place (shown here for comma-separated input):

$ sed -i 's/^\([^,]*,\)[^,]*,[^,]*,/\1/' FILENAME
jsxt
1

The only way I can think to do it in Awk without using a loop is to use gsub on $0 to combine adjacent FS:

$ echo {1..10} | awk '{$2=$3=""; gsub(FS"+",FS); print}'
1 4 5 6 7 8 9 10
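
For the tab-separated case in the question, a sketch of the same trick could look like this. The helper name squeeze and the sample rows are made up. The second row shows the catch: a field that was already empty gets squeezed away along with the two blanked ones.

```shell
# Hypothetical helper: blank fields 2 and 3, then squeeze runs of tabs.
squeeze() {
    awk 'BEGIN { FS = OFS = "\t" } { $2 = $3 = ""; gsub(/\t+/, "\t"); print }'
}

full=$(printf 'a\tb\tc\td\te\n' | squeeze)    # no empty fields in the input
gappy=$(printf 'a\tb\tc\t\te\n' | squeeze)    # field 4 is empty in the input

printf '%s\n' "$full"
printf '%s\n' "$gappy"
```

The first row comes out as a, d, e, while the second comes out as just a and e, so this approach is only safe when the remaining fields are never empty.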
1

To remove fields 2 and 3 from a given input file (assuming a tab field separator), you can remove the fields from $0 using gensub and regenerate it as follows:

awk -F '\t' 'BEGIN{OFS="\t"}\
             {$0=gensub(/[^\t]*\t/,"",3);\
              $0=gensub(/[^\t]*\t/,"",2);\
              print}' Input.tsv
technomage
0

Well, if the goal is to remove the extra delimiters, then you can use tr on Linux. Example:

$ echo "1,2,,,5" | tr -s ','
1,2,5
Estorm
0

The method presented in ghoti's answer has some problems:

  • every assignment of $i = $(i+1) forces awk to rebuild the record $0. This implies that if you have 100 fields and you want to delete field 10, you rebuild the record 90 times.

  • changing the value of NF manually is not POSIX compliant and leads to undefined behaviour (as is mentioned in the comments).

A somewhat more cumbersome, but more robust way to delete a set of columns would be:

a single column:

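awk -v del=3 -v fs='\t' -v ofs='\t' '
    # fs and ofs are example values; set them to match your data
    BEGIN{FS=fs;OFS=ofs}
    # sep stays empty until the first kept field, so empty fields survive
    { b=""; sep=""; for(i=1;i<=NF;++i) if(i!=del) { b=b sep $i; sep=OFS } $0=b }
    # do whatever you want to do
   ' file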

multiple columns:

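awk -v del=3,5,7 -v fs='\t' -v ofs='\t' '
    # fs and ofs are example values; set them to match your data
    BEGIN{FS=fs;OFS=ofs; del="," del ","}
    # sep stays empty until the first kept field, so empty fields survive
    { b=""; sep=""; for(i=1;i<=NF;++i) if (del !~ ","i",") { b=b sep $i; sep=OFS } $0=b }
    # do whatever you want to do
   ' file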
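
For the case in the question (dropping fields 2 and 3 from tab-separated input), a runnable sketch of the multi-column form might be the following. The sample row and the choice of tab for both separators are assumptions taken from the question, not part of the method itself.

```shell
# Made-up sample row; del lists the 1-based field numbers to drop.
result=$(printf 'a\tb\tc\td\te\n' | awk -v del=2,3 '
    BEGIN { FS = OFS = "\t"; del = "," del "," }
    {
        b = ""; sep = ""
        for (i = 1; i <= NF; ++i)
            if (del !~ ("," i ",")) { b = b sep $i; sep = OFS }
        print b
    }')
printf '%s\n' "$result"
```

Wrapping del in commas means field 1 is not mistaken for a match when the list contains 10 or 11. The record is built once per line and neither NF nor individual fields are assigned.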
kvantour
-1
echo one two three four five six|awk '{
print $0
is3=$3
$3=""
print $0
print is3
}'

one two three four five six
one two  four five six
three

mraix