7

I am trying to use awk to remove the first three fields from each line of a tab-separated text file. Removing the fields is easy, but awk mangles the rest of the line: the delimiters change from tabs to spaces.

Here is what I have tried:

head pivot.threeb.tsv | awk 'BEGIN {IFS="\t"} {$1=$2=$3=""; print }' 

The first three columns are properly removed. The problem is that in the output the tabs between columns $4, $5, $6, etc. are converted to spaces.

Update: The other question for which this was marked as duplicate was created later than this one : look at the dates.

WestCoastProjects
  • 1
    There is no variable named "IFS" in awk. shell has IFS, awk has FS. – Ed Morton May 06 '13 at 14:30
  • Possible duplicate of [Using awk to print all columns from the nth to the last](https://stackoverflow.com/questions/2961635/using-awk-to-print-all-columns-from-the-nth-to-the-last) – Ciro Santilli OurBigBook.com Jan 12 '19 at 20:24
  • 1
    @CiroSantilli新疆改造中心六四事件法轮功 That one *came later* than my question: _look at the dates_. This one already had a number of answers _and had been accepted_ before that one was even created. – WestCoastProjects Jan 12 '19 at 20:40
  • Hi, the current consensus is to close by "quality": http://meta.stackexchange.com/questions/147643/should-i-vote-to-close-a-duplicate-question-even-though-its-much-newer-and-ha Since "quality" is not measurable, I just go by upvotes. ;-) Likely it comes down to which question hit the best newbie Google keywords on the title. – Ciro Santilli OurBigBook.com Jan 12 '19 at 20:43
  • Put yourself in my shoes. I get a closed answer - when asking the question earlier. The person who created a somewhat-duplicate question has it kept. How would you think about this? – WestCoastProjects Jan 12 '19 at 20:48

4 Answers

6

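First, as Ed commented, you have to use FS as the field separator in awk. The tabs become spaces in your output because you didn't define OFS: assigning to a field makes awk rebuild the record using OFS, which defaults to a single space.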

awk 'BEGIN{FS=OFS="\t"}{$1=$2=$3="";print}' file

This blanks the first 3 fields and leaves the rest of the text untouched (you will see the 3 leading tabs). The tabs between the remaining fields are kept in the output.

awk 'BEGIN{FS=OFS="\t"}{print $4,$5,$6}' file

This outputs the fields without leading spaces or tabs. But if you have 500 columns you have to do it in a loop, use the sub function, or consider other tools, cut for example.
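As a rough sketch of the sub approach mentioned above: instead of assigning to fields, which makes awk rebuild the record, one can strip the first three "field plus tab" pairs from the record text itself, so the remaining separators are never touched. The function name `drop_first_three` is made up for illustration, and printing an empty line for records with fewer than four fields is an assumption about the desired behavior.

```shell
# Hypothetical helper: reads tab-separated lines on stdin and
# writes each line without its first three fields.
drop_first_three() {
  awk 'BEGIN { FS = "\t" }
       # Assumption: a record with fewer than four fields becomes an empty line.
       NF < 4 { print ""; next }
       {
         # Delete three leading "field + tab" pairs from $0 directly.
         # No field is assigned, so awk never rebuilds the record with OFS.
         sub(/^[^\t]*\t[^\t]*\t[^\t]*\t/, "")
         print
       }'
}

# Example call; the fifth field contains a space that should survive.
printf 'a\tb\tc\td\te f\tg\n' | drop_first_three
```

Because only the front of the record is edited, spaces inside the remaining fields and empty fields are passed through as they were.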

Kent
5

Actually this can be done with a very simple cut command like this:

cut -f4- inFile
anubhava
3

If you don't want the field separation altered then use sed to remove the first 3 columns instead:

sed -r 's/(\S+\s+){3}//' file

To store the changes back to the file you can use the -i option:

sed -ri 's/(\S+\s+){3}//' file
Chris Seymour
  • `\S` and `\s` are both PCRE-isms. `sed` isn't guaranteed to support them; the POSIX standard only guarantees BRE with a very small number of guaranteed extensions -- see https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html#tag_20_116_13_02 – Charles Duffy Apr 19 '21 at 12:45
  • ...the portable way to write `\s` is `[[:space:]]`, and the portable way to write `\S` is `[^[:space:]]` – Charles Duffy Apr 19 '21 at 12:47
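Following the comments above, a portable spelling of the same substitution might look like the sketch below. It uses only POSIX basic regular expressions and bracket classes, so it does not need -r. The function name `strip_three_posix` is made up for illustration. Like the original, it treats any run of whitespace as a delimiter, so it assumes fields contain no spaces and are never empty.

```shell
# Hypothetical helper: POSIX BRE version of s/(\S+\s+){3}//
strip_three_posix() {
  # [^[:space:]]\{1,\}  one or more non-whitespace characters (a field)
  # [[:space:]]\{1,\}   one or more whitespace characters (a delimiter)
  # \( ... \)\{3\}      that pair, repeated exactly three times
  sed 's/\([^[:space:]]\{1,\}[[:space:]]\{1,\}\)\{3\}//'
}

# Example call on one tab-separated line.
printf 'a\tb\tc\td\te\n' | strip_three_posix
```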
0
awk '{for (i=4; i<NF; i++) printf $i " "; print $NF}'
Bobo
  • this one fails if the last column contains double space in the names – meso_2600 Apr 08 '15 at 12:25
  • This fails to produce expected output if there are less than four fields on any line. (It will print the last of them instead of removing all.) Instead you could use: `awk '{for (i=4; i<=NF; i++) printf $i " "; printf "\n"}'` Or add some additional logic to prevent the trailing space. – Wildcard Nov 05 '15 at 01:56
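One possible way to address the problems raised in these comments, as a sketch rather than a definitive fix: split and join on tabs so that spaces inside a field are preserved, pass each field through an explicit "%s" format so a literal % in the data is not interpreted by printf, and loop up to NF so that short lines print nothing but a newline. The function name `fields_from_four` is made up for illustration.

```shell
# Hypothetical helper: print fields 4..NF of each tab-separated line.
fields_from_four() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         for (i = 4; i <= NF; i++)
           # Put the separator before every field except the first one printed,
           # which avoids a trailing delimiter.
           printf("%s%s", (i > 4 ? OFS : ""), $i)
         # End the line; records with fewer than four fields yield an empty line.
         print ""
       }'
}

# Example call; the data contains a literal percent sign and a double space.
printf 'a\tb\tc\t50%%d\tx  y\n' | fields_from_four
```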