How do I remove the last 2 columns from a tsv file?

Question

I am coding using bash on terminal through a docker container on my mac. I am struggling to figure out how to remove the last 2 columns on my TSV file. It has 7 total and the last 2 are not needed for my work and are required to be removed.

Edit: The first picture is the original data file, the second is what the code is doing and it is deleting some random entries from the column. The third picture is what the end result of this program should do. The month and year columns I am struggling with also but I deleted the code and tried to simplify the data first.

I tried using awk and using NF = NF - 2 which does remove the last 2 columns but for some reason deletes some of the data I have in my 5th column which I need. So whilst I got the column deletion I needed, the code did a little extra. Here is the code:

preprocess() {
 31     input_file="$1"
 32 
 33     # Extract the base name of the input file
 34     base_name=$(basename "$input_file" .tsv)
 35 
 36     # Create the new output file name
 37     output_file="${base_name}_clean.tsv"
 38 
 39     awk -F'\t' 'BEGIN{OFS=FS} 
 40     {
 41         NF = NF - 2
 42 
 43         print
 44     }' "$input_file" > "$output_file"
 45 }

I Have a few other lines but they shouldn't cause any issues. They just check the file exists etc.

Please take a look at [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve). — Cyrus, May 20 '23 at 16:07
`csvcut -t -c 1-5 input.tsv` (csvcut is part of the csvkit package) — Shawn, May 20 '23 at 16:15
please update the questino with the sample data file, the (wrong) output generated by your code and the (correct) expected output — markp-fuso, May 20 '23 at 17:02
There is [some discussion here](https://unix.stackexchange.com/questions/450603/what-can-be-done-with-nf-in-an-awk-code-block-or-function) on modification directly of `NF`. Maybe it's relevant for you ? — MyICQ, May 20 '23 at 17:22
Regarding "I tried using awk and using NF = NF - 2 which does remove the last 2 columns but for some reason deletes some of the data I have in my 5th column which I need." - `NF=NF-2` is undefined behavior so your awk could do anything, but chances are you have DOS line endings messing with what you see, see [why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it](https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it) — Ed Morton, May 21 '23 at 14:59
If you [edit] your question to provide concise, testable sample input and expected output then we can best help you. — Ed Morton, May 21 '23 at 15:00

score 2 · Answer 1 · answered May 20 '23 at 17:14

With AWK a script like this works:

Inputdata (separated by ; for clarity here, but could be tab also).

F1;F2;F3;F4
V11;V12;V13;V14
V21;V22;V23;V24

Program to convert. Comments to everyone can follow, even those new to awk.

BEGIN{
   FS=";"    # Convert til "\t" for TAB separation
   OFS=";"   # set as desired
   skipcolcount=2
}

{ 
  # In each line, loop over the fields
  for (i=1;i<=NF-skipcolcount;i++) {   
     printf $i                   # reference field by index variable  
     if (i<NF-skipcolcount) {    # no separator after last field
       printf OFS
     }
  }
  printf "\n"                    # linefeed after each line
}

Result:

F1;F2
V11;V12
V21;V22

This is more verbose than other solution modifiing `NF`, but allows more flexibility such as printing every other column, or skipping certain specific columns. — MyICQ, May 21 '23 at 06:31

score 2 · Accepted Answer · answered May 21 '23 at 10:00

2

I tried using awk and using NF = NF - 2 which does remove the last 2 columns but for some reason deletes some of the data I have in my 5th column which I need.

This is unexpected for me, I did run your code using GNU Awk 5.1.0 and it works fine, however you are using

docker

so maybe this force usage of erratic version of awk? Anyway, if your task is given as

how to remove the last 2 columns on my TSV file. It has 7 total and the last 2 are not needed for my work and are required to be removed.

this might be simplified to: get first 5 columns of tab-separated file, which can be expressed in awk as

awk 'BEGIN{FS=OFS="\t"}{print $1,$2,$3,$4,$5}' file.tsv

Please run it and write if output is as desired.

answered May 21 '23 at 10:00

Daweo

31,313
3
12
25

Thank you so much @Daweo, that fixed the problem. Do you have any advice on how I can iterate through the data next to append the month and year column onto the end of the data file? The format and idea is quite a bit easier than to delete the columns I hope. – Spoku May 21 '23 at 11:25
1

@Spoku this is different task, please post separate question for that – Daweo May 21 '23 at 17:50
Short, but that line of code does not "remove the last two columns". Instead, it assumes you know you have 7 columns, remove two, and now print 1..5. Got the job done though. – MyICQ May 24 '23 at 05:12

score 1 · Answer 3 · answered May 20 '23 at 17:32

1

The easiest will be rev/cut/rev combination

$ rev inputfile | cut -f3- | rev > output.file

answered May 20 '23 at 17:32

karakfa

66,216
7
41
56

For some reason this still removes some of my data out of the 5th column. It does remove the last 2 columns still. – Spoku May 21 '23 at 07:51
that's not possible unless you have malformed input data. Please try `cat -A inputfile | head` and post it to your question as text. Also if you sourced your file from DOS/Windows domain try `dos2unix inputfile` to fix line endings. – karakfa May 21 '23 at 11:55

dawg · Answer 4 · 2023-05-20T18:24:27.793

1

You are close.

Given the following TSV file:

cat file
1   2   3   4   5
6   7   8   9   10
11  12  13  14  15

You can do this in awk:

awk 'BEGIN{FS=OFS="\t"}
{NF=3} 1
' file

Prints:

1   2   3
6   7   8
11  12  13

Or Ruby:

ruby -ne 'puts $_.split("\t")[0..2].join("\t")' file
# same

Or Perl:

perl -nE 'say join("\t", (split "\t")[0..2])' file
# same

edited May 20 '23 at 18:24

answered May 20 '23 at 17:49

dawg

98,345
23
131
206

I've tried to use the awk one but It doesn't quite seem to work. Alot of the suggested adjustments below seem to continue to delete the same columns. Instead of deleting them and saving as a new file. Is it possible to copy the columns I need to the new file instead? – Spoku May 21 '23 at 08:00
1

Trying to reduce the value of `NF` is undefined behavior. Some awks will trim fields from the end, other will ignore it, others could do anything else (core dump? trim fields from the front? etc...) – Ed Morton May 21 '23 at 14:57

Ted Lyngmo · Answer 5 · 2023-05-21T08:18:12.733

One way could be to use sed:

sed 's/\t[^\t]*\t[^\t]*$//' "$input_file" > "$output_file"

Match explained:

\t - a tab character
[^\t]* - zero or more non-tab characters
\t - a tab character
[^\t]* - zero or more non-tab characters
$ - end of line anchor

Substitute (the // part) with an empty string.

An alternative:

sed -E 's/(\t[^\t]*){2}$//' "$input_file" > "$output_file"

Here the match \t[^\t]* is in a group (...) which is repeated twice {2}.

How do I remove the last 2 columns from a tsv file?

5 Answers5