Unix method to remove last 4 columns from csv file

Question

Is there a way to use bash to remove the last four columns for some input CSV file? The last four columns can have fields that vary in length from line to line so it is not sufficient to just delete a certain number of characters from the end of each row.

peteches · Accepted Answer · 2013-01-19T21:42:47.720

23

Cut can do this if all lines have the same number of fields or awk if you don't.

cut -d, -f1-6 # assuming 10 fields

Will print out the first 6 fields if you want to control the output seperater use --output-delimiter=string

awk -F , -v OFS=, '{ for (i=1;i<=NF-4;i++){ printf $i, }; printf "\n"}'

Loops over fields up to th number of fields -4 and prints them out.

edited Jan 19 '13 at 21:42

answered Jan 19 '13 at 20:46

peteches

3,447
1
13
15

in the awk line, I guess you want to printf $i not print $i, right? typo maybe? and `print $i ,` won't work, you may want to have `;` another typo? – Kent Jan 19 '13 at 21:21
@kent yeah meant printf $i, don't you need to have the ; if only one command. – peteches Jan 19 '13 at 21:45
Ahh I see what you mean. The comma is expanded to the value of the OFS variable – peteches Jan 19 '13 at 22:49
1

This is not a robust method as you might have quoted fields with commas inside `"field1",field2,"field 3 with , comma",field4` – kvantour Jul 22 '19 at 13:43

score 18 · Answer 2 · answered Jan 19 '13 at 21:50

18

cat data.csv | rev | cut -d, -f-5 | rev

rev reverses the lines, so it doesn't matter if all the rows have the same number of columns, it will always remove the last 4. This only works if the last 4 columns don't contain any commas themselves.

answered Jan 19 '13 at 21:50

Perleone

3,958
1
26
26

2

This is a really nice solution in my opinion, +1 for the use of rev (I didn't know it existed) – skd Feb 06 '14 at 16:24
I get `rev: stdin: 0: Invalid or incomplete multibyte or wide character` – user2023370 Feb 09 '21 at 12:45
Seems my file is a Windows csv file with `Little-endian UTF-16 Unicode text`. I had to use `awk` in the end (for other reasons.) – user2023370 Feb 10 '21 at 17:57

score 9 · Answer 3 · answered Jan 19 '13 at 20:34

You can use cut for this if you know the number of columns. For example, if your file has 9 columns, and comma is your delimiter:

cut -d',' -f -5

However, this assumes the data in your csv file does not contain any commas. cut will interpret commas inside of quotes as delimiters also.

score 5 · Answer 4 · edited Jul 17 '15 at 06:36

5

awk -F, '{NF-=4; OFS=","; print}' file.csv

or alternatively

awk -F, -vOFS=, '{NF-=4;print}' file.csv

will drop the last 4 columns from each line.

edited Jul 17 '15 at 06:36

Dan Getz

8,774
6
30
64

answered Jun 10 '15 at 20:58

YH Wu

465
6
6

Good way to drop the extra columns, but for me this replaces the commas between columns with spaces in the output. Is there an easy way to avoid that and keep them as commas? – Dan Getz Jun 11 '15 at 02:07
You can add back your delimiter with awk -F, '{NF-=4; OFS=","; print}' – YH Wu Jun 11 '15 at 15:08
Great, that works for me. Also could set `OFS` in a `BEGIN` block or with the `-v` command-line argument like `awk -F, -vOFS=, ...` – Dan Getz Jun 11 '15 at 15:25
This method only works for GNU awk. According to Posix, changing NF is undefined behaviour. – kvantour Jul 22 '19 at 13:44
@kvantour tiny correction regarding "**changing** NF": incrementing `NF` would be defined behavior (a new field populated with an empty string is added), it's just decrementing `NF` that's undefined behavior. – Ed Morton May 17 '23 at 13:53

score 3 · Answer 5 · answered Jan 19 '13 at 21:46

3

This might work for you (GNU sed):

sed -r 's/(,[^,]*){4}$//' file

answered Jan 19 '13 at 21:46

potong

55,640
6
51
83

In my sense, it is the best answer here! – Noémien Kocher Jan 14 '16 at 12:40

score 3 · Answer 6 · answered Jul 22 '19 at 13:54

None of the mentioned methods will work properly when having CVS files with quoted fields with a <comma> character. So it is a bit hard to just use the <comma>-character as a field separator.

The following two posts are now very handy:

What's the most robust way to efficiently parse CSV using awk?
[U&L] How to delete the last column of a file in Linux (Note: this is only for GNU awk)

Since you work with GNU awk, you can thus do any of the following two:

$ awk -v FPAT='[^,]*|"[^"]+"' -v OFS="," 'NF{NF-=4}1'

Or with any awk, you could do:

$ awk 'BEGIN{ere="([^,]*|\042[^\042]+\042)"
             ere=","ere","ere","ere","ere"$"
       }
       {sub(ere,"")}1'

score 1 · Answer 7 · answered Jan 19 '13 at 21:17

awk one-liner:

awk -F, '{for(i=0;++i<=NF-5;)printf $i", ";print $(NF-4)}'  file.csv

the advantage of using awk over cut is, you don't have to count how many columns do you have, and how many columns you want to keep. Since what you want is removing last 4 columns.

see the test:

kent$  seq 40|xargs -n10|sed 's/ /, /g'           
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
11, 12, 13, 14, 15, 16, 17, 18, 19, 20
21, 22, 23, 24, 25, 26, 27, 28, 29, 30
31, 32, 33, 34, 35, 36, 37, 38, 39, 40

kent$  seq 40|xargs -n10|sed 's/ /, /g' |awk -F, '{for(i=0;++i<=NF-5;)printf $i", ";print $(NF-4)}'
1,  2,  3,  4,  5,  6
11,  12,  13,  14,  15,  16
21,  22,  23,  24,  25,  26
31,  32,  33,  34,  35,  36

score 1 · Answer 8 · answered Jan 20 '13 at 05:14

1

This awk solution in a hacked way

awk -F, 'OFS=","{for(i=NF; i>=NF-4; --i) {$i=""}}{gsub(",,,,,","",$0);print $0}' temp.txt

answered Jan 20 '13 at 05:14

Mirage

30,868
62
166
261

Unix method to remove last 4 columns from csv file

8 Answers8

Linked

Related