awk not making substitutions in last column of csv

Question

I need to work across the columns of a large .tsv and replace the column name if it matches any of a number of strings, labelling it an error if no match is found. Below is a simplified version of what I have, and it works here.

Sample tab-separated input test.tsv:

Col1    Col2    Col3    Col4
A   B   C   Foo
D   E   F   Bar
G   H   I   Baz

Script:

#!/bin/bash

set -eu
shopt -s failglob

awk 'BEGIN {FS=OFS="\t"} \
     {if (NR==1) \
        {for (i = 1 ; i <= NF ; i++) \
        if ($i == "Col1") { $i = "NewCol1" } \
        else if ( $i == "Col2") { $i = "NewCol2" } \
        else if ( $i == "Col4") { $i = "NewCol4" } \
        else { $i = "Error: "$i } \
        } print \
        }' test.tsv

Tab-separated output:

NewCol1 NewCol2 Error: Col3 NewCol4
A   B   C   Foo
D   E   F   Bar
G   H   I   Baz

However, in my real process Col4 is not being successfully processed. Instead, it is being flagged as an error. The issue does not occur if I use LibreOffice Calc to open the file and save it again, still as .tsv. This makes me think it may be a line ending format issue, but I have used vim to check the endings in the input file, and they are consistently \n. What am I missing here?

I wouldn't give up on your line-ending theory quite yet. Try checking the file (or the first few lines, at least) with `cat -t`. — jas, Nov 08 '19 at 19:52
With the sample and code provided, it works just fine. The error you get is in `Col3` because you have to add `else if ( $i == "Col3") { $i = "NewCol3" } \`. between the Col2 and Col4 management. — ingroxd, Nov 08 '19 at 19:54
@ingroxd - yeah, the error in there is expected - the code works fine, as I said in the post, but there is another issue. @jas `cat -t` is showing `^M` as the line ending on my real input file. What's the significance of this? The real input file has not been on a Windows system though. — tesolat, Nov 08 '19 at 20:08
@tesolat: If you are using `gnu-awk` then use `awk -v RS='\r?\n' '...' file` — anubhava, Nov 08 '19 at 20:16
Okay, got there in the end. @anubhava fixed it directly (though was the `?` in the `RS` in error?) and @jas pointed me to `^M`, which can be processed out using `sed`. Etiquette-wise, how to give thanks here? Answer it myself and give kudos to you both? — tesolat, Nov 08 '19 at 20:37
@tesolat the `?` is present so gawk will split the file on `\n`s whether preceded by `\r` or not. The downside to that is that then CSVs exported from Excel which have `\n` within fields will get mangled. See https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it and https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk. — Ed Morton, Nov 08 '19 at 23:34
Both links are very useful @EdMorton. Neither showed in the searches I made prior to asking the question though :/ Hopefully setting the problem in this particular context (awk, column processing) may be helpful to someone else! — tesolat, Nov 09 '19 at 17:52

score 0 · Accepted Answer · edited Nov 09 '19 at 09:45

0

Evidently your input has line ending with ^M or \r. You may use this awk with a custom regex in RS (this requires GNU awk):

awk -v RS='\r?\n' '...' file

-v RS='\r?\n' sets RS or record separator as optional \r followed by \n, thus allowing it to match lines ending with \n or \r\n.

edited Nov 09 '19 at 09:45

RavinderSingh13

130,504
14
57
93

answered Nov 08 '19 at 20:50

anubhava

761,203
64
569
643

awk not making substitutions in last column of csv

1 Answers1