I need to work across the columns of a large .tsv and replace the column name if it matches any of a number of strings, labelling it an error if no match is found. Below is a simplified version of what I have, and it works here.
Sample tab-separated input test.tsv:
Col1 Col2 Col3 Col4
A B C Foo
D E F Bar
G H I Baz
Script:
#!/bin/bash
set -eu
shopt -s failglob
awk 'BEGIN { FS = OFS = "\t" }
NR == 1 {
    # Rename known header fields; flag anything unrecognised
    for (i = 1; i <= NF; i++) {
        if ($i == "Col1") { $i = "NewCol1" }
        else if ($i == "Col2") { $i = "NewCol2" }
        else if ($i == "Col4") { $i = "NewCol4" }
        else { $i = "Error: " $i }
    }
}
{ print }' test.tsv
Tab-separated output:
NewCol1 NewCol2 Error: Col3 NewCol4
A B C Foo
D E F Bar
G H I Baz
However, in my real process Col4 is not being successfully processed. Instead, it is being flagged as an error. The issue does not occur if I use LibreOffice Calc to open the file and save it again, still as .tsv. This makes me think it may be a line ending format issue, but I have used vim to check the endings in the input file, and they are consistently \n. What am I missing here?
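For anyone wanting to reproduce the check, here is a minimal sketch of how the raw bytes of the header line could be inspected, assuming the standard head and od utilities are available. The function name show_header_bytes is hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script):
# dump the first line of a file byte by byte so that
# non-printing characters in the header become visible.
show_header_bytes() {
    # od -c prints printable bytes as-is and others as escapes
    # (e.g. \t, \r, \n) or three-digit octal codes.
    head -n 1 "$1" | od -c
}
```

Calling show_header_bytes test.tsv prints each byte of the first line. A carriage return, if present, would show up as \r just before the final \n, and other non-printing bytes would appear as escapes or octal codes next to the column name they are attached to.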