
I have a file with 4 tab-separated columns. The last column sometimes contains trailing tabs inside the quotation marks. The question is similar to "trim leading and trailing spaces from a string in awk". Here is an example:

col1 col2 col3 col4
"12" "d" "5" "this is great"
"13" "d" "6" "this is great<tab>"
"14" "d" "7" "this is great<tab><tab>"
"15" "d" "8" "this is great"
"16" "d" "9" "this is great<tab>"
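The `<tab>` markers above stand for literal tab characters, which do not survive copy and paste from a web page. One way to recreate the sample as a real file is with `printf`, which expands `\t` and `\n` in its format string. The file name `input.txt` and the tab-separated header row are assumptions made for this sketch.

```shell
# Recreate the sample data with real tab characters in input.txt.
# Assumption: the header row is tab-separated like the data rows.
printf 'col1\tcol2\tcol3\tcol4\n' > input.txt
printf '"12"\t"d"\t"5"\t"this is great"\n' >> input.txt
# The next lines carry one or two trailing tabs inside the last quotes.
printf '"13"\t"d"\t"6"\t"this is great\t"\n' >> input.txt
printf '"14"\t"d"\t"7"\t"this is great\t\t"\n' >> input.txt
printf '"15"\t"d"\t"8"\t"this is great"\n' >> input.txt
printf '"16"\t"d"\t"9"\t"this is great\t"\n' >> input.txt
```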

This is what I have come up with so far:

gawk --re-interval -F '"' 'NF = 9 {if ($8 ~ /\t$/) {gsub(/[\t]+$/,"",$8)} ;}'

The problem is that it destroys my formatting: the quotation marks around each column disappear. The tabs between the columns, however, are still there:

col1 col2 col3 col4
12 d 5 this is great
13 d 6 this is great
14 d 7 this is great
15 d 8 this is great
16 d 9 this is great

What am I doing wrong?

S.A

1 Answer


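You need to tell awk that the output field separator (OFS) is also a quote. Whenever a field is modified, awk rebuilds the whole record by joining the fields with OFS, which defaults to a single space, so every quote that served as an input separator turns into a space. For example: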

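awk -v OFS='"' -F '"' 'NF == 9 {
  # $8 holds the text between the last pair of quotes
  if ($8 ~ /\t$/) {
    # Strip trailing tabs; modifying $8 rebuilds $0 using OFS
    gsub(/[\t]+$/,"",$8)
  }
}
# A pattern that is always true, with no action, prints every line
1' input.txt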

Output:

col1   col2   col3   col4
"12"   "d"    "5"    "this is great"
"13"   "d"    "6"    "this is great"
"14"   "d"    "7"    "this is great"
"15"   "d"    "8"    "this is great"
"16"   "d"    "9"    "this is great"
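The effect of OFS can be seen in isolation: assigning to any field, even `$2 = $2`, makes awk rebuild `$0` from the fields joined by OFS, so the first command below replaces each quote with a space while the second keeps them. In the question's command, `NF = 9` is an assignment rather than the comparison `NF == 9`; in gawk, assigning to `NF` also rebuilds `$0`, which would explain why every line lost its quotes, not only those with trailing tabs. The last command is an alternative sketch that is not part of the answer above: it edits `$0` directly with `sub()` and never touches a field, so no rebuild happens and OFS does not matter.

```shell
# Assigning to a field rebuilds $0 with the default OFS (a single space),
# so the quotes used as the input separator are lost.
printf '"a"\t"b"\n' | awk -F '"' '{ $2 = $2 } 1'

# Setting OFS to the same quote character keeps the original layout.
printf '"a"\t"b"\n' | awk -F '"' -v OFS='"' '{ $2 = $2 } 1'

# Alternative sketch: remove tabs right before the final closing quote by
# editing $0 directly; no field is assigned, so the record is not rebuilt.
printf '"13"\t"d"\t"6"\t"this is great\t\t"\n' | awk '{ sub(/\t+"$/, "\"") } 1'
```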
Thor
  • Is it possible that the output is inconsistent with the input (spacing between the columns)? I compared it with the OP's example file. Otherwise +1 for explaining what the issue is ;-) – kvantour Mar 14 '18 at 16:57
  • @kvantour: SO doesn't represent hard tabs properly, so we have to adapt – Thor Mar 14 '18 at 19:51
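Since rendered pages do not show hard tabs faithfully, a byte-level dump is one way to check the result. This sketch feeds a single sample line with real tabs through the answer's command and pipes it to `od -c`, which prints each tab as `\t`. After the fix, the dump should contain exactly three `\t` entries, the column separators, and no tab before the closing quote.

```shell
# Feed one sample line (two trailing tabs in the last column) through the
# answer's command, then dump the bytes; od -c shows each tab as \t.
printf '"14"\t"d"\t"7"\t"this is great\t\t"\n' |
  awk -v OFS='"' -F '"' 'NF == 9 { if ($8 ~ /\t$/) gsub(/[\t]+$/, "", $8) } 1' |
  od -c
```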