
I'm looking for a relatively simple method for truncating CSV header names to a given maximum length. For example, a file like:

one,two,three,four,five,six,seven
data,more data,words,,,data,the end

With all header names limited to a maximum of 3 characters, it would become:

one,two,thr,fou,fiv,six,sev
data,more data,words,,,data,the end

Requirements:

  • Only the first row is affected
  • I don't know what the headers are going to be, so it has to dynamically read and write the values and lengths

I tried a few things with awk and sed, but am not proficient at either. The closest I found was this snippet:

csvcut -c 3 file.csv |
sed -r 's/^"|"$//g' |
awk -F';' -vOFS=';' '{ for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) } { printf("\"%s\"\n", $0) }' >tmp-3rd

But it focuses on columns rather than the header row, and using csvcut feels more complicated than necessary.

Any help is appreciated.

  • With `awk`: `awk 'BEGIN{ FS=OFS="," } NR==1{ for(i=1; i<=NF; i++){ $i=substr($i, 1, 3) } }1' file` – Cyrus Dec 09 '21 at 20:00
  • Thanks @cyrus. I actually needed to add some logic to put back any double quotes that were truncated, as well as make sure to not end with a space. It gets a little messy, but here's what an awk amateur came up with: `awk 'function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s } BEGIN{ FS=OFS="," } NR==1{ for(i=1; i<=NF; i++){ if(length($i)>62) { $i=rtrim(substr($i, 1, 62))"\"" } else { $i } } }1' file` (now trimming to 62 characters) – Seth Leonard Dec 10 '21 at 01:09

2 Answers


With GNU sed:

sed -E '1s/([^,]{1,3})[^,]*/\1/g' file

Output:

one,two,thr,fou,fiv,six,sev
data,more data,words,,,data,the end

See: man sed and The Stack Overflow Regular Expressions FAQ
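The expression works because `[^,]{1,3}` captures up to three non-comma characters and `[^,]*` swallows the rest of the field, so only the captured part is written back. As a rough sketch (not part of the original answer), the length could be passed in through a shell function; the name `truncate_header_sed` and its argument are hypothetical, and this assumes no header field contains a quoted comma:

```shell
# truncate_header_sed MAX  (hypothetical helper name)
# Reads CSV on stdin and truncates every field of line 1 to at most MAX
# characters. Assumes header fields never contain quoted commas.
truncate_header_sed() {
  max=$1
  # Double quotes let the shell expand ${max}; \1 is passed to sed unchanged.
  sed -E "1s/([^,]{1,${max}})[^,]*/\1/g"
}

# Usage with the sample data from the question:
printf '%s\n' 'one,two,three,four,five,six,seven' \
              'data,more data,words,,,data,the end' |
  truncate_header_sed 3
```

Empty fields are left alone, since the pattern needs at least one non-comma character to match.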

Cyrus

With your shown samples, please try the following awk program. It sets both the field separator and the output field separator to a comma. On the first line, it truncates each field to 3 characters, as required, and prints them (with a newline after the last field). All remaining lines are printed as they are.

awk '
BEGIN { FS=OFS="," }
FNR==1{
  for(i=1; i<=NF; i++){
    printf("%s%s",substr($i, 1, 3),(i==NF?ORS:OFS))
  }
  next
}
1
' Input_file
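Since the question asks for "a given maximum length", the hard-coded 3 could also be supplied with `-v`. The following is only a sketch along those lines: the function name `truncate_header_awk` and the variable `max` are hypothetical, it reassigns the fields instead of using `printf`, and it assumes no header field contains a quoted comma:

```shell
# truncate_header_awk MAX  (hypothetical helper name)
# Reads CSV on stdin and truncates every field of the first line to at
# most MAX characters. Assumes header fields never contain quoted commas.
truncate_header_awk() {
  awk -v max="$1" '
    BEGIN { FS = OFS = "," }
    FNR == 1 {
      # Assigning to $i rebuilds the record using OFS.
      for (i = 1; i <= NF; i++) $i = substr($i, 1, max)
    }
    1   # print every line, modified or not
  '
}

# Usage with the sample data from the question:
printf '%s\n' 'one,two,three,four,five,six,seven' \
              'data,more data,words,,,data,the end' |
  truncate_header_awk 3
```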
RavinderSingh13