1

I have a tab-delimited file like this but for some rows, the column number is less than what it should be (64 columns), how do I fill those cells with 0, for example (the file has 64 columns and 4750 rows it would be too big):

a b c d e
1 2 3 4 5
1
0 2
3

(column 2,3,4 has 1,2,1 number of columns corresponding and with no trailing \ts)

I want to make it

a b c d e
1 2 3 4 5
1 0 0 0 0
0 2 0 0 0
3 0 0 0 0

I have tried sed 's/\t\t/\t0\t/g' however all the solution like this requires column space to be existed there, how can I fix my table?

Eric
  • 53
  • 5

3 Answers3

4
$ awk -v c=5 'BEGIN{FS=OFS="\t"} {for(i=NF+1; i<=c; i++) $i=0} 1' ip.txt
a   b   c   d   e
1   2   3   4   5
1   0   0   0   0
0   2   0   0   0
3   0   0   0   0

Change c to 64 for your real input.


sed isn't suited for this, but for fun, if you don't have empty lines, you can do:

sed -E ':a s/^([^\t]+)((\t[^\t]+){0,3})$/\1\2\t0/; ta'

The s command here will match only if there are less than 5 fields (change {0,3} to {0,62} for 64 column case) and then add one more field. As long as the substitution succeeds, ta will branch to label a, thus emulating a loop.


With perl (see Append zeros to list for explanations):

perl -pe 's|$|"\t0" x (4 - tr/\t//)|e'
perl -F'\t' -lane 'print join "\t", @F, (0)x(4-$#F)'
Sundeep
  • 23,246
  • 2
  • 28
  • 103
3

This might work for you (GNU sed):

sed ':a;s/[^\t]\+/&/64;t;s/$/\t0/;ta' file

Check to see if there are the correct number of columns (for you 64) and if so, bail out.

Otherwise, append a tab and a zero and repeat.

N.B. This assumes the file is tab delimited, for a more generic solution, use:

d=' '
r=$(<<<"$d" sed 's/.*/[^&]\\+/')
sed ':a;s/'"$r"'/&/64;t;s/$/'"$d"'0/;ta' file

Where $d is the delimiter and $r is the dual regexp.

The solution if the delimiter is a / or a \ is left to the reader.

potong
  • 55,640
  • 6
  • 51
  • 83
2

Adding one more variant more generic one, could you please try following written and tested with GNU awk with shown samples. By this we need NOT to mention any number of fields it will find the maximum number of fields and will add zeroes into it as per it.

awk '
FNR==NR{
  maxField=(maxField>NF?maxField:NF)
  next
}
{
  if(NF<maxField){
    val=sprintf("%0"maxField-NF"d","0")
    gsub(/0/,"& ",val)
  }
  print $0,val
  val=""
}
' Input_file Input_file | column -t
RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93