Replace empty spaces in a column with a character

Question

My file looks like this:

  Scenario 1                                     0.20          0.00     0.00 r
  Scenario 2                                     0.08          0.34 &   0.34 r
  Scenario 3                          6   12.95 
  Scenario 4                              0.00   0.08   0.00   0.00 &   0.35 r
  Scenario 5                                     0.07          0.08 &   0.42 r
  Scenario 6                          6   8.70 
  Scenario 7                              0.00   0.07   0.00   0.00 &   0.42 r
  Scenario 8                                     0.31          0.28 &   0.70 f
  Scenario 9                          5   5.06

My objective: replace the empty cells/spaces/absent values in each column with "-" (there are 8 fields in total).

The problem I'm facing while using the awk command to do this is that the field separator keeps changing with every line.

What I've done so far: I've extracted the lines that share a field pattern and placed them in different files. E.g., I have placed Scenarios 3, 6 and 9 in one file and the rest in another file to make the data easier to work on. What I have now is:

File 1:

Scenario 3                          6   12.95
Scenario 6                          6   8.70
Scenario 9                          5   5.06

File 2:

  Scenario 1                                     0.20          0.00     0.00 r
  Scenario 2                                     0.08          0.34 &   0.34 r

  Scenario 4                              0.00   0.08   0.00   0.00 &   0.35 r
  Scenario 5                                     0.07          0.08 &   0.42 r

  Scenario 7                              0.00   0.07   0.00   0.00 &   0.42 r
  Scenario 8                                     0.31          0.28 &   0.70 f

Expected output:

  Scenario 1                          -     -    0.20    -     0.00     0.00 r
  Scenario 2                          -     -    0.08    -     0.34 &   0.34 r
  Scenario 3                          6   12.95   -      -      -        -
  Scenario 4                          -   0.00   0.08   0.00   0.00 &   0.35 r
  Scenario 5                          -     -    0.07    -     0.08 &   0.42 r
  Scenario 6                          6   8.70    -      -      -        -
  Scenario 7                          -   0.00   0.07   0.00   0.00 &   0.42 r
  Scenario 8                          -     -    0.31    -     0.28 &   0.70 f
  Scenario 9                          5   5.06    -      -      -        -

Case 1 (using awk with FIELDWIDTHS):

  $ awk 'BEGIN { FIELDWIDTHS="37 3 7 7 7 9 9 "} {for(i=1;i<=NF;++i){printf $i"|"};print""}' main1.txt

| I_BLENDER_0/R_137/CLK (SDFFX2_HVT) |   |       |  0.20 |       |  0.00   |  0.00 r
| I_BLENDER_0/R_137/Q (SDFFX2_HVT)   |   |       |  0.08 |       |  0.34 & |  0.34 r
| I_BLENDER_0/n2757 (net)            | 6 |  12.95|
| I_BLENDER_0/U4847/A1 (AND2X1_LVT)  |   |  0.00 |  0.08 |  0.00 |  0.00 & |  0.35 r
| I_BLENDER_0/U4847/Y (AND2X1_LVT)   |   |       |  0.07 |       |  0.08 & |  0.42 r
| I_BLENDER_0/n2616 (net)            | 6 |  8.70 |
| I_BLENDER_0/U1/A4 (NAND4X0_HVT)    |   |  0.00 |  0.07 |  0.00 |  0.00 & |  0.42 r
| I_BLENDER_0/U1/Y (NAND4X0_HVT)     |   |       |  0.31 |       |  0.28 & |  0.70 f

Case 2 (using the sed command):

  $  sed "s/^\(.\{,36\}\)$/\1`echo -$_{1..30}|tr -d '-'`/;
      s/^\(.\{38\}\) /\1-/;
      s/^\(.\{43\}\) /\1-/;
      s/^\(.\{50\}\) /\1-/;
      s/^\(.\{57\}\) /\1-/;
      s/^\(.\{64\}\) /\1-/;
      s/^\(.\{73\}\) /\1-/;
      s/ *$//"



  I_BLENDER_0/R_137/CLK (SDFFX2_HVT)  -    -     0.20    -     0.00     0.00 r
  I_BLENDER_0/R_137/Q (SDFFX2_HVT)    -    -     0.08    -     0.34 &   0.34 r
  I_BLENDER_0/n2757 (net)             6   12.95
  I_BLENDER_0/U4847/A1 (AND2X1_LVT)   -   0.00   0.08   0.00   0.00 &   0.35 r
  I_BLENDER_0/U4847/Y (AND2X1_LVT)    -    -     0.07    -     0.08 &   0.42 r
  I_BLENDER_0/n2616 (net)             6   8.70

Welcome to SO. Stack Overflow is a question and answer site for professional and enthusiast programmers. The goal is that you add some code of your own to your question to show at least the research effort you made to solve this yourself. — Cyrus, Sep 07 '18 at 05:00
Please add your desired output for that sample input to your question. — Cyrus, Sep 07 '18 at 05:00
The main difficulty derives from having a variable number of spaces as a separator. Could you produce an input file where the fields are TAB-separated or comma-separated, or at least separated by a fixed number of spaces? Or, similarly, pad the field contents so they all have the same length? — simlev, Sep 07 '18 at 08:04
Your _Case 1_ output shows a `|` at the beginning of each line. That does not match the awk command that produced it, and it clearly indicates that the lines in your file end with `\r\n` and not just `\n`. You should convert them using `dos2unix` or use awk with the extra option `awk -v RS="\r?\n" '{...}' file` — kvantour, Sep 07 '18 at 15:56
More information here: https://stackoverflow.com/q/45772525/8344060 — kvantour, Sep 07 '18 at 15:59

Oleg · Answer 1 · 2018-09-07T08:42:07.590

Unfortunately, in this case you need to carefully count the character columns. Here is the code for the input that you provided -- you may need to adjust the numbers for your real input file.

sed "s/^\(.\{,78\}\)$/\1`echo -$_{1..78}|tr -d '-'`/;
  s/^\(.\{38\}\) /\1-/;
  s/^\(.\{43\}\) /\1-/;
  s/^\(.\{50\}\) /\1-/;
  s/^\(.\{57\}\) /\1-/;
  s/^\(.\{64\}\) /\1-/;
  s/^\(.\{73\}\) /\1-/;
  s/ *$//" input_file

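The first expression pads every line shorter than 78 characters with trailing spaces, so that the later substitutions always find a character at the column they test. Each of those substitutions replaces a space at one specific column with a dash. The last expression removes any trailing spaces that remain.

The messy-looking expression echo -$_{1..78}|tr -d '-' in the first line relies on bash brace expansion to print 78 dashes separated by single spaces, and tr then deletes the dashes, which leaves a run of 77 spaces. You may want to just replace it with a long literal string of spaces.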
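
As an alternative to the brace-expansion trick, the padding can be built with printf, which does not depend on bash. The following is only a sketch under assumptions: the two-line sample and the tested column (13) are made up for illustration, and the pad width of 78 is taken from the answer above.

```shell
# Build a 78-space pad without brace expansion (78 is the line width assumed above).
pad=$(printf '%78s' '')

# Hypothetical sample: one complete line and one short line.
sample='aaaa  1.00  2.00
bbbb  3.00'

# 1) pad short lines, 2) put "-" at column 13 if it is blank, 3) strip trailing spaces.
result=$(printf '%s\n' "$sample" |
  sed "s/^\(.\{0,78\}\)\$/\1$pad/; s/^\(.\{12\}\) /\1-/; s/ *\$//")
printf '%s\n' "$result"
```

The interval is written as \{0,78\} rather than \{,78\} because the form without a lower bound is a GNU extension.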

Hi, your solution worked too. If you look at the question, I've included the actual file as it appears after I used FIELDWIDTHS as suggested by @kvantour. The only issue I face now is that I'm not getting the proper output for the lines containing "net": the remaining fields of these lines are not filled with "-". But I guess I can manage from here. Thanks for your help. — LowerMoon, Sep 07 '18 at 11:05

kvantour · Accepted Answer · 2018-09-07T10:45:49.130

To do this, you can use FIELDWIDTHS in GNU awk.

Basically, we split each line into fixed-width fields. The following shows that the lines are split correctly:

$ awk 'BEGIN{ FIELDWIDTHS="13 24 3 7 7 7 9 9"}
       {for(i=1;i<=NF;++i){printf "%s|", $i};print""}' file

  Scenario 1 |                        |   |       |  0.20 |       |  0.00   |  0.00 r|
  Scenario 2 |                        |   |       |  0.08 |       |  0.34 & |  0.34 r|
  Scenario 3 |                        | 6 |  12.95| ||||
  Scenario 4 |                        |   |  0.00 |  0.08 |  0.00 |  0.00 & |  0.35 r|
  Scenario 5 |                        |   |       |  0.07 |       |  0.08 & |  0.42 r|
  Scenario 6 |                        | 6 |  8.70 |||||
  Scenario 7 |                        |   |  0.00 |  0.07 |  0.00 |  0.00 & |  0.42 r|
  Scenario 8 |                        |   |       |  0.31 |       |  0.28 & |  0.70 f|
  Scenario 9 |                        | 5 |  5.06 |||||

So all we need to do is replace each blank field with a dash that is padded to the width of that field.

$ awk 'BEGIN{ FIELDWIDTHS="13 24 3 7 7 7 9 9"}
       {s=$1$2}
       {s=s ($3~/^[[:blank:]]*$/?" - ":$3)}
       {s=s ($4~/^[[:blank:]]*$/?"   -   ":$4)}
       {s=s ($5~/^[[:blank:]]*$/?"   -   ":$5)}
       {s=s ($6~/^[[:blank:]]*$/?"   -   ":$6)}
       {s=s ($7~/^[[:blank:]]*$/?"   -     ":$7)}
       {s=s ($8~/^[[:blank:]]*$/?"   -     ":$8)}
       {print s}' file

and this gives:

  Scenario 1                          -    -     0.20    -     0.00     0.00 r
  Scenario 2                          -    -     0.08    -     0.34 &   0.34 r
  Scenario 3                          6   12.95   -      -      -        -     
  Scenario 4                          -   0.00   0.08   0.00   0.00 &   0.35 r
  Scenario 5                          -    -     0.07    -     0.08 &   0.42 r
  Scenario 6                          6   8.70    -      -      -        -     
  Scenario 7                          -   0.00   0.07   0.00   0.00 &   0.42 r
  Scenario 8                          -    -     0.31    -     0.28 &   0.70 f
  Scenario 9                          5   5.06    -      -      -        -
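
FIELDWIDTHS is specific to GNU awk. Where only a POSIX awk is available, the same fixed-width idea can be sketched with substr(). This is an illustration, not the method used above: the sample data and the widths "6 5 5" are hypothetical and would need to be replaced with the real column widths.

```shell
# Hypothetical 3-column sample with widths 6, 5 and 5; the last line is short.
sample='name1  1.0  2.0
name2       3.0
name3  4.0'

result=$(printf '%s\n' "$sample" | awk '
BEGIN { n = split("6 5 5", w, " ") }      # column widths (an assumption for this sample)
{
    out = ""; pos = 1
    for (i = 1; i <= n; i++) {
        f = substr($0, pos, w[i])          # fixed-width slice; empty past end of line
        pos += w[i]
        if (f ~ /^ *$/) f = "  -"          # blank or missing column
        out = out sprintf("%-" w[i] "s", f)
    }
    sub(/ +$/, "", out)                    # drop trailing padding
    print out
}')
printf '%s\n' "$result"
```

Because the loop runs over the number of widths rather than NF, columns that are missing from a short line are filled in as well.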

Remarks:

It would be better to use the actual format that was used to write these files.
I always leave an extra space before the fields to account for possible minus signs.
It looks like the floats are written with the format %-5.2f, which is why the number 12.95 is not aligned (%6.2f would have been better).
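
To see the difference between the two formats mentioned in the last remark, here is a small sketch that prints the sample values 12.95 and 8.70 both ways. The square brackets are only there to make the padding visible.

```shell
# Compare left-aligned %-5.2f with right-aligned %6.2f for two sample values.
result=$(LC_ALL=C awk 'BEGIN {
    printf "[%-5.2f] [%-5.2f]\n", 12.95, 8.70   # left-aligned, minimum width 5
    printf "[%6.2f] [%6.2f]\n",  12.95, 8.70    # right-aligned, minimum width 6
}')
printf '%s\n' "$result"
# [12.95] [8.70 ]
# [ 12.95] [  8.70]
```

With %-5.2f the padding lands after the shorter number, so the decimal points do not line up. With %6.2f the padding lands in front, so they do.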

Note: if you play around a bit, you can make it shorter, at the cost of making it less obvious what is going on.

awk 'BEGIN{ FIELDWIDTHS="13 23 5 7 7 7 9 9"}
     # loop over all 8 columns rather than NF: short lines have fewer fields
     {for(i=3;i<=8;++i)$i=$i~/^[[:blank:]]*$/?"  -":$i}
     {printf "%-13s%-23s%-5s%-7s%-7s%-7s%-9s%-9s\n",$1,$2,$3,$4,$5,$6,$7,$8}' file

or even shorter:

awk 'BEGIN{ FIELDWIDTHS="36 5 7 7 7 9 9"; n=split(FIELDWIDTHS,a)}
     {for(i=1;i<=n;++i) printf "%-*s",a[i], ($i~/^ *$/?"  -":$i); print ""}' file
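
If the input has DOS line endings (\r\n), which turned out to be the cause of the odd output with the real file, the carriage return ends up inside the last field. Running dos2unix on the file fixes that. Another option, sketched below with a made-up two-line sample, is to strip the carriage return inside awk before the fields are used. The length is printed only to show that the extra character is gone.

```shell
# Hypothetical sample with DOS line endings (\r\n).
result=$(printf 'abc  1.0\r\ndef\r\n' | awk '
{ sub(/\r$/, "") }                        # strip a trailing carriage return, if any
{ printf "%s|%d\n", $0, length($0) }')    # show the cleaned line and its length
printf '%s\n' "$result"
# abc  1.0|8
# def|3
```

Modifying $0 makes awk split the record again, so the same sub() call can be placed as the first rule of any of the FIELDWIDTHS scripts above.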

Hi, I've tried using FIELDWIDTHS: $ awk 'BEGIN { FIELDWIDTHS="37 3 7 7 7 9 9 "} {for(i=1;i<=NF;++i){printf $i"|"};print""}' filename. I've included the output in the question above ( I've given the actual snippet of the file after i apply the command. — LowerMoon, Sep 07 '18 at 10:41
@LowerMoon So it seems the shorter versions which I just added will help you out. — kvantour, Sep 07 '18 at 10:47
@LowerMoon also be aware that the first awk line I show, is nothing more than a demonstration what `FIELDWIDTHS` is doing. It is not the answer to your question. All other solutions are. — kvantour, Sep 07 '18 at 10:49
The solution you provided worked similar to @Oleg 's solution, But the starting of one or two lines have been cut out and replaced with spaces, trying to figure out why. — LowerMoon, Sep 07 '18 at 11:07
It looks to me that you are suffering from the `\r\n` dos line-termination. Just run `dos2unix` on your file and try again! This is 100% the reason of the strange behaviour. — kvantour, Sep 07 '18 at 15:54
The dos2unix command worked :) now I can easily extract the various fields but I'm still not able to fill the other fields of the "net" lines with "-", no worries, from this point on I can easily manage it. Could you edit your answer and add the dos2unix part? Will be helpful to others. — LowerMoon, Sep 09 '18 at 05:17

score 1 · Answer 3 · answered Sep 07 '18 at 09:39

Using GNU awk and FIELDWIDTHS variable to split fields based on their length:

awk 'BEGIN{
      FIELDWIDTHS="38 4 7 7 7 9 6"
      colnr=split(FIELDWIDTHS,a," ")
    } 
    {
      for(i=1;i<=colnr;i++){
        $i=sprintf("%-"a[i]"s",((!$i&&$i!=0)||$i~/^ *$/?"-":$i))
      }
    }1' file
  Scenario 1                           -    -       0.20    -       0.00      0.00 r
  Scenario 2                           -    -       0.08    -       0.34 &    0.34 r
  Scenario 3                           6    12.95   -       -       -         -
  Scenario 4                           -    0.00    0.08    0.00    0.00 &    0.35 r
  Scenario 5                           -    -       0.07    -       0.08 &    0.42 r
  Scenario 6                           6    8.70    -       -       -         -
  Scenario 7                           -    0.00    0.07    0.00    0.00 &    0.42 r
  Scenario 8                           -    -       0.31    -       0.28 &    0.70 f
  Scenario 9                           5    5.06    -       -       -         -

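The BEGIN block fills the array a with the widths of all fields and stores the number of fields in the variable colnr.

The main block loops through all fields and rewrites them with the sprintf() function.
If the field contains only blanks ($i~/^ *$/) or doesn't exist (!$i&&$i!=0), it is replaced with a -. Otherwise the field keeps its value. Because assigning to a field makes awk rebuild the record with the output field separator (a single space by default) between fields, the columns in the output are spaced one character further apart than in the input.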
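
As a side note on the condition above: a field that does not exist evaluates to the empty string, so the regex test on its own already matches it. The following sketch, using a made-up one-line input, prints 1 for a match and 0 for no match.

```shell
# Check what the blank-only regex matches; the input line "a b" has just two fields.
result=$(printf 'a b\n' | awk '{
    missing = ($5 ~ /^ *$/)      # $5 does not exist on this line
    blank   = ("   " ~ /^ *$/)   # a field holding only spaces
    zero    = ("0" ~ /^ *$/)     # a literal zero is not blank
    print missing, blank, zero
}')
printf '%s\n' "$result"
# 1 1 0
```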
