
This is a follow-up question to a previous post. I'm processing a file in awk. I want to pass through, untouched, the rows in the file that have blanks in column positions 45 through 50, and I want to do work on the rows that have blanks in column positions 60 through 73.

Specifically I want to replace the blanks in columns positions 60 through 73 with 0s.

That way the output file will have the original rows with blanks in 45-50 untouched, and in the rows with blanks in 60-73 those blanks will have been replaced with '0's. So the output file will be the same as the input file, only with zeros in positions 60-73 of the relevant rows.

Because I work at a bank, I had to construct an example that doesn't reveal bank data. My previous post was not an accurate description of the issue.

I've mocked up the data more accurately, replacing the data with all 0s. As you can see, there are rows that look like they contain mostly metadata and are different from the actual data rows. That's what I'm trying to filter out. The first row starting with 0000s is the example of the row I'm trying to fix: it has blanks at positions 60-73, and those are the positions I want to change to all '0's. The second row is an example of a regular row with no errors in it. The third and fourth rows are the metadata rows I want to skip. In this case I've chosen column positions 45-50 as the blank columns that tell me to skip those rows, because in both of the first two data rows those columns are guaranteed to have data in them. Hopefully that clears it up. The data example is shown below (the first 2 rows are not present in the real data; they're just added here so you can easily see the character positions):

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
0000    000000000                           000000000000000             00000000
0000    00000000000  0000000                000000000000000000000000000000000000
0000    000000
0000        0000000000000000000000000000000000  0000000              T0000000000

All the answers I was given worked on the old example file, but on the above data they also converted the blanks to '0's in the last two metadata rows. Those answers were:

awk '{print gensub(/^(.{9}) {10}([^ ]{24})/, "\\10000000000\\2", "g")}' file

awk '{if (match($0,/[ ]{10}/) && RSTART == 10 && substr($0,20) ~ /^#*$/) sub(/[ ]{10}/,"0000000000")}1' file

sed -E 's/^(.{9}) {10}(.{5}[^ ]{10})/\10000000000\2/' file

awk '
substr($0,10,10) ~ /^ +$/ && substr($0,20) !~ / / {
  $0=substr($0,1,9) "0000000000" substr($0,20)
}
1
' Input_file

All of these scripts worked great on the old example file but processed the metadata rows in the above example. I was naive in creating the first example data set in that it did not accurately depict my issue. I was trying to make it easy for the people reading my post to see the data, and I wasn't aware the simplification had changed the question. For those unfamiliar with the previous post, I'm including it below:

Using an if block in awk

I hope I've been thorough enough. Please let me know if you have any questions.

This is my attempt. I've been working with this over the weekend and found that the solutions using match would not work as well as the solutions using substr. The "longtst" file is the file printed above:

cat longtst|awk '{if (substr($0,45,5) !~ /^[[:blank:]]*$/)
                    {if (substr($0,60,13) ~ /^[[:blank:]]*$/)
                     $0=substr($0,1,59) gsub(/ /,0,substr($0,60,13))  substr($0,74) }
                       else print $0 }'

I'm getting a

"-ksh: .: syntax error: `else' unexpected"

The expected output would be the following:

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
0000    000000000                           000000000000000000000000000000000000
0000    00000000000  0000000                000000000000000000000000000000000000
0000    000000
0000        0000000000000000000000000000000000  0000000              T0000000000

As you can see, the blanks in the first row get filled in with 0s from col pos 60 through 73. As per Ed's note, the rows that have blanks at col pos 45 through 50 just get passed through. Thanks!

Carbon
  • Please don't make us read sample input/output that's 100+ chars wide and requires a scroll bar - come up with a [mcve] (emphasis on **minimal**) that **demonstrates** your problem, and make sure to post the expected output too, not just the sample input, as we can't test one without the other. Also make sure it's clear when you say `have blanks in column positions 45 through 50` whether you really mean `have blanks in` (which means it could have other chars too) or you actually mean `have ONLY blanks in`. – Ed Morton Jul 16 '21 at 21:26
  • Hi Ed, Done! Sorry about that! And yes I want to exclude the rows from processing if there are ANY 'blank' characters in column positions 45-50. Conversely, If Col Pos 45- has ANY NON-Blank Characters in any of the positions then process that row. Thanks! – Carbon Jul 18 '21 at 19:52
  • Thanks for reducing the sample input, please also [edit] your question to show the expected output given that input. regarding `Because I work at a bank I had to give an example that wouldn't let me show bank data.` - we [almost] all work somewhere that means we can't post actual data, what we do instead when creating and posting a [mcve] is make up text that is similar to that real data in terms of the types of strings/characters used, e.g. `John Smith` where a customer name would appear, `5 Main St` for an address, `$50,000` for a salary, etc. That's better than a `0` for every character. – Ed Morton Jul 19 '21 at 13:01
  • I added header rows to your input so we can see character positions. Please check that your descriptive text and the example match regarding character positions. For example your text says the first row `it has a blank starting at position 60-73` but the blanks in that row end at position 62, not 63. Since you wanted blanks replaced by `0`s and your input is all `0`s, it's going to be hard to see in the output where the replacements occur. For the purposes of this question you might want to replace the blanks with some character that does not appear in your input, you can change that later. – Ed Morton Jul 19 '21 at 13:14
  • In your description you say `The third and fourth rows are the meta data rows I want to skip.` but "skip" could mean "don't change it, just print it as-is" or "don't print it at all" so that's an example of a case where posting the expected output would clarify your requirements. – Ed Morton Jul 19 '21 at 13:19
  • The code you posted would **not** produce the error message `"-ksh: .: syntax error: `else' unexpected"` by the way; it'd produce an error message like `awk: cmd. line:3: ^ gsub third parameter is not a changeable object`. It's important to post code and error messages that match up when asking for help with that code. – Ed Morton Jul 19 '21 at 13:29

1 Answer


It seems like this might be what you're trying to do:

$ cat tst.awk
substr($0,45,6) !~ / / {
    tgt = substr($0,60,14)
    gsub(/ /,0,tgt)
    $0 = substr($0,1,59) tgt substr($0,74)
}
{ print }

$ awk -f tst.awk file
0000    000000000                           000000000000000000000000000000000000
0000    00000000000  0000000                000000000000000000000000000000000000
0000    000000
0000        0000000000000000000000000000000000  0000000              T0000000000

but without the expected output that's a guess.

Ed Morton
  • Thanks Ed. That did it! Also Ed, one question: So does the "substr($0,45,6) !~ / /" act like an IF statement and the code between the {} acts as the then statement? – Carbon Jul 19 '21 at 14:12
  • Yes. An awk script is made up of `condition { action }` statements where the `action` is executed if the `condition` is true for the current input record being processed. In C or similar it'd be equivalent to `if ( condition ) { action }`, which you CAN also write in awk but don't have to. – Ed Morton Jul 19 '21 at 14:18
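That pattern-action equivalence can be seen with a toy one-liner pair (made-up data, not from the question): both commands below print the same lines.

```shell
# Bare "condition { action }" pair: the action runs only for
# records where the condition is true.
printf '%s\n' 1 2 3 | awk '$1 > 1 { print "big:", $1 }'

# Equivalent explicit if() inside an unconditional action block.
printf '%s\n' 1 2 3 | awk '{ if ($1 > 1) print "big:", $1 }'

# both print:
#   big: 2
#   big: 3
```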