
I have seen plenty of examples, but I cannot get any of them to work in this script, which comes from https://stackoverflow.com/a/72720612 by another user, @Just Khaithang. The script works well, but I also need to retain my column spacing, which is critical. Below is a sample of the .txt file, which I have posted here a couple of times. Each line starts with 1 space, there are 20 characters from the beginning of column 1 to the beginning of column 2, and there are 4 spaces between columns 2 and 3. The script follows the sample. It replaces a date with one taken from user input, held in the variable $broken_date, and it is called from another shell script with awk -v. The hard-coded "" spaces in the print statement work for some lines, but because the width of column 1 varies, the columns do not stay aligned.

 146327A             0000000020220422    000002012633825-0003-1
 137149D             0000000045220419    000004512632587-0003-0
 137050C             0000000018220419    000001812632410-0003-0
 137147A             0000000045220419    000004512632487-0003-0
 137233B             0000000144220421    000014412630711-0003-1
 137599B             0000000120220419    000012012632543-0003-0
 137604D             0000000015220419    000001512632588-0003-0
 151031-001E         0000000041220517    000004112575320-0003-1
 151248-001A         0000000021220421    000002112629944-0003-1
 151249-001A         0000000005220422    000000512634524-0003-1
 151827-002B         0000000040220421    000004012629223-0003-1
 127941B             0000000045220422    000004512634676-0003-1
 137105A             0000000020220421    000002012630791-0003-1
 132136A             0000000005220419    000000512632590-0003-0
 132137A             0000000005220419    000000512632591-0003-0
 134180D             0000000052220419    000006012622399-0003-1
 134307-004K         0000000016220420    000001612635621-0003-0
 141014-001B         0000000040220419    000004012632585-0003-0

{
    c2=$2
    c3=$3
    sub("0+","",c2)
    sub("0+","",c3)
    sub("-.*","",c3)
    if (length(c2) == 8) {
        c2_value=substr(c2,1,2)
    } else if (length(c2) == 9) {
        c2_value=substr(c2,1,3)
    }

    if (length(c3) == 10) {
        c3_value=substr(c3,1,2)
    } else if (length(c3) == 11) {
        c3_value=substr(c3,1,3)
    }

    if(c2_value != c3_value) {
        sub("[1-9].*$","",$2)
        date="$broken_date"  # this value taken from user input
        print  $1"            "$2 c2_value broken_date"   "$3
    } else {
        print $0
    }
}

Output should be

 146327A             0000000020220422    000002012633825-0003-1
 137149D             0000000045220419    000004512632587-0003-0
 137050C             0000000018220419    000001812632410-0003-0
 137147A             0000000045220419    000004512632487-0003-0
 137233B             0000000144220421    000014412630711-0003-1
 137599B             0000000120220419    000012012632543-0003-0
 137604D             0000000015220419    000001512632588-0003-0
 151031-001E         0000000041220517    000004112575320-0003-1
 151248-001A         0000000021220421    000002112629944-0003-1
 151249-001A         0000000005220422    000000512634524-0003-1
 151827-002B         0000000040220421    000004012629223-0003-1
 127941B             0000000045220422    000004512634676-0003-1
 137105A             0000000020220421    000002012630791-0003-1
 132136A             0000000005220419    000000512632590-0003-0
 132137A             0000000005220419    000000512632591-0003-0
 134180D             0000000052220909    000006012622399-0003-1
 134307-004K         0000000016220420    000001612635621-0003-0
 141014-001B         0000000040220419    000004012632585-0003-0

The only difference is the date in the 2nd column of the 3rd line from the bottom, where I entered 220909. That is the change the script needs to make.

I am doing this in a Korn shell via MKS Toolkit; Awk says file version 9.2.3.2096. This is on an old Windows XP machine.

tripleee
fletching

2 Answers


This will behave the same way using any awk:

$ cat tst.sh
#!/usr/bin/env bash

broken_date='220909'

awk -v broken_date="$broken_date" '
substr($2,4,7) != substr($3,1,7) {
    tail = $0
    nf = 0
    while ( tail != "" ) {
        match(tail,/^[ \t]*/)
        sep[++nf] = substr(tail,RSTART,RLENGTH)
        tail = substr(tail,RSTART+RLENGTH)
        match(tail,/^[^ \t]*/)
        fld[nf] = substr(tail,RSTART,RLENGTH)
        tail = substr(tail,RSTART+RLENGTH)
    }

    fld[2] = substr(fld[2],1,10) broken_date
    $0 = ""
    for ( i=1; i<=nf; i++ ) {
        $0 = $0 sep[i] fld[i]
    }
}
{ print }
' "${@:--}"

$ ./tst.sh file
 146327A             0000000020220422    000002012633825-0003-1
 137149D             0000000045220419    000004512632587-0003-0
 137050C             0000000018220419    000001812632410-0003-0
 137147A             0000000045220419    000004512632487-0003-0
 137233B             0000000144220421    000014412630711-0003-1
 137599B             0000000120220419    000012012632543-0003-0
 137604D             0000000015220419    000001512632588-0003-0
 151031-001E         0000000041220517    000004112575320-0003-1
 151248-001A         0000000021220421    000002112629944-0003-1
 151249-001A         0000000005220422    000000512634524-0003-1
 151827-002B         0000000040220421    000004012629223-0003-1
 127941B             0000000045220422    000004512634676-0003-1
 137105A             0000000020220421    000002012630791-0003-1
 132136A             0000000005220419    000000512632590-0003-0
 132137A             0000000005220419    000000512632591-0003-0
 134180D             0000000052220909    000006012622399-0003-1
 134307-004K         0000000016220420    000001612635621-0003-0
 141014-001B         0000000040220419    000004012632585-0003-0

It just retains whatever spacing you already have. I made the script more general than necessary so you can see how to break an input record into arrays of separators (sep[]) and fields (fld[]) so you can do whatever you like with similar problems in future.
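To see the sep[]/fld[] split on its own, here is a minimal sketch that only reports what ends up in each array for one record. The function name show_parts is hypothetical and used just for illustration, and the record is built with printf on the assumption that the layout is 1 leading blank followed by two 20-character columns, as described in the question.

```shell
# show_parts is a hypothetical helper name used only for this illustration.
show_parts() {
    awk '
    {
        tail = $0
        nf = 0
        while ( tail != "" ) {
            match(tail, /^[ \t]*/)                 # leading blanks (may be empty)
            sep[++nf] = substr(tail, RSTART, RLENGTH)
            tail = substr(tail, RSTART + RLENGTH)
            match(tail, /^[^ \t]*/)                # the field that follows them
            fld[nf] = substr(tail, RSTART, RLENGTH)
            tail = substr(tail, RSTART + RLENGTH)
        }
        for ( i = 1; i <= nf; i++ )
            printf "sep[%d] has %d blank(s), fld[%d]=%s\n", i, length(sep[i]), i, fld[i]
    }'
}

# Assumed layout: 1 blank, then two 20-character columns, then column 3.
printf ' %-20s%-20s%s\n' 134180D 0000000052220419 000006012622399-0003-1 | show_parts
```

For that record it should report 1, 13, and 4 blanks ahead of the three fields. Because each separator is stored exactly as it was read, gluing sep[i] and fld[i] back together reproduces the original spacing no matter how wide column 1 is.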

Ed Morton
  • Broken_date will not be static and will be from user input. I am just using that date as an example of user input from another script that takes the input and is assigned to the variable. Can this be modified to accept user input from a variable? – fletching Jul 15 '22 at 16:38
  • I should add user date is yymmdd format. – fletching Jul 15 '22 at 16:40
  • It IS using a variable - `broken_date='220909'`. If you're saying you want to populate the awk variable from a shell variable, I already referred you to the page that tells you how to do that (see [my earlier comment](https://stackoverflow.com/questions/72995611/need-to-retain-column-spacing-in-awk-script/72997116#comment128926953_72995611)) but it's `awk -v awkvar="$shellvar"`. The date format is irrelevant, it's just a string. – Ed Morton Jul 15 '22 at 16:41
  • I updated it to populate the awk variable from the contents of a shell variable. – Ed Morton Jul 15 '22 at 16:44
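As a small illustration of the point made in these comments, the sketch below contrasts a value passed with `awk -v` against the string `"$broken_date"` written inside the awk program, as in the question's script. The function name show_date is hypothetical. Inside a single-quoted awk program the shell never expands `$broken_date`, so awk only sees those literal characters.

```shell
# show_date is a hypothetical name; $1 stands in for the user-supplied yymmdd value.
show_date() {
    awk -v broken_date="$1" 'BEGIN {
        literal = "$broken_date"           # just a string literal inside awk
        print "from -v: " broken_date      # the value handed over by the shell
        print "literal: " literal
    }'
}

show_date 220909
```

In the calling script the argument would typically be whatever variable holds the user's input, for example one filled in earlier with `read`.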

Assumptions:

  • GNU awk/FIELDWIDTHS is available to OP (in comments OP mentions not being able to get FIELDWIDTHS to work, which I take to mean that OP is running GNU awk; otherwise I'd expect OP to report an error or say that FIELDWIDTHS is not available)
  • input field widths are known in advance (eg, all inputs have the same spacing)
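If the first assumption does not hold (the question mentions an old MKS Toolkit awk), the same fixed-width idea can be sketched with plain substr() instead of FIELDWIDTHS. This is a swapped-in technique, not the GNU awk code in this answer. The function name fix_dates is hypothetical, and the widths (21 characters for column 1 including the leading blank, 20 for column 2 including its trailing blanks) are assumed from the layout described in the question.

```shell
# fix_dates is a hypothetical name; $1 is the replacement yymmdd date.
fix_dates() {
    awk -v bdate="$1" '
    {
        f1 = substr($0, 1, 21)      # leading blank + column 1, padded to 21
        f2 = substr($0, 22, 20)     # column 2 plus its trailing blanks
        f3 = substr($0, 42)         # everything else (column 3)

        c2 = f2; gsub(/ /, "", c2)
        pfx = substr(c2, 1, length(c2) - 6)     # column 2 without its yymmdd suffix
        q2 = pfx; sub(/^0+/, "", q2)            # quantity from column 2

        split(f3, a, "-")                       # keep the part before the first hyphen
        c3 = a[1]; gsub(/ /, "", c3); sub(/^0+/, "", c3)
        q3 = substr(c3, 1, length(c3) - 8)      # quantity from column 3

        if (q2 != q3) f2 = sprintf("%-20s", pfx bdate)   # re-pad to the same width
        print f1 f2 f3
    }'
}

# Usage: fix_dates 220909 < x > y
```

Because every piece is cut and re-padded by position, the output columns stay where they were regardless of how long column 1 is.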

One idea for modifying OP's current code to work with GNU awk/FIELDWIDTHS:

broken_date='220909'

awk -v bdate="${broken_date}" '
BEGIN  { FIELDWIDTHS="21 20 100"
         fmt="%-21s%-20s%s\n"                # define our printf format to match FIELDWIDTHS
       }
       { c2=$2; gsub(/ /,"",c2); sub("0+","",c2)
         c3=$3; gsub(/ /,"",c3); sub("0+","",c3); sub("-.*","",c3)

              if (length(c2) == 8)  { c2_value=substr(c2,1,2) }
         else if (length(c2) == 9)  { c2_value=substr(c2,1,3) }

              if (length(c3) == 10) { c3_value=substr(c3,1,2) }
         else if (length(c3) == 11) { c3_value=substr(c3,1,3) }

         if (c2_value != c3_value) { printf fmt,$1,substr($2,1,length(gensub(/ /,"","g",$2))-6) bdate,$3 }
         else                      { print $0 }
       }
' x > y

Reworking OP's logic (also addresses length(c3) == 9) while maintaining the FIELDWIDTHS approach:

broken_date='220909'

awk -v bdate="${broken_date}" '
BEGIN  { FIELDWIDTHS="21 20 100"
         fmt="%-21s%-20s%s\n"
       }
       { c2=$2;
         gsub(/^[0]+| /,"",c2 )                    # strip leading zeroes and all spaces
         c2=substr(c2,1,length(c2)-6)              # strip off last 6 characters

         pfx=$2                                    # find the prefix of $2
         gsub(/ /,"",pfx)                          # strip all spaces
         pfx=substr(pfx,1,length(pfx)-6)           # strip off last 6 characters

         split($3,a,"-")                           # split $3 on hyphens
         c3=a[1]                                   # grab 1st hyphen delimited field
         gsub(/^[0]+| /,"",c3)                     # strip leading zeroes and all spaces
         c3=substr(c3,1,length(c3)-8)              # strip off last 8 characters

         if (c2 != c3) $2=pfx bdate                # replace $2 with its prefix + bdate (aka broken_date)

         printf fmt,$1,$2,$3
       }
' x > y

Both of these generate:

$ diff x y
16c16
<  134180D             0000000052220419    000006012622399-0003-1
---
>  134180D             0000000052220909    000006012622399-0003-1
markp-fuso
  • If you are using leading zeros, what I failed to include in my sample is that the number of zeros in column 2 might be as many as 9 and as few as 7. Column 3 has 4 to 6 if using the same logic. I included just a small sample. This is one that is not coming out correct: 117971D 0000000001220419 000010012622774-0003-1 – fletching Jul 18 '22 at 15:45
  • `'not coming out correct'` ? the code does or does not replace `220419` with `220909`? if I'm understanding the logic you strip off the last 6 digits from `$2` leaving you with `1`, and you strip off the last 8 digits from `$3` leaving you with `100`, and since `1 != 100` we replace `220419` with `220909` ... which I've verified this code does; if that's not what you intend then please update the question with more details (and samples) to explain how you're parsing the 2nd/3rd fields – markp-fuso Jul 18 '22 at 17:26
  • i will see if I can explain it better after I run it again and check for my syntax error. If I cannot get it figured out I will add more details to the question. Thanks – fletching Jul 18 '22 at 18:21
  • Got it working, Thanks I just had some syntax this old version of MKS did not like but nothing major. – fletching Jul 18 '22 at 19:13
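The comparison described in the comments above can be checked in isolation. The sketch below is only an illustration of that reading of the logic: strip leading zeros, drop the last 6 digits of column 2, drop the last 8 digits of the part of column 3 before the first hyphen, then compare what is left. The function name compare_keys is hypothetical.

```shell
# compare_keys is a hypothetical name; $1 is column 2 and $2 is column 3 of one record.
compare_keys() {
    awk -v c2="$1" -v c3="$2" 'BEGIN {
        k2 = c2; sub(/^0+/, "", k2)
        k2 = substr(k2, 1, length(k2) - 6)      # drop the trailing yymmdd

        split(c3, a, "-")                       # part before the first hyphen
        k3 = a[1]; sub(/^0+/, "", k3)
        k3 = substr(k3, 1, length(k3) - 8)      # drop the last 8 digits

        printf "%s vs %s -> %s\n", k2, k3, (k2 != k3 ? "replace date" : "keep line")
    }'
}

compare_keys 0000000001220419 000010012622774-0003-1
```

For the record quoted in the comments this compares 1 with 100, so the date would be replaced, which matches the behaviour described there.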