1

I have tried this before but need to change direction. I was changing the numbers, but I found that what I actually need to change is the date that follows the numbers when they differ. My text has 3 columns; the 2nd and 3rd are similar, but occasionally they contain a 1 to 4 digit number that does not match. I have added a sample so you can see what I am talking about. I have also added my substr command, but that was written to change the numbers, and in reality I need to set the date after those numbers to 220102 or a user-entered date. When the number in bold in column 2 does not match column 3, the date after the bold number in column 2 needs to change to the user input or to 220102, whichever is easiest. I have a script for a user-input date change that I could possibly point it to. Thanks in advance.

126731E             0000000033220422    000003312634460-0003-1
134180D             0000000052220419    000006012622399-0003-1
 134307-004K         0000000016220420    000001612635621-0003-0
 141014-001B         0000000040220419    000004012632585-0003-0
134886C             0000000034220419    000007612620821-0003-1
 123899B             0000000050220412    000005012635007-0003-1
121543C             0000000059220419    000007512621925-0003-1
 118238C             0000000070220419    000007012632584-0003-0
121852A             0000000122220419    000013512622569-0003-1
 123124A             0000000141220419    000014112631954-0003-0
 123157C             0000000344220422    000034412634707-0003-1

sample text columns

BEGIN{
}
{
    part1= substr($0,1,40)
    part2= substr($0,49)
    qty= substr($2,4,7)
    print part1""qty""part2
}
END{
}

Expected outcome is:

126731E             0000000033220422    000003312634460-0003-1
134180D             0000000052220102    000006012622399-0003-1
 134307-004K         0000000016220420    000001612635621-0003-0
 141014-001B         0000000040220419    000004012632585-0003-0
134886C             0000000034220102    000007612620821-0003-1
 123899B             0000000050220412    000005012635007-0003-1
121543C             0000000059220102    000007512621925-0003-1
 118238C             0000000070220419    000007012632584-0003-0
121852A             0000000122220102    000013512622569-0003-1
 123124A             0000000141220419    000014112631954-0003-0
 123157C             0000000344220422    000034412634707-0003-1
fletching

3 Answers

0

Using GNU awk for the 3rd arg to match():

$ cat tst.awk
BEGIN { rep = (rep == "" ? "220102" : rep) }
match($0,/((\s*\S+\s+\S{3})(\S{7}))(\S+)((\s+)(\S{7})(.*))/,a) {
    if ( a[3] != a[7] ) {
        $0 = a[1] rep a[5]
    }
}
{ print }

$ awk -f tst.awk file
126731E             0000000033220422    000003312634460-0003-1
134180D             0000000052220102    000006012622399-0003-1
 134307-004K         0000000016220420    000001612635621-0003-0
 141014-001B         0000000040220419    000004012632585-0003-0
134886C             0000000034220102    000007612620821-0003-1
 123899B             0000000050220412    000005012635007-0003-1
121543C             0000000059220102    000007512621925-0003-1
 118238C             0000000070220419    000007012632584-0003-0
121852A             0000000122220102    000013512622569-0003-1
 123124A             0000000141220419    000014112631954-0003-0
 123157C             0000000344220422    000034412634707-0003-1

$ awk -v rep='111111' -f tst.awk file
126731E             0000000033220422    000003312634460-0003-1
134180D             0000000052111111    000006012622399-0003-1
 134307-004K         0000000016220420    000001612635621-0003-0
 141014-001B         0000000040220419    000004012632585-0003-0
134886C             0000000034111111    000007612620821-0003-1
 123899B             0000000050220412    000005012635007-0003-1
121543C             0000000059111111    000007512621925-0003-1
 118238C             0000000070220419    000007012632584-0003-0
121852A             0000000122111111    000013512622569-0003-1
 123124A             0000000141220419    000014112631954-0003-0
 123157C             0000000344220422    000034412634707-0003-1
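
For an awk without the 3-argument match(), the same comparison can be sketched with substr() and index(). This is only an illustrative sketch and not part of the answer as posted. It assumes the layout that the regexp above implies: the quantity is characters 4-10 of column 2 and the first 7 characters of column 3, and the date is characters 11-16 of column 2. `fix_dates` is a hypothetical helper name.

```shell
# fix_dates [date]: read records on stdin, write them to stdout.
# Hypothetical helper; the date defaults to 220102 as in the question.
fix_dates() {
    awk -v rep="${1:-220102}" '
    {
        qty2 = substr($2, 4, 7)      # quantity digits in column 2 (assumed position)
        qty3 = substr($3, 1, 7)      # quantity digits in column 3 (assumed position)
        if (qty2 != qty3) {
            pos = index($0, $2)      # where column 2 starts in the raw line
            # keep everything up to the quantity, swap the 6-digit date,
            # then append the rest of the line with its spacing untouched
            $0 = substr($0, 1, pos + 9) rep substr($0, pos + 16)
        }
        print
    }'
}

printf '%s\n' \
    '126731E             0000000033220422    000003312634460-0003-1' \
    '134180D             0000000052220419    000006012622399-0003-1' |
    fix_dates 220102
```

Because the line is rebuilt from substr() slices of the original record rather than from the fields, the column spacing should be preserved. Only the second sample line above would have its date replaced.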
Ed Morton
0

Using sed

$ cat script.sed
/[^ ]* \+0\+\([1-9]\)\([0-9]\{1,3\}\)\([0-9]\{6\}\).* 0\+\1\?\2/!{ 
    s/[0-9]\{6\} /220102 /
}
$ sed -f script.sed input_file
126731E             0000000033220422    000003312634460-0003-1
134180D             0000000052220102    000006012622399-0003-1
 134307-004K         0000000016220420    000001612635621-0003-0
 141014-001B         0000000040220419    000004012632585-0003-0
134886C             0000000034220102    000007612620821-0003-1
 123899B             0000000050220412    000005012635007-0003-1
121543C             0000000059220102    000007512621925-0003-1
 118238C             0000000070220419    000007012632584-0003-0
121852A             0000000122220102    000013512622569-0003-1
 123124A             0000000141220419    000014112631954-0003-0
 123157C             0000000344220422    000034412634707-0003-1
HatLess
  • I will need to try this on a different PC with a Korn shell, and I will get back to you and let you know. Thanks @HatLess – fletching Jun 22 '22 at 20:44
  • Which shell you are using has no effect on the operation of `sed`. (If you were using some weird shell with different quoting semantics, that could affect this answer; but `ksh` is completely upward compatible with the Bourne shell. Saving the script in a separate file completely bypasses the need to quote anything, anyway, though you might want to avoid that actually.) – tripleee Jul 16 '22 at 06:50
0
  • Assuming the run of leading zeroes in the second column is always 7 or 8 long and does not grow or shrink beyond that

cat script

{
    c2=$2
    c3=$3
    sub("0+","",c2)
    sub("0+","",c3)
    sub("-.*","",c3)
    if (length(c2) == 8) {
        c2_value=substr(c2,1,2)
    } else if (length(c2) == 9) {
        c2_value=substr(c2,1,3)
    }

    if (length(c3) == 10) {
        c3_value=substr(c3,1,2)
    } else if (length(c3) == 11) {
        c3_value=substr(c3,1,3)
    }

    if(c2_value != c3_value) {
        sub("[1-9].*$","",$2)
        date="220102"
        print $1"             "$2 c2_value date"   "$3
    } else {
        print $0
    }
}

cat data

126731E             0000000033220422    000003312634460-0003-1
134180D             0000000052220419    000006012622399-0003-1
 134307-004K         0000000016220420    000001612635621-0003-0
 141014-001B         0000000040220419    000004012632585-0003-0
134886C             0000000034220419    000007612620821-0003-1
 123899B             0000000050220412    000005012635007-0003-1
121543C             0000000059220419    000007512621925-0003-1
 118238C             0000000070220419    000007012632584-0003-0
121852A             0000000122220419    000013512622569-0003-1
 123124A             0000000141220419    000014112631954-0003-0
 123157C             0000000344220422    000034412634707-0003-1

awk -f script data

126731E             0000000033220422    000003312634460-0003-1
134180D             0000000052220102   000006012622399-0003-1
 134307-004K         0000000016220420    000001612635621-0003-0
 141014-001B         0000000040220419    000004012632585-0003-0
134886C             0000000034220102   000007612620821-0003-1
 123899B             0000000050220412    000005012635007-0003-1
121543C             0000000059220102   000007512621925-0003-1
 118238C             0000000070220419    000007012632584-0003-0
121852A             0000000122220102   000013512622569-0003-1
 123124A             0000000141220419    000014112631954-0003-0
 123157C             0000000344220422    000034412634707-0003-1

Just Khaithang
  • Now asked here: https://stackoverflow.com/questions/72995611/need-to-retain-column-spacing-in-awk-script – tripleee Jul 16 '22 at 06:48
  • @Just Khaithang The above works wonderfully except for the few lines where there are 9 zeros, like this one, which I don't need to change: 151249-001A 0000000005220422 000000512634524-0003-1. There could be anywhere from 7 to 9 zeros in a line; it is changing some lines I don't need changed and seems to add an extra zero. For example, 167991A 0000000005220419 000000512622981-0003-1 gets changed to 167991A 00000000039220909 000000512622981-0003-1. That is one 0 too many, and the line didn't need changing. I am trying some edits with no luck. – fletching Jul 18 '22 at 15:36
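
One way to check which lines really differ, regardless of how many leading zeros they carry, is to compare the two quantities as numbers at fixed positions instead of stripping zeros. The sketch below only reports the mismatches and changes nothing. It assumes the quantity sits at characters 4-10 of column 2 and characters 1-7 of column 3, as in the asker's own substr($2,4,7); `list_mismatches` is a hypothetical helper name.

```shell
# list_mismatches: read records on stdin and report the lines whose
# column 2 and column 3 quantities differ. Hypothetical helper name.
list_mismatches() {
    awk '
    {
        qty2 = substr($2, 4, 7) + 0   # adding 0 converts to a number, dropping leading zeros
        qty3 = substr($3, 1, 7) + 0
        if (qty2 != qty3)
            printf "%d: %s col2=%d col3=%d\n", NR, $1, qty2, qty3
    }'
}

printf '%s\n' \
    '151249-001A 0000000005220422 000000512634524-0003-1' \
    '134886C             0000000034220419    000007612620821-0003-1' |
    list_mismatches
```

With the two sample lines above, only the second is reported (34 versus 76); the 9-zero line compares equal (5 and 5), so a script built on the same fixed positions would leave it alone.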