Update specific lines and columns with values from another reference file

Question

This is a follow-up to my previous thread (Update the second line after the matched row with reference values) with a more advanced requirement. I have a master file, main, that I wish to modify with two objectives: (1) find the MATCH LINE phrase in main, jump 2 lines down, and replace the 3rd column with the value given in the second column of the ref file; (2) if a line contains the write output phrase, replace its 4th column with the same value. The ref file has 2 columns: the first holds the output filenames and the second holds the replacement values. Please see below for a sample and the desired output.

main file

one line here
This is the 'MATCH LINE'
# this is just a comment
Now this *** to be updated
write output label ***
another line here

ref file

Out1 ONE
Out2 TWO
Out3 THREE

desired output file 1 (Out1)

one line here
This is the 'MATCH LINE'
# this is just a comment
Now this ONE to be updated
write output label ONE
another line here

desired output file 2 (Out2)

one line here
This is the 'MATCH LINE'
# this is just a comment
Now this TWO to be updated
write output label TWO
another line here

desired output file 3 (Out3)

one line here
This is the 'MATCH LINE'
# this is just a comment
Now this THREE to be updated
write output label THREE
another line here

My script originated from Ed Morton (@ed-morton), who kindly helped me in the previous thread. I have modified it to adapt it to the new requirement, but it gives me an error. I would appreciate your help.

#!/bin/awk -f

NR == FNR {
    lines[++numLines] = $0
    a[NR]=$2
    if ( /\047MATCH LINE\047/ ) {
        tgt1 = NR + 2
    }
    if ( /write output/ ) {
        tgt2 = NR
    }
    next
}
{
    for ( lineNr=1; lineNr<=numLines; lineNr++ ) {
        line = lines[lineNr]
        if ( lineNr == tgt1 ) {
            #sub(/NUMBER/,$2,line)
            line[$3]=a[FNR]
        }
        if ( lineNr == tgt2 ) {
            line[$4]=a[FNR]
        }
        print line > $1
    }
    close($1)
}

./tst.awk main ref

Error:

scalar "line" cannot be used as array

Ed suggested splitting the line into an array, replacing the right index, and stitching the pieces back together, but the output looks weird. Here is the updated script and its output.

#!/bin/awk -f

NR == FNR {
    lines[++numLines] = $0
    a[NR]=$2
    if ( /\047MATCH LINE\047/ ) {
        tgt1 = NR + 2
    }
    if ( /write output/ ) {
        tgt2 = NR
    }
    next
}
{
    for ( lineNr=1; lineNr<=numLines; lineNr++ ) {
        line = lines[lineNr]
        if ( lineNr == tgt1 ) {
            #sub(/NUMBER/,$2,line)
            numFlds = split(line,flds)
            flds[3] = a[FNR]
            for ( fldNr=1; fldNr<=numFlds; fldNr++ ) {
                line = (fldNr==1 ? "" : line " ") flds[fldNr]
            }
        }
        if ( lineNr == tgt2 ) {
            numFlds = split(line,flds)
            flds[4] = a[FNR]
            for ( fldNr=1; fldNr<=numFlds; fldNr++ ) {
                line = (fldNr==1 ? "" : line " ") flds[fldNr]
            }
        }
        print line > $1
    }
    close($1)
}

Output

$ head Out*
==> Out1 <==
one line here
This is the 'MATCH LINE'
# this is just a comment
Now this line to be updated
write output label line
another line here

==> Out2 <==
one line here
This is the 'MATCH LINE'
# this is just a comment
Now this is to be updated
write output label is
another line here

fwiw, re: the error ... `line = lines[lineNr]` says to treat `line` as a scalar variable ... `line[$4]=a[FNR]` says to treat `line` as an array; an `awk` variable cannot be both a scalar and an array hence the error — markp-fuso, Jun 25 '23 at 13:06
Whenever creating an example to demonstrate your problem, include regexp chars and substrings in the text to match (e.g. `the`, `there`, `t.e`) and backreferences in the string to be replaced (e.g. `&`, `\1`, `\\1`) and typical delimiters in both (e.g. `/`, `|`, `:`, `#`) if any of those can occur in your real data. Otherwise you're likely to get solutions that will only work for sunny day cases and fail later with your real data. See [how-do-i-find-the-text-that-matches-a-pattern](https://stackoverflow.com/questions/65621325/how-do-i-find-the-text-that-matches-a-pattern) for more info. — Ed Morton, Jun 25 '23 at 20:24
Almost exact duplicate? [Update the second line after the matched row with reference values](https://stackoverflow.com/questions/76444752/update-the-second-line-after-the-matched-row-with-reference-values) — Kaz, Jun 25 '23 at 21:32
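
As a minimal sketch of the point made in the comments above (an awk variable cannot be both a scalar and an array), the snippet below copies the string into a separate array with split(), changes one element, and rebuilds the string. The sample line and the names `flds` and `fixed` are only for illustration.

```shell
# Edit the 3rd field of a string by going through a separate array.
fixed=$(printf 'Now this *** to be updated\n' |
awk '{
    line = $0                      # line is a scalar (a string)
    n = split(line, flds)          # flds is a separate array holding the fields
    flds[3] = "ONE"                # edit the array, never index the scalar
    line = flds[1]
    for (i = 2; i <= n; i++)
        line = line " " flds[i]    # stitch the fields back into the scalar
    print line
}')
printf '%s\n' "$fixed"             # prints: Now this ONE to be updated
```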

ufopilot · Answer 1 · 2023-06-25T12:46:41.777

#!/bin/awk -f

function join(array, start, end, sep,    result, i)
{
    if (sep == "")
       sep = " "
    else if (sep == SUBSEP) # magic value
       sep = ""
    result = array[start]
    for (i = start + 1; i <= end; i++)
        result = result sep array[i]
    return result
}
/\047MATCH LINE\047/{
    mline = NR+2
}
FNR==NR{
    main[NR] = $0
    next 
}
{
    out = $1
    for (i=1; i<=length(main); i++){
        if(i == mline){
           n=split(main[i], a, " ") 
           a[3]=$2
           print join(a, 1, n) > out
        }else if (i == mline+1 && main[i] ~ /write output label .*/) {
            n=split(main[i], a, " ") 
            a[4]=$2
            print join(a, 1, n) > out
        }else{
            print main[i] > out
        }
    }
    close(out)
}

./tst.awk main ref

$ head Out*
==> Out1 <==
one line here
This is the 'MATCH LINE'
# this is just a comment
Now this ONE to be updated
write output label ONE
another line here

==> Out2 <==
one line here
This is the 'MATCH LINE'
# this is just a comment
Now this TWO to be updated
write output label TWO
another line here

==> Out3 <==
one line here
This is the 'MATCH LINE'
# this is just a comment
Now this THREE to be updated
write output label THREE
another line here

edited Jun 25 '23 at 12:46

answered Jun 25 '23 at 09:53

ufopilot


Thanks ufopilot! This works as needed but my output lines are printed in reverse (from end to beginning). How can I fix that? – EverLearner Jun 25 '23 at 12:21
Also, can you please edit the script such that it prints the output to three files (as per the 1st columns of the ```ref``` Out1, Out2 and Out3) instead of echoing them? – EverLearner Jun 25 '23 at 12:23
@EverLearner updated – ufopilot Jun 25 '23 at 12:46
strangely, your new script does not generate any out file in Korn shell. ```$ head Out* head: out: Access is denied.``` – EverLearner Jun 26 '23 at 00:23
That error message is saying you have a file or directory named `out` that you don't have access to when running `head`. It has nothing to do with the awk script's output files, named `Out1..3`. – Ed Morton Jun 26 '23 at 10:45

markp-fuso · Answer 2 · 2023-06-25T21:40:29.847

One awk idea:

awk '
NR == FNR {
    if ( /MATCH LINE/              ) tgt = FNR + 2
    if ( FNR == tgt                ) $3  = "REPLACE_ME"       # replace 3rd field with a string that you know does not exist in main
    if ( tgt > 0 && /write output/ ) $4  = "REPLACE_ME"       # replace 4th field with the same dummy replacement string

    template = template (template != "" ? ORS : "") $0        # add current line to our template block of text
    next
}
{ if ( tgt > 0 ) {                                            # if "MATCH LINE" exists then ...
     template_copy = template                                 # copy template
     gsub(/REPLACE_ME/,$2,template_copy)                      # perform replacements against "template_copy"
     print template_copy > $1                                 # print "template_copy" to output file "$1"
     close($1)                                                # close file descriptor
  }
}
' main ref

NOTES:

if MATCH LINE does not exist in main then no output files will be generated
if main could contain the string REPLACE_ME then modify the code to use a string that you know won't exist in main
if fields (in main) are separated by something other than a single space (eg, tabs, multiple spaces) this solution will not maintain the original spacing (ie, tabs and multi-spaces will be replaced with a single space); maintaining original spacing is doable but requires more code

This generates:

$ head Out*
==> Out1 <==
one line here
This is the 'MATCH LINE'
# this is just a comment
Now this ONE to be updated
write output label ONE
another line here

==> Out2 <==
one line here
This is the 'MATCH LINE'
# this is just a comment
Now this TWO to be updated
write output label TWO
another line here

==> Out3 <==
one line here
This is the 'MATCH LINE'
# this is just a comment
Now this THREE to be updated
write output label THREE
another line here
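
To illustrate the spacing note in this answer: assigning to a field makes awk rebuild the record with the output field separator (a single space by default), so tabs and runs of spaces collapse. The sample line and the variable name `spaced` are made up for this sketch.

```shell
# Input has three spaces, a tab, and two spaces between some fields.
spaced=$(printf 'Now   this\t***  to be updated\n' |
awk '{ $3 = "ONE"; print }')       # assigning $3 rebuilds $0 using OFS
printf '%s\n' "$spaced"            # prints: Now this ONE to be updated
```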

Thank you. I am getting an error when running it on Korn shell: ```C:/Program Files (x86)/MKS Toolkit/mksnt/awk.exe: Syntax error Context is: >>> U: <<< ``` — EverLearner, Jun 26 '23 at 00:26
what's the output from `awk --version`? I'm guessing MKS Toolkit has an old/oddball version of `awk`; fwiw, I replaced MKS with cygwin ... eons ago ... cygwin isn't perfect but it tends to be 'closer' to what's available in newer unix/linux envs — markp-fuso, Jun 26 '23 at 11:28
It is unknown command ! ```$awk --version Unknown option "--version" Usage: awk [-f scriptfile] [-Fc] [-v var=val] [script] [var=val ...] [file ...]``` — EverLearner, Jun 26 '23 at 11:42
assuming you made *no* modifications to the code ... not sure what your `awk` is complaining about; I'm not using MKS so I can't offer much re: troubleshooting tips ... — markp-fuso, Jun 26 '23 at 12:07

Ed Morton · Accepted Answer · 2023-06-26T13:30:57.753

In any POSIX awk, change:

line[$3]=a[FNR]

to:

match(line,/^[[:space:]]*([^[:space:]]+[[:space:]]+){2}/)
tail = substr(line,RSTART+RLENGTH)
sub(/[^[:space:]]+/,"",tail)
line = substr(line,RSTART,RLENGTH) a[FNR] tail

and similarly for line[$4]=a[FNR], just change {2} to {3} in the above match().
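
The same idea can be sketched as a reusable function. This is a variant, not the code above: it peels off one field at a time instead of using the `{2}` interval expression, on the assumption that some older awks may not support intervals. The function name `rplc_ws`, the variable `preserved`, and the sample line are hypothetical.

```shell
preserved=$(printf 'Now   this\t***  to be updated\n' |
awk '
# Replace field number tgt in str with val, keeping the original white space.
function rplc_ws(str, tgt, val,    head, i) {
    head = ""
    for (i = 1; i < tgt; i++) {
        # peel off one field plus the white space that follows it
        if (!match(str, /^[ \t]*[^ \t]+[ \t]+/))
            return head str                    # fewer than tgt fields: unchanged
        head = head substr(str, 1, RLENGTH)
        str  = substr(str, RLENGTH + 1)
    }
    if (!match(str, /[^ \t]+/))
        return head str                        # no target field present
    # literal concatenation, so "&" in val is not treated specially
    return head substr(str, 1, RSTART - 1) val substr(str, RSTART + RLENGTH)
}
{ print rplc_ws($0, 3, "ONE") }
')
printf '%s\n' "$preserved"
```

With the sample input, the three spaces, the tab, and the two spaces around the replaced field should survive unchanged.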

As already mentioned in the comments, your error message was because line is a scalar (containing a string in this case) and you were trying to treat it as an array. If you want to treat line as if it were an array then you have to run split() on it first to create a new array from its contents, then assign a value in the new array, then recombine the array back into a string to store in line.

For example, if you don't care about retaining white space (can be solved with GNU awk's 4th arg to split()) you could replace the 3rd field in line by:

numFlds = split(line,flds)
flds[3] = a[FNR]
line = flds[1]
for ( fldNr=2; fldNr<=numFlds; fldNr++ ) {
    line = line " " flds[fldNr]
}

I'm using literal string replacements above instead of a *sub() so it'll work even if a[FNR] contains a backreference metachar, &.
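
A small sketch of why that matters: in the replacement argument of sub(), an unescaped `&` stands for the matched text, whereas plain string concatenation treats it as an ordinary character. The value `R&D` and the variable names are invented for this example.

```shell
val='R&D'

# sub(): the "&" in the replacement is expanded to the matched text "***"
with_sub=$(printf 'label ***\n' |
awk -v val="$val" '{ sub(/[*]+/, val); print }')

# match() + substr(): the value is inserted literally
with_substr=$(printf 'label ***\n' |
awk -v val="$val" '
match($0, /[*]+/) { $0 = substr($0, 1, RSTART - 1) val substr($0, RSTART + RLENGTH) }
{ print }')

printf '%s\n' "$with_sub"          # prints: label R***D
printf '%s\n' "$with_substr"       # prints: label R&D
```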

Also, when trying to modify my previous answer to solve your current problem you had introduced a logic error when you changed

NR==FNR {...; next }
{ ...sub(/NUMBER/,$2,line) }

to:

NR==FNR { ...; a[NR]=$2; next }
{ ...line[$3]=a[FNR]... }

instead of:

NR==FNR {...; next }
{ ...line[$3]=$2... }

What you did is completely different logic, replacing part of line with a string from main instead of a string from ref. Here's the complete script for your current problem after fleshing out the common code a bit and moving it to a function:

$ cat tst.awk
NR == FNR {
    lines[++numLines] = $0
    if ( /\047MATCH LINE\047/ ) {
        tgt1 = NR + 2
    }
    if ( /write output/ ) {
        tgt2 = NR
    }
    next
}
{
    for ( lineNr=1; lineNr<=numLines; lineNr++ ) {
        line = lines[lineNr]
        if ( lineNr == tgt1 ) {
            line = rplc(line,3,$2)
        }
        if ( lineNr == tgt2 ) {
            line = rplc(line,4,$2)
        }
        print line > $1
    }
    close($1)
}

function rplc(str,tgt,val,      numFlds,flds,fldNr) {
    numFlds = split(str,flds)
    if ( tgt > numFlds ) {
        numFlds = tgt
    }
    flds[tgt] = val
    str = flds[1]
    for ( fldNr=2; fldNr<=numFlds; fldNr++ ) {
        str = str " " flds[fldNr]
    }
    return str
}

$ awk -f tst.awk main ref

$ head Out*
==> Out1 <==
one line here
This is the 'MATCH LINE'
# this is just a comment
Now this ONE to be updated
write output label ONE
another line here

==> Out2 <==
one line here
This is the 'MATCH LINE'
# this is just a comment
Now this TWO to be updated
write output label TWO
another line here

==> Out3 <==
one line here
This is the 'MATCH LINE'
# this is just a comment
Now this THREE to be updated
write output label THREE
another line here

this is a smart solution. I changed my script based on your idea, but it does not make the replacement. Please see the edit on the original statement. — EverLearner, Jun 26 '23 at 00:17
I had assumed the script you posted at the top of your question was [my answer to your previous question](https://stackoverflow.com/a/76483137/1745001) just updated to also change an additional line, but saving `a[NR]=$2` and then replacing the string with that value isn't part of my previous answer and doesn't make sense - that's saving a field from `main` and then replacing another field from `main` with it instead of replacing with a field from `ref`. — Ed Morton, Jun 26 '23 at 09:55
I updated my answer. I think you're confused about which file is being processed in the first `NR==FNR` block vs the second block and what values `FNR` has in each. You might want to add a tracing line like `{ print FILENAME, NR, FNR, $0 | "cat>&2" }` as the first line of the script (above the `NR==FNR` line) so you can see what it's doing. — Ed Morton, Jun 26 '23 at 10:29
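
A rough sketch of that kind of tracing, using small throwaway copies of main and ref in a temporary directory (the file contents here are shortened samples, and the trace goes to stdout rather than stderr to keep the example simple). It shows that NR keeps counting across files while FNR restarts at 1 for ref, which is why `a[FNR]` in the second block picks up values saved from main.

```shell
dir=$(mktemp -d)
printf 'one line here\nwrite output label ***\n' > "$dir/main"
printf 'Out1 ONE\nOut2 TWO\n' > "$dir/ref"

# NR == FNR is only true while the first file is being read
trace=$(cd "$dir" &&
awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first file" : "later file") }' main ref)
printf '%s\n' "$trace"
# prints:
# main 1 1 first file
# main 2 2 first file
# ref 3 1 later file
# ref 4 2 later file

rm -rf "$dir"
```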
The script works as I needed. Thank you so much for your help and dedication. You are a true genius. — EverLearner, Jun 27 '23 at 02:25
a quick follow-up: what if the replacement should be between " " ? e.g. ```write output label "ONE" ``` . I tried ```line = rplc(line,4,\"$2\")``` in your code but it reports a syntax error. — EverLearner, Jun 27 '23 at 03:32
`line = rplc(line,4,"\""$2"\"")` or edit the function to add the double quotes if you always need them `flds[tgt] = "\"" val "\""` — Ed Morton, Jun 27 '23 at 09:44
