I've seen a couple of your questions and you seem a bit confused about how to process multiple files. While you can use getline()
to read information that is outside the records of the current file, it is rarely needed when coordinating information between two files.
Instead, order the processing of the files so you capture what you need from the first file, either in an array (normally) or by concatenating information in a string (which has benefits in some cases), and then read records from the second file and apply whatever changes are needed. You correctly use FNR==NR
to check that the current file's record number equals the total number of records read so far, which identifies records read from the first file provided. But then your script kind of meanders away from what you want.
In your ref
file, all you really care about is the second field. Just read that into an array skipping the remainder of the rules, e.g.
awk 'FNR==NR {a[++n]=$2; next} ... '
(note: by using pre-increment for the array index (++n), you keep the index consistent with the 1-based processing of the record number, etc...)
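If the FNR==NR test still feels a bit magic, a small sketch like the one below may help. It assumes two throwaway files (the names and contents are made up for illustration) and just prints both counters for every record so you can see where they diverge:

```shell
# Two throwaway files (hypothetical contents) in a temporary directory
tmp=$(mktemp -d)
printf 'out1 ONE\nout2 TWO\n' > "$tmp/ref"
printf 'first\nsecond\nthird\n' > "$tmp/main"

# FNR restarts at 1 for each file, NR keeps counting across files,
# so FNR==NR only holds while awk is still reading the first file
counters=$(awk '{ tag = (FNR==NR) ? "ref" : "main"
                  print tag, "NR=" NR, "FNR=" FNR }' "$tmp/ref" "$tmp/main")
printf '%s\n' "$counters"
rm -rf "$tmp"
```

One caveat with this idiom: if the first file is empty, FNR==NR is also true for the second file, so it is worth making sure your ref file actually has content.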
Now all you need is one more rule (actually two -- considering the use of the default print
rule), e.g.
' ... $1 && ++line==3 && i<=n {sub(/NUMBER/,a[++i]); line=0}1'
Now let's go through the logic processing the main
file. The first thing you need is a simple variable line
to track 1
, 2
, 3
(replace), reset to 0
. So if you look at the conditional $1 && ++line==3 && i<=n
it says:
- if there is a first field (e.g. not just an empty line); then
- pre-increment line and compare it to
3
; and finally
- make sure you haven't run out of saved replacement numbers.
(note: since && short-circuits, the remaining conditions are never evaluated once one of them is false, which prevents ++line
from executing on blank lines)
If all three conditions are met, then you just substitute the number saved as a[++i]
for /NUMBER/
using sub()
. The 1
at the end of the rule is just shorthand for the default rule print
.
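To see the short-circuit at work, here is a minimal sketch that assumes a main file with a blank line dropped into the middle of the group (the file contents are invented for the test). The blank line is printed, but it does not advance the line counter, so the replacement still lands on the NUMBER line:

```shell
tmp=$(mktemp -d)
printf 'out1 ONE\n' > "$tmp/ref"
# Hypothetical main with an empty line between the match and the comment
printf "This is the 'MATCH LINE'\n\n# this is just a comment\nThis NUMBER to be updated\n" > "$tmp/main"

# Same rules as above: the blank line fails the $1 test, so ++line never runs for it
blank_demo=$(awk 'FNR==NR {a[++n]=$2; next} $1 && ++line==3 && i<=n {sub(/NUMBER/,a[++i]); line=0}1' "$tmp/ref" "$tmp/main")
printf '%s\n' "$blank_demo"
rm -rf "$tmp"
```

Keep in mind that $1 is tested for truth, not for existence, so a line whose first field is the number 0 would be skipped as well. If that can happen in your data, testing NF instead of $1 is one alternative.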
Example Use/Output
With your ref
file in dat/ref
(does not contain [ref]
which I take as your way of giving a filename) and longer main
in dat/main
, e.g.
$ cat dat/main
This is the 'MATCH LINE'
# this is just a comment
This NUMBER to be updated
This is the 'MATCH LINE'
# this is just a comment
This NUMBER to be updated
Then you would use the full awk
expression as:
$ awk 'FNR==NR {a[++n]=$2; next} $1 && ++line==3 && i<=n {sub(/NUMBER/,a[++i]); line=0}1' dat/ref dat/main
This is the 'MATCH LINE'
# this is just a comment
This ONE to be updated
This is the 'MATCH LINE'
# this is just a comment
This TWO to be updated
Which is the output you specify, but I suspect you actually need a bit more to handle other lines that may be in your file.
If You Have Additional Lines In The [main] File
If your [main]
file can have all kinds of other lines in it, then you need to track whether you have found a matched line and are in your 1
, 2
, 3
count. You can do that with small changes using the line
variable as a flag and counter like:
awk 'FNR==NR {a[++n]=$2; next} line && ++line==3 && i<=n {sub(/NUMBER/,a[++i]); line=0} /MATCH LINE/ {line = 1}1' dat/ref dat/main
Here we are using line
as a flag and a counter that is set to 1
(true
) if you find a line with "MATCH LINE"
in it. You toggle the line
flag off when you make your replacement. That way any other lines that may come along are simply printed unchanged. For example, let's say your [main]
file now contains:
$ cat dat/main
@#$#% stuff
more stuff
###whatever
This is the 'MATCH LINE'
# this is just a comment
This NUMBER to be updated
@#$$%
This is the 'MATCH LINE'
# this is just a comment
This NUMBER to be updated
Now you simply make the replacements on the 2nd line after the "MATCH LINE"
, e.g.
$ awk 'FNR==NR {a[++n]=$2; next} line && ++line==3 && i<=n {sub(/NUMBER/,a[++i]); line=0} /MATCH LINE/ {line = 1}1' dat/ref dat/main
@#$#% stuff
more stuff
###whatever
This is the 'MATCH LINE'
# this is just a comment
This ONE to be updated
@#$$%
This is the 'MATCH LINE'
# this is just a comment
This TWO to be updated
Which again is the output you specify, shown with two replacements, but done in a file that can have all kinds of other lines (like normal files do).
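One edge case you may want to check is what happens when [main] has more "MATCH LINE" groups than [ref] has entries. Because i is incremented inside the action, the i<=n guard allows one extra pass where a[++i] is empty, so NUMBER is replaced with nothing. The sketch below assumes a one-entry ref and a two-group main (both made up for the test) and compares i<=n with the tighter i<n:

```shell
tmp=$(mktemp -d)
printf 'out1 ONE\n' > "$tmp/ref"      # only one replacement available
for k in 1 2; do                      # but two MATCH LINE groups
    printf "This is the 'MATCH LINE'\n# this is just a comment\nThis NUMBER to be updated\n"
done > "$tmp/main"

# Guard as written above: the second group gets an empty replacement
loose=$(awk 'FNR==NR {a[++n]=$2; next} line && ++line==3 && i<=n {sub(/NUMBER/,a[++i]); line=0} /MATCH LINE/ {line = 1}1' "$tmp/ref" "$tmp/main")
# Tighter guard: the second group is left untouched once the array is used up
strict=$(awk 'FNR==NR {a[++n]=$2; next} line && ++line==3 && i<n {sub(/NUMBER/,a[++i]); line=0} /MATCH LINE/ {line = 1}1' "$tmp/ref" "$tmp/main")

printf 'i<=n: %s\n' "$(printf '%s\n' "$loose" | tail -n 1)"
printf 'i<n : %s\n' "$(printf '%s\n' "$strict" | tail -n 1)"
rm -rf "$tmp"
```

If your ref always has at least as many entries as main has groups, both guards behave the same, which is the case for the examples shown here.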
Let me know if you have questions, or if you actually have the lines [ref]
and [main]
in your input files.
Update For Your Edit To Redirect Single Replacement Line to Filename $1
in [ref]
Okay, per your comment:
@DavidC.Rankin: I understood your thought. You assumed I have several
'MATCH LINE' in dat/main file. I should emphasize that dat/main file
has ONLY one 'MATCH LINE' that we use an anchor to modify the
following second line and replace the NUMBER with One/Two and Three
and output each file to separate files, named per the first column of
dat/ref (i.e. Out1, Out2 & Out3).
To redirect to a file named by the 1st field in [ref]
(my dat/ref
) all you need to do is save the first field in a separate array (or you can just save the complete line and split it later). Let's use a filename array using the same n
as the index, e.g. fn[++n]
(we increment with fn[]
now since we place it first -- you can order it however you like). The only change needed to save the first field in dat/ref
in the filename array is:
... 'FNR==NR {fn[++n]=$1; a[n]=$2; next} ...
With the first file saved, we no longer want the default print
command to output each record in the 1
, 2
, 3
count; we only want to output the substituted line to the new file. So remove the 1
from the end and redirect after setting line=0;
, e.g.
... line && ++line==3 && i<=n {sub(/NUMBER/,a[++i]); line=0; print > fn[i]} ...
For me, I like to create output files in the dat/
directory, so you can simply let awk
concatenate the directory for you, e.g.
... print > ("dat/" fn[i]) ...
(Thank you Ed for pointing out the precedence issue so this works with all awks.)
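The reason for the parentheses is that an unparenthesized concatenation to the right of > is not parsed the same way by every awk, so the output file name may not be what you expect. A small sketch of the portable form, assuming a hypothetical dir variable passed in with -v and a throwaway ref file:

```shell
tmp=$(mktemp -d)
printf 'out1 ONE\nout2 TWO\n' > "$tmp/ref"

# Parentheses force the whole concatenation to be the redirection target
awk -v dir="$tmp" '{ print $2 > (dir "/" $1) }' "$tmp/ref"

out1=$(cat "$tmp/out1")
out2=$(cat "$tmp/out2")
printf 'out1: %s\nout2: %s\n' "$out1" "$out2"
rm -rf "$tmp"
```

With a plain variable or array element as the target (e.g. print > fn[i]) there is nothing to concatenate, so no parentheses are needed.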
Putting it all together and outputting the new files to the dat/
directory, you would have:
awk 'FNR==NR {fn[++n]=$1; a[n]=$2; next} line && ++line==3 && i<=n {sub(/NUMBER/,a[++i]); line=0; print > ("dat/" fn[i])} /MATCH LINE/ {line = 1}' dat/ref dat/main
New Files Created
Since I only have 2 sets of statements in dat/main
, I get just two new files, e.g.
$ ls -al dat/out*
-rw-r--r-- 1 david david 23 Jun 12 23:20 dat/out1
-rw-r--r-- 1 david david 23 Jun 12 23:20 dat/out2
And the content is as you specify, a single line containing the substitution, e.g.
$ cat dat/out1
This ONE to be updated
and
$ cat dat/out2
This TWO to be updated
Let me know if we have finally had a "meeting-of-the-minds" and I have understood what you are wanting to accomplish.
(note: as Ed mentions, if there is nothing special about the spacing in the output, e.g. no special number of whitespace you are needing to preserve, and if NUMBER
is always the 2nd field, you can simply set $2 = a[++i]
instead of using sub(/NUMBER/,a[++i])
.)
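To make the spacing difference concrete, here is a quick sketch using a made-up input line with extra whitespace. sub() edits the record in place, while assigning to $2 makes awk rebuild the record with single spaces (the default OFS) between fields:

```shell
# Hypothetical line with runs of spaces around NUMBER
line='This    NUMBER   to be updated'

with_sub=$(printf '%s\n' "$line" | awk '{ sub(/NUMBER/, "ONE") } 1')
with_assign=$(printf '%s\n' "$line" | awk '{ $2 = "ONE" } 1')

printf 'sub()    : [%s]\n' "$with_sub"
printf '$2 = ... : [%s]\n' "$with_assign"
```

So if the original spacing matters to you, stick with sub(); if it does not, the assignment is the simpler choice.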
Additional Edit to Now Write All Lines To Output file (MATCH
through substitution)
If you now want to write the MATCH line, the next line and the changed line to the file specified by the 1st field in ref
, you can add a variable (say content
) to accumulate each of the lines and then print redirecting them to the file, e.g.
awk 'FNR==NR {fn[++n]=$1; a[n]=$2; next} line && ++line==3 && i<=n {sub(/NUMBER/,a[++i]); line=0; content = content $0; print content > ("dat/" fn[i]); content = ""} /MATCH LINE/ {line = 1} line > 0 {content = content $0 "\n"}' dat/ref dat/main
Files Created
Same dat/out1
and dat/out2
, but now with the content:
$ cat dat/out1
This is the 'MATCH LINE'
# this is just a comment
This ONE to be updated
and
$ cat dat/out2
This is the 'MATCH LINE'
# this is just a comment
This TWO to be updated
When the command line gets this long, it's easier to create an awk
script, say matchline.awk
, and make it executable with chmod +x matchline.awk
. Now the script is much easier to read, and all you need to do to run it, e.g. with my dat/ref
and dat/main
, is:
$ ./matchline.awk dat/ref dat/main
The full awk
script would be:
#!/bin/awk -f
FNR == NR {  ## process ref file
    fn[++n] = $1  ## save 1st field to array fn[]
    a[n] = $2  ## save 2nd field to array a[]
    next  ## skip to next record
}
line && ++line == 3 && i <= n {  ## if line is set, reaches 3, and array elements remain
    sub(/NUMBER/,a[++i])  ## substitute saved 2nd field for NUMBER
    content = content $0  ## append changed line
    print content > ("dat/" fn[i])  ## output saved content to file named by 1st field
    line = 0  ## reset line to 0
    content = ""  ## reset content to empty
}
/MATCH LINE/ {  ## if line has 'MATCH LINE'
    line = 1  ## set line to 1
}
line > 0 {  ## if line set
    content = content $0 "\n"  ## append to content with newline
}
(same files created)
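If awk does not live at /bin/awk on your system, the shebang line will not work as written, but you can always run the same file with awk -f instead. The sketch below is one way to try that in a throwaway sandbox. The ref contents and the two-group main are assumptions on my part, and the script body is the same set of rules as above without the shebang:

```shell
# Throwaway sandbox; the ref contents below are an assumption about your file
tmp=$(mktemp -d)
mkdir "$tmp/dat"
printf 'out1 ONE\nout2 TWO\nout3 THREE\n' > "$tmp/dat/ref"
for k in 1 2; do
    printf "This is the 'MATCH LINE'\n# this is just a comment\nThis NUMBER to be updated\n"
done > "$tmp/dat/main"

# Same rules as the script above, minus the shebang line
cat > "$tmp/matchline.awk" <<'EOF'
FNR == NR { fn[++n] = $1; a[n] = $2; next }
line && ++line == 3 && i <= n {
    sub(/NUMBER/, a[++i])
    content = content $0
    print content > ("dat/" fn[i])
    line = 0
    content = ""
}
/MATCH LINE/ { line = 1 }
line > 0 { content = content $0 "\n" }
EOF

# Run from inside the sandbox so the relative "dat/" prefix resolves
(cd "$tmp" && awk -f matchline.awk dat/ref dat/main)
first_file=$(cat "$tmp/dat/out1")
second_file=$(cat "$tmp/dat/out2")
printf '%s\n---\n%s\n' "$first_file" "$second_file"
rm -rf "$tmp"
```

Running it with awk -f also makes it easy to swap in a different awk (gawk, mawk, etc.) when you are trying to track down implementation differences.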
Per your last comment:
@DavidC.Rankin: I tried last edit (the long awk script) with only
change of removing "dat/" from the print print content > (fn[i]) , but
it only generated the first output (Out1)
If not using a separate directory for the input/output files (e.g. "dat/"
), then there is no need to enclose the filename concatenation in ( ... )
. All you need is print content > fn[i]
. The whole script with "dat/"
removed would be:
#!/bin/awk -f
FNR == NR {  ## process ref file
    fn[++n] = $1  ## save 1st field to array fn[]
    a[n] = $2  ## save 2nd field to array a[]
    next  ## skip to next record
}
line && ++line == 3 && i <= n {  ## if line is set, reaches 3, and array elements remain
    sub (/NUMBER/, a[++i])  ## substitute saved 2nd field for NUMBER
    content = content $0  ## append changed line
    print content > fn[i]  ## output saved content to file named by 1st field
    line = 0  ## reset line to 0
    content = ""  ## reset content to empty
}
/MATCH LINE/ {  ## if line has 'MATCH LINE'
    line = 1  ## set line to 1
}
line > 0 {  ## if line set
    content = content $0 "\n"  ## append to content with newline
}
Files Created in Present Directory
Now with the two sets of input in main
shown above and your ref
, you run in the same way:
$ ./matchline.awk ref main
Before running the script, the only files present in the directory were:
$ ls -al
total 20
drwxr-xr-x 2 david david 4096 Jun 15 16:51 .
drwxr-xr-x 10 david david 4096 Jun 15 16:50 ..
-rw-r--r-- 1 david david 153 Jun 15 16:50 main
-rwxr-xr-x 1 david david 1009 Jun 15 16:51 matchline.awk
-rw-r--r-- 1 david david 29 Jun 15 16:50 ref
After (with contents shown), you have:
16:51 wizard:~/tmp/awk/tst> ./matchline.awk ref main
16:51 wizard:~/tmp/awk/tst> l
total 28
drwxr-xr-x 2 david david 4096 Jun 15 16:51 .
drwxr-xr-x 10 david david 4096 Jun 15 16:50 ..
-rw-r--r-- 1 david david 153 Jun 15 16:50 main
-rwxr-xr-x 1 david david 1009 Jun 15 16:51 matchline.awk
-rw-r--r-- 1 david david 73 Jun 15 16:51 out1
-rw-r--r-- 1 david david 73 Jun 15 16:51 out2
-rw-r--r-- 1 david david 29 Jun 15 16:50 ref
16:51 wizard:~/tmp/awk/tst> cat out1
This is the 'MATCH LINE'
# this is just a comment
This ONE to be updated
16:51 wizard:~/tmp/awk/tst> cat out2
This is the 'MATCH LINE'
# this is just a comment
This TWO to be updated
Which is exactly what you have described as wanting. Remove the (...)
in the redirection statement and give it another go. Also show me the exact output of awk --version
on your system if you are still having issues. I've run into some strange awk
issues where the OP was using a 20-year-old awk
on a Sun SPARCstation, so we need to eliminate that possibility if you still have problems.
Changes to Generalize 2nd Field Replacement in main
To generalize the replacement of the 2nd field in main
instead of replacing NUMBER
, you can simply change the sub()
command to an assignment to the 2nd field. The rule with the changes is shown below, the rest of the script is unchanged. You are just replacing the sub()
line with an assignment to $2
as Ed mentioned in a comment earlier, e.g.
line && ++line == 3 && i <= n {  ## if line is set, reaches 3, and array elements remain
    # sub (/NUMBER/, a[++i])  ## substitute saved 2nd field for NUMBER
    $2 = a[++i]  ## assign saved 2nd field from ref to replace 2nd field
    content = content $0  ## append changed line
    print content > fn[i]  ## output saved content to file named by 1st field
    line = 0  ## reset line to 0
    content = ""  ## reset content to empty
}
(original sub()
command commented and new assignment below it)
Results are the same as the last example above.
Let me know if you have questions.