putting result of awk (multi-line variable) to another awk output

Question

I just posted a question about using grep on a multi-line shell variable ("grep multiline shell variable from output of executable file"), but I realized that what I need is slightly different.

What I tried to do was this: I have a grep/awk result (I'll call it result1):

blahblah ID1 blahblah aaa
blahblah ID2 blahblah bbb
blahblah ID3 blahblah ccc
...
blahblah ID(m) blahblah mmm
blahblah ID(n) blahblah nnn

And I have another awk result from the output of an executable (run | awk ~~~) (I'll call it result2):

ID1 (some sentence 1)
ID2 (some sentence 2)
ID3 (some sentence 3)
...
IDn (some sentence n)

I'm trying to take the IDs (ID1~n) and the last field (aaa~nnn) from result1 and append that last field to the matching line of result2. What I want looks like this:

ID1 (sentence) aaa
ID2 (sentence) bbb
...
IDn (sentence) nnn

I somehow succeeded in getting

ID1 aaa
ID2 bbb

from result1, so I only have the IDs that also appear in result2. But I have no idea how to split this up and attach each value to the matching line of result2 (ID1-aaa, ID2-bbb, and so on), so that I get

ID1 (sentence) aaa
ID2 (sentence) bbb
...
IDn (sentence) nnn

something like this.

Also, ID1 ~ IDn may not always be in order.

*"I somehow succeeded getting"* - that part needs to be in your question so we know how to help you. Please provide [A Minimal Complete Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). – David C. Rankin, Dec 19 '22 at 06:22
@David C.Rankin I cannot post the exact code, but I have results 1 and 2 as separate text files, so I don't think it matters much. Sorry that I can't share the exact state of my code. – papabread, Dec 19 '22 at 06:57

tshiono · Accepted Answer · 2022-12-19T06:28:06.093

2

Assumptions:

result1 has space-separated columns and the strings aaa ... nnn are in the last column.
IDn in result1 consists of literal string ID followed by digits.
IDn in result2 are located in the first column.

Then would you please try:

awk '
    NR==FNR {
        if (match($0, /ID[0-9]+/)) {
            id = substr($0, RSTART, RLENGTH)
            a[id] = $NF
        }
        next
    }
    {
        print $0, a[$1]
    }
' result1 result2

The NR==FNR { .. ; next} block is an idiom that is executed only for the file given as the first argument (result1 in this case).
The function match($0, /ID[0-9]+/) returns a nonzero value (the position of the match) if the record contains the string ID followed by digits, and sets the awk variables RSTART and RLENGTH to the starting position and the length of the match, respectively.
substr($0, RSTART, RLENGTH) extracts the substring IDn, where n is the digits.
a[id] = $NF associates the last field (e.g. aaa) with the id.
The {print $0, a[$1]} block is executed for result2 only.
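
To see the lookup at work, here is a minimal sketch that runs the same script on made-up sample data. The temporary directory, the three sample lines, and the variable name merged are assumptions for illustration only; result2 is deliberately written in a different order from result1.

```shell
# Hypothetical sample data mirroring the layout described in the question.
tmpdir=$(mktemp -d)
printf '%s\n' \
    'blahblah ID1 blahblah aaa' \
    'blahblah ID2 blahblah bbb' \
    'blahblah ID3 blahblah ccc' > "$tmpdir/result1"
# result2 lists the IDs in a different order on purpose.
printf '%s\n' \
    'ID3 (some sentence 3)' \
    'ID1 (some sentence 1)' \
    'ID2 (some sentence 2)' > "$tmpdir/result2"

merged=$(awk '
    NR==FNR {                       # first file: build the ID -> last field map
        if (match($0, /ID[0-9]+/)) {
            id = substr($0, RSTART, RLENGTH)
            a[id] = $NF
        }
        next
    }
    {                               # second file: append the mapped value
        print $0, a[$1]
    }
' "$tmpdir/result1" "$tmpdir/result2")

printf '%s\n' "$merged"
# ID3 (some sentence 3) ccc
# ID1 (some sentence 1) aaa
# ID2 (some sentence 2) bbb

rm -rf "$tmpdir"
```

Because the values are looked up by ID, the output follows the line order of result2. An ID that appears in result2 but not in result1 is printed with an empty value after a trailing space.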

If result1 is the output of command1 .. and result2 is of command2 .., you can say:

awk '
  (the same lines as above)
' <(command1 ..) <(command2 ..)

instead of specifying the filenames.
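
As a rough sketch of that form: command1 and command2 below are hypothetical shell functions standing in for the real commands, and merged_ps is just an illustrative variable name. The <( .. ) syntax is process substitution, which is available in bash, zsh and ksh but not in plain POSIX sh.

```shell
# Hypothetical stand-ins for the real commands; replace them with your own.
command1() {
    printf '%s\n' 'blahblah ID1 blahblah aaa' 'blahblah ID2 blahblah bbb'
}
command2() {
    printf '%s\n' 'ID2 (some sentence 2)' 'ID1 (some sentence 1)'
}

# Each <( .. ) is presented to awk as a readable file name.
merged_ps=$(awk '
    NR==FNR {
        if (match($0, /ID[0-9]+/)) {
            a[substr($0, RSTART, RLENGTH)] = $NF
        }
        next
    }
    { print $0, a[$1] }
' <(command1) <(command2))

printf '%s\n' "$merged_ps"
# ID2 (some sentence 2) bbb
# ID1 (some sentence 1) aaa
```

One caveat: if the first command prints nothing, NR==FNR is also true for every line of the second input, so the script prints nothing at all.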

edited Dec 19 '22 at 06:28

answered Dec 19 '22 at 06:04

tshiono

ID1, ID2, ..., IDn were just examples - in my program they are QIDs such as 0x10000043. Would that make much difference for the code above? Also, the IDs may not be in the same order in result 1 and result 2. – papabread Dec 19 '22 at 07:00
Thank you for the feedback. Then would you please show how we can locate the position of the IDs? If the text `blahblah` contains arbitrary number of spaces, we may not be able to use the column positions. Is there any distinctive characteristic of the IDs which we can extract the IDs with? It will be helpful if you can provide an example based on the actual file, not the over-simplified one. BTW my script does not depend on the order of IDs. Cheers. – tshiono Dec 19 '22 at 07:11
I figured out the rest of my problems by myself, it really helped. Thank you! The QIDs just have the form 0x------(digits), so I just changed ID[0-9]+ into 0x[0-9]+ – papabread Dec 19 '22 at 07:22
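
A side note on that pattern change, as a sketch under an assumption: if the QIDs can contain the hex letters a-f (the thread only shows 0x10000043, so this is a guess), 0x[0-9]+ stops at the first letter and the truncated key no longer matches the first column of result2. A character class such as [0-9A-Fa-f] covers both cases. The helper merge_by_id, the sample QIDs and the variable names below are made up for illustration.

```shell
# Hypothetical helper: $1 = ID regex, $2 = result1 file, $3 = result2 file.
merge_by_id() {
    awk -v pat="$1" '
        NR==FNR {
            if (match($0, pat)) a[substr($0, RSTART, RLENGTH)] = $NF
            next
        }
        { print $0, a[$1] }
    ' "$2" "$3"
}

tmpdir=$(mktemp -d)
# Made-up QIDs; the second one contains a hex letter.
printf '%s\n' \
    'blahblah 0x10000043 blahblah aaa' \
    'blahblah 0x1000004A blahblah bbb' > "$tmpdir/result1"
printf '%s\n' \
    '0x1000004A (some sentence 2)' \
    '0x10000043 (some sentence 1)' > "$tmpdir/result2"

# Digits only: 0x1000004A is cut to 0x1000004, so its value is lost.
narrow=$(merge_by_id '0x[0-9]+' "$tmpdir/result1" "$tmpdir/result2")
# Hex digits: both QIDs are captured in full.
wide=$(merge_by_id '0x[0-9A-Fa-f]+' "$tmpdir/result1" "$tmpdir/result2")

printf '%s\n' "$wide"
# 0x1000004A (some sentence 2) bbb
# 0x10000043 (some sentence 1) aaa

rm -rf "$tmpdir"
```

If the real QIDs only ever contain decimal digits after 0x, the simpler 0x[0-9]+ from the comment is enough.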

score 0 · Answer 2 · answered Dec 19 '22 at 05:47

Like this?

$ head f1.txt  f2.txt 
==> f1.txt <==
blahblah ID1 blahblah aaa
blahblah ID2 blahblah bbb
blahblah ID3 blahblah ccc
blahblah ID(n) blahblah nnn

==> f2.txt <==
ID1 (some sentence 1)
ID2 (some sentence 2)
ID3 (some sentence 3)
IDn (some sentence n)

$ paste -d' ' f2.txt <(awk '{print $NF}' f1.txt)
ID1 (some sentence 1) aaa
ID2 (some sentence 2) bbb
ID3 (some sentence 3) ccc
IDn (some sentence n) nnn

Note that it's really helpful if one can assume (as I have) that the line numbers (record numbers, the IDs) match up within the files.
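
To illustrate why that assumption matters, here is a small sketch with made-up files in which f2.txt lists the IDs in the opposite order from f1.txt. It pipes the awk output into paste through - (standard input) instead of using <( .. ), which should be equivalent here; the temporary directory and the variable name pasted are assumptions.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' \
    'blahblah ID1 blahblah aaa' \
    'blahblah ID2 blahblah bbb' > "$tmpdir/f1.txt"
# Same IDs as f1.txt, but in reverse order.
printf '%s\n' \
    'ID2 (some sentence 2)' \
    'ID1 (some sentence 1)' > "$tmpdir/f2.txt"

# paste joins line N of f2.txt with line N of the awk output, ignoring the IDs.
pasted=$(awk '{print $NF}' "$tmpdir/f1.txt" | paste -d' ' "$tmpdir/f2.txt" -)

printf '%s\n' "$pasted"
# ID2 (some sentence 2) aaa
# ID1 (some sentence 1) bbb

rm -rf "$tmpdir"
```

Here aaa ends up on the ID2 line and bbb on the ID1 line, so this approach only fits when both files are known to be in the same order.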
