How to use a file in awk with one field as input file and one as search term

Question

Hello World (my first question here :)

I have one file containing two fields.

I want awk to use one field as the input file in which to search for the other field.
Let's say the input file looks like:

CONFIG_file123;configelement_ABC
CONFIG_file124;configelement_XYZ

Now I want awk to read field 1 (CONFIG_file123) and use it as the input file, search that file for field 2 (configelement_ABC), output the match, and then proceed to the next line of the input file.

Like this:

awk 'BEGIN{ RS = "\n"; ORS="\n" } {if ($0 ~ /configelement_ABC/) { print FILENAME ";" $0  }}' CONFIG_file123

Thank you very much in advance!

If this helps:

    CONFIG_file123
    
    configelement_ABA Data1 Data2 Data3
    configelement_ABB Data1 Data2 
    configelement_ABC Data1
    configelement_ABD Data1 Data2 Data3
    configelement_XYW Data1 Data2
    configelement_XYX Data1 Data2 Data3
    configelement_XYY Data1
    configelement_XYZ Data1 Data2 Data3
    
    CONFIG_file124
    
    configelement_ABA Data1 Data2 Data3
    configelement_ABB Data1 Data2 
    configelement_ABC Data1 Data2
    configelement_ABD Data1 Data2 Data3
    configelement_XYW Data1 Data2
    configelement_XYX Data1 Data2 Data3
    configelement_XYY Data1 Data2 
    configelement_XYZ Data1 Data2
    
    Output
    
    CONFIG_file123;configelement_ABC Data1
    CONFIG_file124;configelement_XYZ Data1 Data2

Please post input files `CONFIG_file123`, `CONFIG_file124` and the expected output. Don't post them as comments, images, tables or links to off-site services but use text and include them to your original question. Thanks. — James Brown, Aug 17 '21 at 13:47

Renaud Pacalet · Accepted Answer · 2021-08-17T14:05:43.680

1

Using awk for this, while probably doable, is a bit out of scope. You could use a bash loop, instead, and grep:

while IFS=';' read -r file string; do grep -F "$string" "$file"; done < list.txt

Of course it assumes that you do not have ; inside your filenames or search strings. But if you had some, your question would be under-specified: where would be the real separation between the two fields in lines with more than one ;?
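The expected output in the question prefixes each match with the file name, which `grep -F` on a single file does not print. The following is a minimal sketch of one way to add it, not part of the original answer. It assumes the sample data from the question (shortened here), a list file named list.txt, and that `mktemp -d` is available for the throwaway demo directory.

```shell
#!/bin/sh
# Demo setup: hypothetical sample files modelled on the question.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'configelement_ABB Data1 Data2' 'configelement_ABC Data1' > CONFIG_file123
printf '%s\n' 'configelement_XYY Data1 Data2' 'configelement_XYZ Data1 Data2' > CONFIG_file124
printf '%s\n' 'CONFIG_file123;configelement_ABC' 'CONFIG_file124;configelement_XYZ' > list.txt

# For each "file;string" pair, search the file for the fixed string and
# prefix every matching line with "file;".
result=$(
  while IFS=';' read -r file string; do
    grep -F -e "$string" "$file" |
      while IFS= read -r line; do
        printf '%s;%s\n' "$file" "$line"
      done
  done < list.txt
)
printf '%s\n' "$result"
```

With this sample data the sketch prints `CONFIG_file123;configelement_ABC Data1` followed by `CONFIG_file124;configelement_XYZ Data1 Data2`. The inner loop reads from the pipe, so it does not consume lines of list.txt.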

edited Aug 17 '21 at 14:05

answered Aug 17 '21 at 12:57


This gives me the required output. Thanks – Ivan Šuker Aug 18 '21 at 07:45

Ed Morton · Answer 2 · 2021-08-18T12:55:59.323

Assuming a file name only occurs once in the input file, it can't contain a newline, and ; doesn't appear anywhere else in that file except to separate the 2 fields, then you can do this using any awk in any shell on every Unix box:

awk '
    BEGIN { FS=OFS=";" }
    NR==FNR {
        ARGV[ARGC++] = $1
        re[$1] = $2
        next
    }
    $0 ~ re[FILENAME] { print FILENAME, $0 }
' file

The above also assumes you want to do a partial regexp match on each file since that's what the code in your question does but that may not be the best way to do whatever it is you really want - see How do I find the text that matches a pattern? for the other possibilities.

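Given your newly added sample input it looks like you should be doing a full string match on the first space-separated word of each line instead of a partial regexp match across the whole line. If that's correct then change these lines:

    re[$1] = $2
    $0 ~ re[FILENAME] { print FILENAME, $0 }

to:

    str[$1] = $2
    split($0, word, " ") && word[1] == str[FILENAME] { print FILENAME, $0 }

The split() is needed because FS is ";" for every file, so $1 of a config file line would be the whole line rather than its first word.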
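Put together, the full-string variant might look like the sketch below. It is an illustration under assumptions rather than the answerer's exact script: the sample files are shortened, hypothetical versions of the ones in the question, the extra `configelement_ABCD` line is added only to show that a longer name is not matched, and `mktemp -d` is assumed to be available. Because FS is ";" for every file, the sketch splits each config line on whitespace itself before comparing the first word.

```shell
#!/bin/sh
# Demo setup: hypothetical sample files modelled on the question.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'configelement_ABC Data1' 'configelement_ABCD Data1 Data2' > CONFIG_file123
printf '%s\n' 'configelement_XYY Data1 Data2' 'configelement_XYZ Data1 Data2' > CONFIG_file124
printf '%s\n' 'CONFIG_file123;configelement_ABC' 'CONFIG_file124;configelement_XYZ' > file

result=$(awk '
    BEGIN { FS = OFS = ";" }
    NR == FNR {                 # first file: the list of file;string pairs
        ARGV[ARGC++] = $1       # queue the config file so awk reads it next
        str[$1] = $2            # remember the string to look for in it
        next
    }
    # FS is still ";", so split the line on whitespace explicitly and
    # compare its first word with the string stored for this file.
    split($0, word, " ") && word[1] == str[FILENAME] { print FILENAME, $0 }
' file)
printf '%s\n' "$result"
```

Only the lines whose first word equals the stored string are printed, so `configelement_ABCD` is skipped even though it contains `configelement_ABC`.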

The differences between this approach and the other 2 current answers are:

1. The shell loop calling grep will be orders of magnitude slower than this (see the part about performance at why-is-using-a-shell-loop-to-process-text-considered-bad-practice to understand why), and
2. The awk script calling getline is manually writing code to do what awk does for you automatically (i.e. read lines from a file and apply conditions/actions), so it needs more code for this, and even more if you want to add anything extra to do with the files being tested. For example, to print every line that contains X with the above awk script you just add a line containing /X/, because it's processing the files using awk's normal processing mode, whereas with the getline loop version you need to manually write if (/X/) print because you're bypassing awk's normal processing mode. If you are considering using getline, read http://awk.freeshell.org/AllAboutGetline first.

Those other 2 approaches do have the advantage of using almost no memory, while the above script has to store all of the original input file's contents in memory. In the extremely unlikely case that the file of filename;regexp pairs was billions of lines long (i.e. you have billions of files on your PC to be searched), that could be an issue (but then the shell loop one would take days or weeks to finish).

This delivers all the lines, but I am not able to add the filename to the output. — Ivan Šuker, Aug 18 '21 at 07:42
I updated my answer to print the file name as well as the matching line. — Ed Morton, Aug 18 '21 at 12:56
I like how you're pushing the config filenames into ARGV. Very clever! — Rusty Lemur, Aug 18 '21 at 17:18
You wrote: "Assuming a file name only occurs once in the input file", but as I have multiple filenames, what can be done? — Ivan Šuker, Aug 19 '21 at 11:03
Use an array instead of a scalar for the strs[FILENAME] contents and then a hash lookup instead of an equality comparison if you're using GNU awk. Having said that - [chameleon questions](https://meta.stackexchange.com/questions/43478/exit-strategies-for-chameleon-questions) are highly discouraged so please put your question back as it was when I answered it and ask a new, followup question that includes the cases you forgot to include in this one. — Ed Morton, Aug 19 '21 at 14:57
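As an illustration of the idea in the last comment, here is a hedged sketch that allows the same file name to appear on several lines of the list. It is one possible reading, not the commenter's own code: instead of GNU awk arrays of arrays it uses a portable two-part key (`want[file, string]`), and the `seen` and `want` array names and the shortened sample data are assumptions made for the demo.

```shell
#!/bin/sh
# Demo setup: hypothetical sample files; CONFIG_file123 is listed twice.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'configelement_ABA Data1' 'configelement_ABC Data1' \
    'configelement_XYZ Data1 Data2 Data3' > CONFIG_file123
printf '%s\n' 'configelement_XYY Data1 Data2' 'configelement_XYZ Data1 Data2' > CONFIG_file124
printf '%s\n' 'CONFIG_file123;configelement_ABC' 'CONFIG_file123;configelement_XYZ' \
    'CONFIG_file124;configelement_XYZ' > file

result=$(awk '
    BEGIN { FS = OFS = ";" }
    NR == FNR {
        if (!seen[$1]++) ARGV[ARGC++] = $1   # queue each config file only once
        want[$1, $2] = 1                     # record the (file, string) pair
        next
    }
    # Split on whitespace (FS is ";") and look the first word up by key.
    split($0, word, " ") && ((FILENAME, word[1]) in want) { print FILENAME, $0 }
' file)
printf '%s\n' "$result"
```

Each config file is still read only once, and matches come out in the order of the lines in the config files rather than in the order of the list.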

Rusty Lemur · Answer 3 · 2021-08-18T17:17:26.303

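awk -F\; '{
    config_file = $1    # first field: the file to search
    search_term = $2    # second field: the regexp to look for
    # getline without a variable overwrites $0, hence the copies above
    while ((getline < config_file) > 0) {
        if ($0 ~ search_term) {
            print $0
            break
        }
    }
    close(config_file)  # reread from the top if the file is listed again
}' input_file

This will process a file named input_file, which should have two fields separated by a semicolon. It takes the first field in each record as the configuration file and the second field as the term to search for.

It uses getline to read from config_file into $0 (which is split as a normal record). The while loop reads the config file line by line and compares each line with the search term, treated as a regexp. If it finds the search term, it prints the line and stops searching. (Remove the break statement if you want to print every line that matches.) The close() call makes sure a config file that is listed more than once is read from the top each time; without it, a later getline loop on the same file would continue where the previous one stopped.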

Sample input and config files used in testing:

$ cat input_file
CONFIG_file123;configelement_ABC Data1
CONFIG_file123;configelement_XYZ Data1 Data2 Data3
CONFIG_file124;configelement_XYZ Data1 Data2
    
$ cat CONFIG_file123
configelement_ABA Data1 Data2 Data3
configelement_ABB Data1 Data2
configelement_ABC Data1
configelement_ABD Data1 Data2 Data3
configelement_XYW Data1 Data2
configelement_XYX Data1 Data2 Data3
configelement_XYY Data1
configelement_XYZ Data1 Data2 Data3

$ cat CONFIG_file124
configelement_ABA Data1 Data2 Data3
configelement_ABB Data1 Data2
configelement_ABC Data1 Data2
configelement_ABD Data1 Data2 Data3
configelement_XYW Data1 Data2
configelement_XYX Data1 Data2 Data3
configelement_XYY Data1 Data2
configelement_XYZ Data1 Data2

Output:

configelement_ABC Data1
configelement_XYZ Data1 Data2 Data3
configelement_XYZ Data1 Data2

So it looks like i did something wrong yesterday :/ Thank you very much for your effort Rusty Lemur! — Ivan Šuker, Aug 19 '21 at 07:51
Again, I am unable to add the name of the config file each line came from to the output. — Ivan Šuker, Aug 19 '21 at 11:18
If you want to add the filename in the output, you can change the print statement to: `print config_file ":" $0`. Of course, change the `":"` to any separator you want. — Rusty Lemur, Aug 20 '21 at 14:47
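The suggestion in the last comment could be sketched as follows. This is an illustration under assumptions, not the answerer's exact script: it reads into a hypothetical variable `line` so that `$0` is left alone, closes each config file after use, and deliberately lists CONFIG_file123 twice with the later match first to show why the `close()` matters. The sample data is shortened and `mktemp -d` is assumed to be available.

```shell
#!/bin/sh
# Demo setup: hypothetical sample files; CONFIG_file123 is listed twice.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'configelement_ABC Data1' 'configelement_XYZ Data1 Data2 Data3' > CONFIG_file123
printf '%s\n' 'configelement_XYY Data1 Data2' 'configelement_XYZ Data1 Data2' > CONFIG_file124
printf '%s\n' 'CONFIG_file123;configelement_XYZ' 'CONFIG_file123;configelement_ABC' \
    'CONFIG_file124;configelement_XYZ' > input_file

result=$(awk -F';' '{
    config_file = $1
    search_term = $2
    # Read into "line" so the fields of the list record stay intact.
    while ((getline line < config_file) > 0) {
        if (line ~ search_term) {            # regexp match, as in the answer
            print config_file ":" line       # prefix the match with the file name
            break
        }
    }
    close(config_file)   # so a repeated file name is read from the top again
}' input_file)
printf '%s\n' "$result"
```

Without the `close()`, the second lookup in CONFIG_file123 would resume after the line matched by the first lookup and would never see `configelement_ABC`.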
