Query 5th column value, based on 4th column input, where 4th column provided by first file.txt

Question

I have 2 files:

first file.txt
tskvdsc95
tosaocs

second file.txt
crbvdsc85;172.31.216.65&172.31.216.66;2016;tskvdsc95;172.31.240.65&172.31.240.66;3016       
crbvdsc85;172.31.216.65&172.31.216.66;2017;tskvdsc95;172.31.240.65&172.31.240.66;3017
tskvdsc195.epc.mnc009.mcc510.3gppnetwork.org;172.20.197.3;3412;tosaocs;172.20.237.70;3412       
tskvdsc195.epc.mnc009.mcc510.3gppnetwork.org;172.20.197.3;3413;tosaocs;172.20.237.69;3413

I need to query the 5th column of the second file, using the data in the first file as the lookup values for the 4th column.

Below is my script:

#!/bin/bash
input="/path/to/folder/first file.txt.txt"
while IFS=  read -r line
do
  awk 'BEGIN{FS=";"} $4=="$line" {print$5}' /path/to/folder/second file.txt | sort | uniq -c
  #echo "$line"
done < "$input"

My script runs but produces an empty result, which is not what I expect.

My expected result is:

172.31.240.65&172.31.240.66
172.20.237.70
172.20.237.69

Please help me find which part of the script above is wrong.

Thanks in advance,

WF

Please edit your question and show the expected output for the given input. — Renaud Pacalet, Aug 14 '23 at 09:24
Your `.txt` file's name has a space in it; it must be quoted. — Kaz, Aug 14 '23 at 19:53
... and don't create files with spaces in their names, it just makes it far more likely you'll trip over a bug in your or someone else's code some day. — Ed Morton, Aug 15 '23 at 00:27

score 4 · Answer 1 · answered Aug 14 '23 at 12:44

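Regarding the bug $4=="$line": the awk program is inside single quotes, so the shell never expands $line and awk compares $4 with the literal string "$line", which matches nothing. Please read how-do-i-use-shell-variables-in-an-awk-script. But don't use a shell loop calling awk on every line for this; just call awk once.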
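To make the variable issue concrete, here is a minimal sketch comparing the original comparison with one that passes the shell value in through awk's -v option. The temporary file, the two sample rows (copied from the question), and the variable names target, broken and fixed are only assumptions for illustration.

```shell
#!/bin/bash
# Sketch only: temporary file and variable names are made up for this demo.
tmp=$(mktemp)
printf '%s\n' \
  'crbvdsc85;172.31.216.65&172.31.216.66;2016;tskvdsc95;172.31.240.65&172.31.240.66;3016' \
  'tskvdsc195.epc.mnc009.mcc510.3gppnetwork.org;172.20.197.3;3412;tosaocs;172.20.237.70;3412' > "$tmp"

line='tskvdsc95'

# Inside single quotes the shell does not expand $line, so awk compares
# field 4 with the literal string "$line" and prints nothing.
broken=$(awk 'BEGIN{FS=";"} $4=="$line" {print $5}' "$tmp")

# -v assigns the shell value to the awk variable "target" before the program runs.
fixed=$(awk -F';' -v target="$line" '$4==target {print $5}' "$tmp")

printf 'broken: [%s]\n' "$broken"
printf 'fixed:  [%s]\n' "$fixed"
rm -f "$tmp"
```

Note that -v interprets backslash escape sequences in the assigned value, so this form suits plain keys like the ones in the question.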

Using any awk:

$ cat tst.sh
#!/usr/bin/env bash

awk -F';' '
NR==FNR {
    first[$1]
    next
}
($4 in first) && !seen[$5]++ {
    print $5
}
' first_file.txt second_file.txt

$ ./tst.sh
172.31.240.65&172.31.240.66
172.20.237.70
172.20.237.69

Renaud Pacalet · Answer 2 · 2023-08-14T09:57:11.847

3

You apparently want to avoid repetitions in the output. awk is probably a good choice for this job, thanks to its associative arrays, and ability to split input records in fields.

If your input format is simple (no ; in quoted fields, one record per line, etc.) you can try:

awk -F';' 'NR==FNR {a[$0];next} $4 in a {b[$5]}
  END {for(k in b) print k}' file1 file2

Declare ; as the input field separator (-F';'). While parsing the first file (NR==FNR is true only for the first file), store each line as a key of array a (a[$0]) and move to the next line (next). While parsing the second file, if the fourth field is a key of array a ($4 in a), store the fifth field as a key of array b. At the END, loop over all keys of array b (for(k in b)) and print them (print k).

Note: this avoids repetitions in the output but it does not preserve the input order. If you need to preserve the input order please edit your question and add this.
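For readers who want to try this on the question's data, the following sketch runs the same program spread over several commented lines. The temporary directory, the file names file1 and file2 inside it, and the variable unique are assumptions; the trailing sort is added only to make the otherwise unspecified output order repeatable.

```shell
#!/bin/bash
# Sketch only: temporary directory and variable names are made up for this demo.
dir=$(mktemp -d)
printf '%s\n' tskvdsc95 tosaocs > "$dir/file1"
printf '%s\n' \
  'crbvdsc85;172.31.216.65&172.31.216.66;2016;tskvdsc95;172.31.240.65&172.31.240.66;3016' \
  'crbvdsc85;172.31.216.65&172.31.216.66;2017;tskvdsc95;172.31.240.65&172.31.240.66;3017' \
  'tskvdsc195.epc.mnc009.mcc510.3gppnetwork.org;172.20.197.3;3412;tosaocs;172.20.237.70;3412' \
  'tskvdsc195.epc.mnc009.mcc510.3gppnetwork.org;172.20.197.3;3413;tosaocs;172.20.237.69;3413' > "$dir/file2"

unique=$(awk -F';' '
  NR==FNR { a[$0]; next }            # first file: remember each line as a key
  $4 in a { b[$5] }                  # second file: keep field 5 when field 4 is a key
  END     { for (k in b) print k }   # keys are visited in unspecified order
' "$dir/file1" "$dir/file2" | LC_ALL=C sort)

printf '%s\n' "$unique"
rm -rf "$dir"
```

The two rows for tskvdsc95 carry the same fifth field, so it is printed once; the two rows for tosaocs carry different values, so both appear.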

edited Aug 14 '23 at 09:57

answered Aug 14 '23 at 09:50

Renaud Pacalet


Thanks @Renaud Pacalet, that works perfectly, just as I wanted. Sorry for the late response; I was googling how the script works. Can I ask for a further enhancement? The script above works great for avoiding duplicates in column 5, but it loses the information about which column 4 value each column 5 value belongs to. – Wolverine adamantium Aug 16 '23 at 09:54
Not sure I understand. Do you want to also print column 4? This is different from your expected output but easy to do: replace `{b[$5]}` with `{b[$5]=$4}` and `print k` with `print b[k] ";" k`. – Renaud Pacalet Aug 16 '23 at 10:20
Yes, it works perfectly, as I expected: it displays column 4 and column 5 together. So basically, if I'm not mistaken, b uses $5 as the index and $4 as the value? Thanks a lot for your help. – Wolverine adamantium Aug 16 '23 at 10:46
That's it. In the answer we only use the keys and don't care about the values. Here we do use both. – Renaud Pacalet Aug 16 '23 at 11:31
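Putting the two replacements from the comments together, the variant that prints column 4 next to column 5 might look like the sketch below. The temporary directory, the file names file1 and file2, and the variable pairs are assumptions; the sort is there only to make the output order repeatable.

```shell
#!/bin/bash
# Sketch only: temporary directory and variable names are made up for this demo.
dir=$(mktemp -d)
printf '%s\n' tskvdsc95 tosaocs > "$dir/file1"
printf '%s\n' \
  'crbvdsc85;172.31.216.65&172.31.216.66;2016;tskvdsc95;172.31.240.65&172.31.240.66;3016' \
  'crbvdsc85;172.31.216.65&172.31.216.66;2017;tskvdsc95;172.31.240.65&172.31.240.66;3017' \
  'tskvdsc195.epc.mnc009.mcc510.3gppnetwork.org;172.20.197.3;3412;tosaocs;172.20.237.70;3412' \
  'tskvdsc195.epc.mnc009.mcc510.3gppnetwork.org;172.20.197.3;3413;tosaocs;172.20.237.69;3413' > "$dir/file2"

pairs=$(awk -F';' '
  NR==FNR { a[$0]; next }                   # first file: remember each line as a key
  $4 in a { b[$5] = $4 }                    # key is field 5, value is the matching field 4
  END     { for (k in b) print b[k] ";" k } # print "field4;field5" once per distinct field 5
' "$dir/file1" "$dir/file2" | LC_ALL=C sort)

printf '%s\n' "$pairs"
rm -rf "$dir"
```

One caveat of keying on field 5 alone: if the same column 5 value appeared under two different column 4 values, only the last pairing would be kept.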

cforler · Answer 3 · 2023-08-15T06:55:09.450

The following script produces the expected output.

#!/bin/sh
file1=$1
file2=$2

while IFS= read -r target <&3; do
    {
        while IFS= read -r line <&4; do
            {
                column=$(echo "$line" | cut -d ";" -f 4)  
                if [ "$column" = "$target" ]; then
                    echo "$line" | cut -d ';' -f 5
                fi
            } 3<&-
        done 4< "$file2" | sort -ru  # no -n: with -n, -u would merge lines sharing a numeric prefix such as 172.20
    } 4<&-
done 3< "$file1"

I've addressed several issues that were all pointed out by Ed Morton.

Got rid of all warnings from https://www.shellcheck.net/.
Replaced upper case names with lower case names, as recommended by the article correct-bash-and-shell-script-variable-capitalization.
Improved reliability as pointed out by the article why-is-using-a-shell-loop-to-process-text-considered-bad-practice
Fixed the partial regex match issue.

The only remaining issue AFAIK is it'll be orders of magnitude slower and obviously requires more code and more complex code than an awk script but there's nothing you can do about that. — Ed Morton, Aug 17 '23 at 17:01
