1

I have two files:

first file.txt
tskvdsc95
tosaocs
second file.txt
crbvdsc85;172.31.216.65&172.31.216.66;2016;tskvdsc95;172.31.240.65&172.31.240.66;3016       
crbvdsc85;172.31.216.65&172.31.216.66;2017;tskvdsc95;172.31.240.65&172.31.240.66;3017
tskvdsc195.epc.mnc009.mcc510.3gppnetwork.org;172.20.197.3;3412;tosaocs;172.20.237.70;3412       
tskvdsc195.epc.mnc009.mcc510.3gppnetwork.org;172.20.197.3;3413;tosaocs;172.20.237.69;3413

I need to print the 5th column of the second file for every row whose 4th column matches a line of the first file.

Below is my script:

#!/bin/bash
input="/path/to/folder/first file.txt.txt"
while IFS=  read -r line
do
  awk 'BEGIN{FS=";"} $4=="$line" {print$5}' /path/to/folder/second file.txt | sort | uniq -c
  #echo "$line"
done < "$input"

My script runs but prints nothing, which is not what I expected.

The expected result is:

172.31.240.65&172.31.240.66
172.20.237.70
172.20.237.69

Please help me find which part of the above script is wrong.

Thanks in advance,

WF

Renaud Pacalet

3 Answers

4

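Regarding the bug $4=="$line": the awk program is inside single quotes, so the shell never expands $line and awk compares $4 with the literal string "$line". Please read "How do I use shell variables in an awk script?". But don't use a shell loop that calls awk once per input line for this, just call awk once.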
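As a minimal sketch of the variable-passing fix, the snippet below contrasts the quoted form from the question with awk's -v option, which assigns a shell value to an awk variable before the program runs. The temporary file, its two sample rows, and the awk variable name t are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical sample data, written to a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'a;1;x;tosaocs;172.20.237.70;3412' \
              'b;2;y;other;10.0.0.1;3413' > "$tmp/second.txt"

line='tosaocs'   # stands in for one line read from the first file

# Broken: the awk program is single-quoted, so the shell leaves $line alone
# and awk compares $4 with the literal text $line. Nothing matches.
broken=$(awk -F';' '$4 == "$line" { print $5 }' "$tmp/second.txt")

# Working: -v copies the shell value into the awk variable t.
fixed=$(awk -F';' -v t="$line" '$4 == t { print $5 }' "$tmp/second.txt")

printf 'broken: [%s]\n' "$broken"
printf 'fixed:  [%s]\n' "$fixed"
rm -rf "$tmp"
```

Note that -v interprets backslash escape sequences in the assigned value; for arbitrary data, reading the value through ENVIRON is a common alternative.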

Using any awk:

$ cat tst.sh
#!/usr/bin/env bash

awk -F';' '
NR==FNR {
    first[$1]
    next
}
($4 in first) && !seen[$5]++ {
    print $5
}
' first_file.txt second_file.txt

$ ./tst.sh
172.31.240.65&172.31.240.66
172.20.237.70
172.20.237.69
Ed Morton
3

You apparently want to avoid repetitions in the output. awk is probably a good choice for this job, thanks to its associative arrays and its ability to split input records into fields.

If your input format is simple (no ; in quoted fields, one record per line, etc.) you can try:

awk -F';' 'NR==FNR {a[$0];next} $4 in a {b[$5]}
  END {for(k in b) print k}' file1 file2

Declare ; as the input field separator (-F';'). While parsing the first file (NR==FNR is true only for the first file), store each line as a key of array a (a[$0]) and move to the next line (next). While parsing the second file, if the fourth field is a key of array a ($4 in a), store the fifth field as a key of array b. At the END, loop over all keys of array b (for(k in b)) and print them (print k).

Note: this avoids repetitions in the output but it does not preserve the input order. If you need to preserve the input order please edit your question and add this.
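To see why NR==FNR selects only the first file, here is a small trace, assuming two hypothetical two-line files created in a temporary directory. The counter n and the output labels are illustrative and not part of the answer's command.

```shell
#!/usr/bin/env bash
# Hypothetical two-line sample files in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'k1' 'k2' > "$tmp/file1"
printf '%s\n' 'r1' 'r2' > "$tmp/file2"

# NR counts records across all inputs; FNR restarts at 1 for each file,
# so the two are equal only while the first file is being read.
# n counts files: it is incremented on the first record of each file.
trace=$(awk 'FNR == 1 { n++ }
  { printf "file%d NR=%d FNR=%d first=%d\n", n, NR, FNR, (NR == FNR) }' \
  "$tmp/file1" "$tmp/file2")

printf '%s\n' "$trace"
rm -rf "$tmp"
```

One caveat: if the first file is empty, no record is ever read from it, so NR==FNR is also true while the second file is read.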

Renaud Pacalet
  • Thanks @Renaud Pacalet, that works perfectly, just as I wanted. Sorry for the late response, I was googling how the script works. Can I ask for a further enhancement? The script above works great for avoiding duplicates in column 5, but it loses the information about which column 4 value each column 5 value belongs to. – Wolverine adamantium Aug 16 '23 at 09:54
  • Not sure I understand. Do you want to also print column 4? This is different from your expected output but easy to do: replace `{b[$5]}` with `{b[$5]=$4}` and `print k` with `print b[k] ";" k`. – Renaud Pacalet Aug 16 '23 at 10:20
  • Yes, it works perfectly, as I expected: it displays column 4 and column 5 together. So basically, if I am not mistaken, it stores $5 as the index of b and $4 as the value, then prints both? Thanks a lot for your help. – Wolverine adamantium Aug 16 '23 at 10:46
  • That's it. In the answer we only use the keys and don't care about the values. Here we do use both. – Renaud Pacalet Aug 16 '23 at 11:31
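Putting the two replacements from the comments together, the full command might look like the sketch below. The sample rows are copied from the question; the temporary directory and the trailing sort are additions made here so that the output order is reproducible, since for(k in b) visits keys in an unspecified order.

```shell
#!/usr/bin/env bash
# Sample data copied from the question, written to a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'tskvdsc95' 'tosaocs' > "$tmp/file1"
printf '%s\n' \
  'crbvdsc85;172.31.216.65&172.31.216.66;2016;tskvdsc95;172.31.240.65&172.31.240.66;3016' \
  'crbvdsc85;172.31.216.65&172.31.216.66;2017;tskvdsc95;172.31.240.65&172.31.240.66;3017' \
  'tskvdsc195.epc.mnc009.mcc510.3gppnetwork.org;172.20.197.3;3412;tosaocs;172.20.237.70;3412' \
  'tskvdsc195.epc.mnc009.mcc510.3gppnetwork.org;172.20.197.3;3413;tosaocs;172.20.237.69;3413' \
  > "$tmp/file2"

# b maps each distinct 5th field (key) to the 4th field it was seen with (value).
pairs=$(awk -F';' 'NR==FNR {a[$0]; next} $4 in a {b[$5]=$4}
  END {for (k in b) print b[k] ";" k}' "$tmp/file1" "$tmp/file2" | LC_ALL=C sort)

printf '%s\n' "$pairs"
rm -rf "$tmp"
```

If the same column 5 value appears with several different column 4 values, only the last one seen is kept, because each assignment to b[$5] overwrites the previous value.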
0

The following script produces the expected output.

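#!/bin/sh
# Arguments: $1 = first file (lookup keys), $2 = second file (records)
file1=$1
file2=$2

# Outer loop: read each lookup key from the first file on fd 3.
while IFS= read -r target <&3; do
    {
        # Inner loop: scan the second file on fd 4 for matching rows.
        while IFS= read -r line <&4; do
            {
                column=$(echo "$line" | cut -d ";" -f 4)
                if [ "$column" = "$target" ]; then
                    echo "$line" | cut -d ';' -f 5
                fi
            } 3<&-
        # Deduplicate on the whole line; with -n, -u would compare only the
        # leading number of each address and drop distinct ones.
        done 4< "$file2" | sort -ru
    } 4<&-
done 3< "$file1"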

I've addressed several issues that were all pointed out by Ed Morton.

cforler