
SC_mapping.csv:

2,4
3,6

file2:

71,2
72,2
71,4

Output:

71
72

Program:

#!/bin/bash

read -p "Enter the dump directory path: " PATH
read -p "Mapping path: " Mapping
echo $PATH
echo $Mapping
if [ -s $Mapping/SC_mapping.csv ]; then
echo
 awk -F"," 'NR==FNR{c[$1];next} {if($2 in c){print $1} else{}}' $Mapping/SC_mapping.csv $PATH/file2 > Impacted_SC.csv
fi
paxdiablo
  • Please state the exact error you are getting in your question, in CODE TAGS. Please also describe the effort you have put into solving the problem yourself. – RavinderSingh13 Nov 27 '19 at 05:34
  • This is exactly why you don't use all upper case for local variable names: you're overwriting the shell's PATH variable with whatever input you get in response to your prompt. Google shell PATH variable and see https://stackoverflow.com/q/673055/1745001 and many similar posts about shell naming conventions. – Ed Morton Nov 27 '19 at 05:34
  • Hi @RavinderSingh13, this is the error I am getting: ./test.sh: line 9: awk: command not found. I can't figure out what the issue is, since the same command runs fine in the terminal. – Sumit Kumar Gupta Nov 27 '19 at 05:38
  • @SumitKumarGupta, could you please explain in your post the logic for getting the expected output from your samples? It is not clear. – RavinderSingh13 Nov 27 '19 at 05:40
  • @SumitKumarGupta, I second Ed sir. I am pretty sure things got messed up because of the value of your `PATH` variable: once it is overwritten with your input, the shell can no longer find `awk`, hence the error. Give your variable a name other than PATH and check once. – RavinderSingh13 Nov 27 '19 at 05:41
  • I am trying to match column 1 of the $Mapping/SC_mapping.csv file against column 2 of $PATH/file2 and, where they match, I want to print column 1 from $PATH/file2. – Sumit Kumar Gupta Nov 27 '19 at 05:42
  • Thanks @EdMorton and RavinderSingh13, working now – Sumit Kumar Gupta Nov 27 '19 at 05:45
  • Hi @RavinderSingh13, can we print column 2 from SC_mapping.csv in the output as well? I tried awk -F"," 'NR==FNR{c[$1];next} {if($2 in c){print $1","c[$2]} else{}}' $Mapping/SC_mapping.csv $PATH/file2, but it's not working. – Sumit Kumar Gupta Nov 27 '19 at 06:17
  • @SumitKumarGupta, Sure please check my edit now, we should be good here. – RavinderSingh13 Nov 27 '19 at 06:18

2 Answers


Could you please try the following.

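#!/bin/bash
# Avoid the name PATH: the shell uses it to locate commands such as awk.
read -r -p "Enter the dump directory path: " userdir
read -r -p "Mapping path: " map
echo "$userdir"
echo "$map"
if [[ -s $map/SC_mapping.csv ]]
then
    # First file: store column 2 of the mapping, keyed by column 1.
    # Second file: if column 2 is a known key, print column 1 and the mapped value.
    awk 'BEGIN{FS=OFS=","} FNR==NR{a[$1]=$2;next} ($2 in a){print $1,a[$2]}' "$map/SC_mapping.csv" "$userdir/file2" > "Impacted_SC.csv"
fi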

Following are the fixes in OP's attempt:

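  • Renamed the variables: PATH is a variable the shell itself uses (to locate commands such as awk), so it shouldn't be reused for user input.
  • Rewrote the awk command so that it stores column 2 of the mapping file (a[$1]=$2); the OP's attempt only recorded the keys, so there was no value to print.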
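
The PATH point can be reproduced in isolation. The following is a minimal sketch rather than the OP's script: the directory /nonexistent/dump/dir is a hypothetical stand-in for whatever is typed at the prompt, and dump_dir is just an illustrative replacement name.

```shell
#!/bin/bash
# Case 1: the input lands in PATH. Done inside a command substitution
# (a subshell) so the PATH of the calling shell is left untouched.
status=0
clobbered=$(
    PATH=/nonexistent/dump/dir       # hypothetical user input
    awk 'BEGIN { print "ok" }' 2>&1  # the shell can no longer locate awk
) || status=$?
echo "with PATH overwritten: exit status $status, message: $clobbered"

# Case 2: the same input stored under a different, lower-case name.
dump_dir=/nonexistent/dump/dir
fixed=$(awk 'BEGIN { print "ok" }')
echo "with dump_dir instead: $fixed"
```

In bash the first case ends with exit status 127, the status the shell reports when it cannot find a command, while the second case runs awk normally.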
James Brown
RavinderSingh13

A side comment about two constructs in the OP's question: using c[$1] to collect elements, and the condition NR == FNR.

The awk info document states that merely referencing an array element creates it with a null value. However, this behavior is not well known, is unusual among major modern programming languages, and is not clearly mentioned in man awk, which is usually the first place to look for information. Awk info page: http://kirste.userpage.fu-berlin.de/chemnet/use/info/gawk/gawk_12.html#SEC114 (look for "If you refer to an array element that has no recorded value ...").

The info page states: "(In some cases, this is unfortunate, because it might waste memory inside awk.)" It is easy to see how a non-expert awk user could write a script that needs a lot of memory just by 'checking' array entries. Many developers (beginners and experts alike) will fail to recognize this pattern because they are not aware of the side effect. Coming from Java/C++/C#, they will assume that a lookup such as a.get(k) will NOT modify a.
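
The side effect is easy to observe. This is a small sketch with a made-up array a and key "k" (neither comes from the OP's script); it checks membership with the in operator before and after a plain reference:

```shell
#!/bin/bash
demo=$(awk 'BEGIN {
    # "in" only tests membership, so it does not create anything.
    print "before: " (("k" in a) ? "present" : "absent")
    x = a["k"]      # a plain reference creates a["k"] with a null value
    print "after: " (("k" in a) ? "present" : "absent")
}')
echo "$demo"
```

This prints "before: absent" followed by "after: present", which is why ($2 in a) is the safer way to test for a key.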

The other construct is 'FNR == NR', which is used as a synonym for "am I reading the first file?". For the occasional developer, this is not obvious. It is easier to tag the different input files with TAG=... on the command line.
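
For readers who have not met the idiom, here is a rough sketch of what the two counters do. The two throwaway files (first and second, created under a temporary directory) are assumptions for illustration only:

```shell
#!/bin/bash
tmp=$(mktemp -d)
printf 'a\nb\n'    > "$tmp/first"    # 2 records
printf 'x\ny\nz\n' > "$tmp/second"   # 3 records

# NR counts records across all inputs; FNR restarts at 1 for each file.
seen=$(awk '{ print NR, FNR, (FNR == NR ? "first file" : "later file") }' "$tmp/first" "$tmp/second")
echo "$seen"
rm -rf "$tmp"
```

The two counters agree only while the first file is being read (lines "1 1" and "2 2"); from the third record on, NR keeps growing while FNR has restarted. Note that if the first file is empty, FNR == NR is also true for the second file, which is one more reason the idiom can surprise.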

My advice is to avoid this construct and use slightly longer, but much easier to read, code, borrowing some ideas from other answers:

awk -F"," '
  # Map File
!DATA_TAG { a[$1]=$2; next}
  # Main file
($2 in a) {print $1}
' "$map/SC_mapping.csv" DATA_TAG=MAIN "$userdir/file2" > "Impacted_SC.csv"

# Single line, more compact
awk -F"," '!DATA { a[$1]=$2; next} ($2 in a) {print $1}' "$map/SC_mapping.csv" DATA=1 "$userdir/file2" > "Impacted_SC.csv"
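
As a quick check, the compact variant can be run against the sample data from the question. This is only a sketch: the temporary directory stands in for the $map and $userdir directories, and the result is captured in a variable instead of being written to Impacted_SC.csv.

```shell
#!/bin/bash
tmp=$(mktemp -d)
printf '2,4\n3,6\n'         > "$tmp/SC_mapping.csv"
printf '71,2\n72,2\n71,4\n' > "$tmp/file2"

# DATA is unset while the mapping file is read and becomes 1 before file2.
result=$(awk -F"," '!DATA { a[$1]=$2; next } ($2 in a) { print $1 }' "$tmp/SC_mapping.csv" DATA=1 "$tmp/file2")
echo "$result"
rm -rf "$tmp"
```

With this input it prints 71 and 72, matching the expected output in the question.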
dash-o