Side comment about two constructs in the OP's question: using c[$1] to collect elements, and the condition NR == FNR.
The awk info document does state that referencing an array element that has no recorded value creates that element, with the null string as its value. However, this behavior is not well known, is uncommon in other mainstream programming languages, and is not clearly mentioned in man awk, which is usually the first place to look for information. Awk info page: http://kirste.userpage.fu-berlin.de/chemnet/use/info/gawk/gawk_12.html#SEC114 (look for "If you refer to an array element that has no recorded value ...").
The info page also states: "(In some cases, this is unfortunate, because it might waste memory inside awk.)" It is easy to see how a non-expert awk user can write a script that needs a lot of memory just by 'checking' array entries. Many developers (beginners and experts alike) will fail to recognize this pattern, because they are not aware of the side effect. Coming from Java/C++/C#, they will assume that a lookup such as a.get(k) will NOT modify a.
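To make the side effect concrete, here is a minimal sketch. The three-line input and the array name a are made up for illustration; the only thing compared is a plain a[$1] reference versus the ($1 in a) membership test.

```shell
# Case 1: "check" each key by referencing a[$1]; nothing is ever assigned.
by_ref=$(printf 'x\ny\nz\n' | awk '
    { if (a[$1] != "") hits++ }                 # the reference itself creates a[$1]
    END { n = 0; for (k in a) n++; print n }    # count the elements now in a
')

# Case 2: the same check written with the "in" operator.
by_in=$(printf 'x\ny\nz\n' | awk '
    { if ($1 in a) hits++ }                     # membership test, a is left untouched
    END { n = 0; for (k in a) n++; print n }
')

echo "elements after a[\$1] references: $by_ref"   # 3
echo "elements after (\$1 in a) tests:   $by_in"   # 0
```

On a large input, the first form grows the array by one entry per distinct key that is looked up, even though the script never stores anything.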
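The other construct is FNR == NR, which is used as a synonym for "am I reading the first file?". For the occasional awk developer, this is not obvious. It is easier to tag the different input files with TAG=... on the command line: awk processes a var=value argument only when it reaches it in the argument list, so the variable is still unset while the files listed before it are being read.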
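As a rough illustration of the tagging idea, the sketch below prints which "tag" is active for each input line. The temporary files, their contents, and the fallback label MAP are hypothetical; TAG is just an ordinary awk variable set by an assignment placed between the two file names.

```shell
tmp=$(mktemp -d)
printf 'k1,v1\n'       > "$tmp/map.csv"    # hypothetical mapping file
printf 'r1,k1\nr2,k9\n' > "$tmp/main.csv"  # hypothetical main file

# TAG is unset (empty, hence false) while map.csv is read; the assignment
# TAG=MAIN takes effect just before awk opens main.csv.
tagged=$(awk -F"," '
    { src = (TAG ? TAG : "MAP"); print src ": " $0 }
' "$tmp/map.csv" TAG=MAIN "$tmp/main.csv")

echo "$tagged"
rm -rf "$tmp"
```

The first line is labelled MAP and the two lines from the second file are labelled MAIN, so a rule guarded by !TAG runs only for the first file without relying on record counters.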
My advice is to avoid these constructs and use slightly longer, but much easier to read, code, borrowing some ideas from other answers:
awk -F"," '
# Map File
!DATA_TAG { a[$1]=$2; next}
# Main file
($2 in a) {print $1}
' "$map/SC_mapping.csv" DATA_TAG=MAIN "$userdir/file2" > "Impacted_SC.csv"
# Single line, more compact
awk -F"," '!DATA { a[$1]=$2; next} ($2 in a) {print $1}' "$map/SC_mapping.csv" DATA=1 "$userdir/file2" > "Impacted_SC.csv"