New to this website here: I have a problem analyzing data in a CSV file.
I've written a little script that reads input from a CSV file and prints only the desired fields:
awk -F, -v _sourcefile="$i" -v title="\"${k}\"" -v box="_${j}_" -v score="$dock_score_column" -v hbond="${xp_terms_columns[0]}" -v electro="${xp_terms_columns[1]}" -v phoben="${xp_terms_columns[2]}" -v phobenhb="${xp_terms_columns[3]}" -v lowmw="${xp_terms_columns[4]}" -v rotpenal="${xp_terms_columns[5]}" -v lipophilicevdw="${xp_terms_columns[6]}" -v phobenpairhb="${xp_terms_columns[7]}" -v sitemap="${xp_terms_columns[8]}" -v penalties="${xp_terms_columns[9]}" -v pistack="${xp_terms_columns[10]}" -v hbpenal="${xp_terms_columns[11]}" -v expospenal="${xp_terms_columns[12]}" -v picat="${xp_terms_columns[13]}" -v clbr="${xp_terms_columns[14]}" -v zpotr="${xp_terms_columns[15]}" \
'BEGIN{format = "%-8s %s %9s %9s %8s %10s %7s %10s %16s %14s %9s %11s %9s %9s %12s %7s %6s %7s\n"} $title_column ~ title && $source_column ~ _sourcefile && $source_column ~ box {
printf format, $score,"= ", $hbond, $electro, $phoben, $phobenhb, $lowmw, $rotpenal, $lipophilicevdw, $phobenpairhb, $sitemap, $penalties, $pistack, $hbpenal, $expospenal, $picat, $clbr, $zpotr}' "$file"
It's a complete mess, but for now it does what I need.
Question is: how can I make it simpler by feeding it the fields stored inside ${xp_terms_columns[@]}?
The file is a normal CSV file, and the first part of the awk script just selects the right records to print. My only problem is the 16 different variables I have to declare in order to print the fields.
I've tried using arrays inside awk, like this:
awk -F, -v _sourcefile=$i -v title="\"${k}\"" -v box="_${j}_" -v terms="$xp_terms_columns" 'BEGIN{split(terms, array, " ")} $title_column ~ title && $source_column ~ _sourcefile && $source_column ~ box { n=asorti(array, sorted); for (i=1;i<=n;i++) printf " " $sorted[i] }' $file
But without success, because I couldn't make asorti print the fields in the correct order.
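Two things seem to go wrong in that attempt. First, bash expands "$xp_terms_columns" without a subscript to element [0] only, so terms never receives the whole list; "${xp_terms_columns[*]}" joins all elements with a space. Second, asorti() is gawk-only and sorts the array's indices (compared as strings by default, so "10" lands before "2"), storing those indices, not the column numbers, as the values of sorted; $sorted[i] therefore fetches the wrong fields. Sorting should not be needed at all, because split() already stores the pieces at indices 1..n in their original order. A minimal sketch, assuming the array holds plain column numbers (the values, the sample line and the print_columns helper are hypothetical):

```shell
#!/usr/bin/env bash
# Illustrative values only: not the real column numbers from the question.
xp_terms_columns=(5 12 7)

# Pitfall 1: without a subscript, bash expands an array to element [0] only,
# so -v terms="$xp_terms_columns" hands awk a single column number.
echo "no subscript: $xp_terms_columns"     # prints: no subscript: 5
echo "with [*]: ${xp_terms_columns[*]}"    # prints: with [*]: 5 12 7

# Pitfall 2: no sorting is needed. split() fills col[1]..col[n] in the
# original order and the VALUES are the column numbers, so walk the
# indices with a numeric loop and use $(col[i]) to fetch each field.
print_columns() {
    # hypothetical helper: $1 = CSV file, remaining args = column numbers in output order
    local file=$1
    shift
    # "$*" joins the remaining arguments with the first character of IFS (a space by default)
    awk -F, -v terms="$*" '
        BEGIN { n = split(terms, col, " ") }
        {
            for (i = 1; i <= n; i++)
                printf "%s%s", $(col[i]), (i < n ? " " : "\n")
        }
    ' "$file"
}

# Made-up 12-field sample line
tmp=$(mktemp)
printf '%s\n' 'a,b,c,d,e,f,g,h,i,j,k,l' > "$tmp"
print_columns "$tmp" "${xp_terms_columns[@]}"   # prints: e l g
rm -f "$tmp"
```

Because the numeric loop follows the bash array's order, reordering the output is just a matter of reordering xp_terms_columns.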
Here is the first script above, written legibly, to help with this question and as an example for the OP to follow in the future:
awk -F, \
    -v _sourcefile="$i" \
    -v title="\"${k}\"" \
    -v box="_${j}_" \
    -v score="$dock_score_column" \
    -v hbond="${xp_terms_columns[0]}" \
    -v electro="${xp_terms_columns[1]}" \
    -v phoben="${xp_terms_columns[2]}" \
    -v phobenhb="${xp_terms_columns[3]}" \
    -v lowmw="${xp_terms_columns[4]}" \
    -v rotpenal="${xp_terms_columns[5]}" \
    -v lipophilicevdw="${xp_terms_columns[6]}" \
    -v phobenpairhb="${xp_terms_columns[7]}" \
    -v sitemap="${xp_terms_columns[8]}" \
    -v penalties="${xp_terms_columns[9]}" \
    -v pistack="${xp_terms_columns[10]}" \
    -v hbpenal="${xp_terms_columns[11]}" \
    -v expospenal="${xp_terms_columns[12]}" \
    -v picat="${xp_terms_columns[13]}" \
    -v clbr="${xp_terms_columns[14]}" \
    -v zpotr="${xp_terms_columns[15]}" \
    '
    BEGIN {
        format = "%-8s %s %9s %9s %8s %10s %7s %10s %16s %14s %9s %11s %9s %9s %12s %7s %6s %7s\n"
    }
    ($title_column ~ title) && ($source_column ~ _sourcefile) && ($source_column ~ box) {
        printf format, $score, "= ", $hbond, $electro, $phoben, $phobenhb, $lowmw, \
            $rotpenal, $lipophilicevdw, $phobenpairhb, $sitemap, $penalties, \
            $pistack, $hbpenal, $expospenal, $picat, $clbr, $zpotr
    }
    ' "$file"
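Building on that legible version, here is a sketch of the same filter and layout with the 16 per-term -v variables replaced by a single string built from the array. The per-term widths are copied from the original format string. The print_terms function, its argument order and the sample data are hypothetical, and title_column, source_column, dock_score_column and xp_terms_columns are assumed to be set by the caller. The script as shown never passes title_column or source_column to awk, so they are passed explicitly here:

```shell
#!/usr/bin/env bash
# Sketch: the legible script with the 16 per-term -v variables replaced by
# one string built from xp_terms_columns. Assumes the caller sets
# title_column, source_column, dock_score_column and xp_terms_columns.
print_terms() {
    # hypothetical helper: $1 = CSV file, $2 = title (k), $3 = source pattern (i), $4 = box (j)
    awk -F, \
        -v title="\"$2\"" \
        -v _sourcefile="$3" \
        -v box="_${4}_" \
        -v title_column="$title_column" \
        -v source_column="$source_column" \
        -v score="$dock_score_column" \
        -v terms="${xp_terms_columns[*]}" \
        '
        BEGIN {
            n = split(terms, col, " ")   # col[1..n] = column numbers, bash order kept
            # per-term widths copied from the original format string
            split("9 9 8 10 7 10 16 14 9 11 9 9 12 7 6 7", w, " ")
        }
        ($title_column ~ title) && ($source_column ~ _sourcefile) && ($source_column ~ box) {
            printf "%-8s %s", $score, "= "
            for (c = 1; c <= n; c++) {
                fmt = " %" w[c] "s"        # e.g. " %9s", as in the original
                printf fmt, $(col[c])
            }
            print ""                       # end the output line
        }
        ' "$1"
}

# Made-up sample data and column numbers, for illustration only
tmp=$(mktemp)
printf '%s\n' '"lig1",dock_box_1_.csv,-7.5,a,b,c' '"lig2",dock_box_1_.csv,-6.1,d,e,f' > "$tmp"
title_column=1 source_column=2 dock_score_column=3
xp_terms_columns=(6 4 5)
print_terms "$tmp" lig1 dock 1
rm -f "$tmp"
```

Since the widths live in one split() string, adding or reordering a term means editing the bash array and, if needed, that width list, rather than touching 16 separate -v assignments and the printf argument list.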