I have a TSV file that I am trying to analyse. It has a month column containing the numbers 1-12, which correspond to the months (Jan=1, Feb=2, etc.). I am trying to set up a counter so that every time the code reads a 1 in the 6th column it adds 1 to the Jan count, every 2 adds 1 to the Feb count, and so on. Once all of the data has been iterated through, I need to find the median of the monthly counts. I have some echo statements in place for troubleshooting. Here is my code and what it outputs:
breaches_per_month(){
    input_file=$1
    original_data=$(cat "input_file")
    # Set up the month array
    months=("Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec")
    # Initialise count variables for each month
    declare -A month_counts
    for month in "${months[@]}"; do
        month_counts[$month]=0
    done
    # Read column 6 line by line and increment the count for the corresponding month
    while IFS=$'\t' read -r _ _ _ _ _ month _; do
        case $month in
            1) month_counts["Jan"]=$((month_counts["Jan"] + 1));;
            2) month_counts["Feb"]=$((month_counts["Feb"] + 1));;
            3) month_counts["Mar"]=$((month_counts["Mar"] + 1));;
            4) month_counts["Apr"]=$((month_counts["Apr"] + 1));;
            5) month_counts["May"]=$((month_counts["May"] + 1));;
            6) month_counts["Jun"]=$((month_counts["Jun"] + 1));;
            7) month_counts["Jul"]=$((month_counts["Jul"] + 1));;
            8) month_counts["Aug"]=$((month_counts["Aug"] + 1));;
            9) month_counts["Sep"]=$((month_counts["Sep"] + 1));;
            10) month_counts["Oct"]=$((month_counts["Oct"] + 1));;
            11) month_counts["Nov"]=$((month_counts["Nov"] + 1));;
            12) month_counts["Dec"]=$((month_counts["Dec"] + 1));;
        esac
    done < "$input_file"
    # Calculate the median count
    counts=("${month_counts[@]}")
    median_count=$(printf '%s\n' "${counts[@]}" | sort -n | awk 'NR == int((length+1)/2) {print}')
    echo "Median count: $median_count"
    # Print the counts for each month
    for month in "${months[@]}"; do
        echo "$month: ${month_counts[$month]}"
    done
}
breaches_per_month "$1"
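For comparison, here is a minimal sketch of the same per-month counting done in awk rather than with `read` and a `case` statement. One thing worth knowing about the loop above: tab is an IFS whitespace character in bash, so `read` treats a run of consecutive tabs as a single delimiter, and any empty field before column 6 shifts the remaining columns left. Whether that is what causes a positional error in this data is not established; it is only one possibility. The function name `count_months` is hypothetical and not part of the original script.

```shell
#!/usr/bin/env bash
# Sketch only: count rows per month from column 6 of a TSV file.
# count_months is a hypothetical helper name, not part of the original script.
count_months() {
    local input_file=$1
    # -F'\t' splits on every single tab, so empty fields keep their position
    awk -F'\t' '
        BEGIN { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ") }
        # Count only rows whose 6th field is an integer from 1 to 12
        $6 ~ /^[0-9]+$/ && $6 >= 1 && $6 <= 12 { counts[$6 + 0]++ }
        # Print the months in calendar order; months never seen print as 0
        END { for (m = 1; m <= 12; m++) printf "%s: %d\n", names[m], counts[m] }
    ' "$input_file"
}
```

It would be called as `count_months "$1"`, in the same way as `breaches_per_month`. Because the rows are read directly from the file, nothing like the `original_data=$(cat "input_file")` line is needed; that line is missing the `$` before `input_file`, and its result is never used.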
This was the output of breaches_per_month. I have worked on the code since uploading it and made improvements. The data is now added to the counts correctly, but the counts are off by one position: Jun should have the value 78, not 60. I suspect the same is true for all of the other months, so I have a positional error:
cat: input_file: No such file or directory
Median count: 60
Jan: 86
Feb: 88
Mar: 83
Apr: 62
May: 78
Jun: 60
Jul: 65
Aug: 68
Sep: 92
Oct: 95
Nov: 93
Dec: 77
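A note on the `Median count: 60` line: in awk, `length` with no argument is the number of characters in the current line, not the number of lines. Every count here has two digits, so `int((length+1)/2)` evaluates to 1 and the pipeline prints the first sorted line, which is the minimum. Below is a minimal sketch of a median that stores the sorted values and picks the middle one in an `END` block, averaging the two middle values when the count is even. The function name `median` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: median of a list of numbers, one per line on stdin.
# median is a hypothetical helper name, not part of the original script.
median() {
    sort -n | awk '
        { v[NR] = $1 }                # store the sorted values by line number
        END {
            if (NR == 0) exit 1       # no input, nothing to print
            if (NR % 2) m = v[(NR + 1) / 2]             # odd count: the middle value
            else m = (v[NR / 2] + v[NR / 2 + 1]) / 2    # even count: mean of the two middle values
            print m
        }'
}

# The twelve monthly counts from the output above
printf '%s\n' 86 88 83 62 78 60 65 68 92 95 93 77 | median   # prints 80.5
```

Inside the original function the call would be `printf '%s\n' "${month_counts[@]}" | median`. The iteration order of an associative array is unspecified in bash, but that does not matter here because the values are sorted before the median is taken.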