I have the following awk script:
#! /usr/bin/awk -f
BEGIN{
FS=OFS="\t";
split("Mtm|^Dept|^Im|^TT|ge[mn]$,^Mini", arr1, ",")
split("Variable Expenses,Fixed Expenses", arr2, ",")
itm="Accounting & Legal";
}
$3<0 && $4~arr1[1] {$7=itm; $8=arr2[1]; print $0}
$3<0 && $4~arr1[2] {$7=itm; $8=arr2[2]; print $0}
Which produces the following output:
c_wp_e_2025 2021-05-10 -120.00 8430Ken Jebsen BILL PAYMENT 12:08-11662 Accounting & Legal Variable Expenses
c_wp_e_2148 2021-07-27 -90.85 6450Barrie L. Jorgen BILL PAYMENT 09:31-35651 Accounting & Legal Variable Expenses
e_wp_e_789 2018-06-12 -190.00 Immigration NZ 11 DEBIT Accounting & Legal Variable Expenses
e_wp_e_818 2018-06-29 -153.43 Dept of Home Affa 26 DEBIT AUD 141.37 Accounting & Legal Variable Expenses
e_wp_e_839 2018-07-18 -67.47 TT Services New Z 16 DEBIT Accounting & Legal Variable Expenses
e_wp_e_2512 2021-07-27 -470.20 Dept Internal Aff 23 DEBIT Accounting & Legal Variable Expenses
c_az_262 2021-10-01 -210.45 Mtm Accounting Ltd. Mlm Accounting Accounting & Legal Variable Expenses
c_az_421 2021-07-19 -561.20 Paypal *Mtm Paypal *Mtm Accounting & Legal Variable Expenses
c_az_1082 2020-11-25 -52.20 Ministryofbusiness 285515Dpsa2A Mbie Accounting & Legal Fixed Expenses
c_az_1110 2020-11-16 -2228.70 Mtm Accounting Ltd. Mlm Accounting Accounting & Legal Variable Expenses
This works as desired. I have however many more categories besides itm="Accounting & Legal" (e.g. itm="Bank_charges) and sometimes there are Fixed Expenses and Variable Expenses. At other times one of them only in which case the arrays created in the splits would only have one value. In such a case the second statement in the main body would become unused. For every new category there are different search patterns. I am new to writing awk programs and am stuck on how to approach this. How could this be written efficiently to accommodate the above scenario.
The input (excerpt) would be:
c_wp_e_2025 2021-05-10 -120.00 8430Ken Jebsen BILL PAYMENT 12:08-11662
c_wp_e_2148 2021-07-27 -90.85 6450Barrie L. Jorgen BILL PAYMENT 09:31-35651
e_wp_e_789 2018-06-12 -190.00 Immigration NZ 11 DEBIT
e_wp_e_818 2018-06-29 -153.43 Dept of Home Affa 26 DEBIT AUD 141.37
e_wp_e_839 2018-07-18 -67.47 TT Services New Z 16 DEBIT
e_wp_e_2512 2021-07-27 -470.20 Dept Internal Aff 23 DEBIT
c_az_262 2021-10-01 -210.45 Mtm Accounting Ltd. Mlm Accounting
c_az_421 2021-07-19 -561.20 Paypal *Mtm Paypal *Mtm
c_az_1082 2020-11-25 -52.20 Ministryofbusiness 285515Dpsa2A Mbie
c_az_1110 2020-11-16 -2228.70 Mtm Accounting Ltd. Mlm Accounting
Criteria for Bank_charges e.g. would be:
split("^Cle|Forei|2 WBC|irnie W| \
wtown W|*M|^Repl|elex|^Unar|lert$,enance$|ebit|A/C|rice$", arr1, ",")
split("Variable Expenses,Fixed Expenses", arr2,",")
itm="Bank_charges
Some output from this criteria would result in:
c_az_1668 2020-03-06 -10.00 Visa Debit Card Fee 4825561****** 4823 Bank_charges Fixed Expenses
c_az_1687 2020-02-28 -8.50 Monthly A/C Fee Bank_charges Fixed Expenses
c_az_1688 2020-02-28 -2.50 Clearance Fee Bank_charges Variable Expenses
c_az_1785 2020-01-31 -8.50 Monthly A/C Fee Bank_charges Fixed Expenses
Another category that only has Variable expenses is:
split("^318|^74 |ASB|^City S|^Fix", arr1,",")
split("Variable Expenses", arr2,",")
itm="Bank_withdrawals"
This however does not work at all with the above action. All data from the input file is categorized Bank_withdrawals and classified as Variable expenses.