Unix noob here again. I'm writing a script for a Unix class I'm taking online, and it is supposed to handle a csv file while excluding some info and tallying other info, then print a report when finished. I've managed to write something (with help) but am having to debug it on my own, and it's not going well.
After sorting out my syntax errors, the script runs, however all it does is print the header before the loop at the end, and one single value of zero under "Gross Sales", where it doesn't belong.
My script thus far: I am open to any and all suggestions. However, I should mention again, I don't know what I'm doing, so you may have to explain a bit.
#!/bin/bash
grep -v '^C' online_retail.csv \
| awk -F\\t '!($2=="POST" || $8=="Australia")' \
| awk -F\\t '!($2=="D" || $3=="Discount")' \
| awk -F\\t '{(country[$8] += $4*$6) && (($6 < 2) ? quantity[$8] += $4 : quantity[$8] += 0)} \
END{print "County\t\tGross Sales\t\tItems Under $2.00\n \
-----------------------------------------------------------"; \
for (i in country) print i,"\t\t",country[i],"\t\t", quantity[i]}'
THE ASSIGNMENT Summary:
Using only awk and sed (or grep for the clean-up portion) write a script that prepares the following reports on the above retail dataset.
First, do some data "clean up" -- you can use awk, sed and/or grep for these:
Invoice numbers with a 'C' at the beginning are cancellations. These are just noise -- all lines like this should be deleted. Any items with a StockCode of "POST" should be deleted. Your Australian site has been producing some bad data. Delete lines where the "Country" is set to "Australia". Delete any rows labeled 'Discount' in the Description (or have a 'D' in the StockCode field). Note: if you already completed steps 1-3 above, you've probably already deleted these lines, but double-check here just in case. Then, print a summary report for each region in a formatted table. Use awk for this part. The table should include:
Gross sales for each region (printed in any order). The regions in the file, less Australia, are below. To calculate gross sales, multiply the UnitPrice times the Quantity for each row, and keep a running total. France United Kingdom Netherlands Germany Norway Items under $2.00 are expected to be a big thing this holiday season, so include a total count of those items per region. Use field widths so the columns are aligned in the output. The output table should look like this, although the data will produce different results. You can format the table any way you choose but it should be in a readable table with aligned columns. Like so:
Country Gross Sales Items Under $2.00
---------------------------------------------------------
France 801.86 12
Netherlands 177.60 1
United Kingdom 23144.4 488
Germany 243.48 11
Norway 1919.14 56
A small sample of the csv file:
InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
536388,22469,HEART OF WICKER SMALL,12,12/1/2010 9:59,1.65,16250,United Kingdom
536388,22242,5 HOOK HANGER MAGIC TOADSTOOL,12,12/1/2010 9:59,1.65,16250,United Kingdom
C536379,D,Discount,-1,12/1/2010 9:41,27.5,14527,United Kingdom
536389,22941,CHRISTMAS LIGHTS 10 REINDEER,6,12/1/2010 10:03,8.5,12431,Australia
536527,22809,SET OF 6 T-LIGHTS SANTA,6,12/1/2010 13:04,2.95,12662,Germany
536527,84347,ROTATING SILVER ANGELS T-LIGHT HLDR,6,12/1/2010 13:04,2.55,12662,Germany
536532,22962,JAM JAR WITH PINK LID,12,12/1/2010 13:24,0.85,12433,Norway
536532,22961,JAM MAKING SET PRINTED,24,12/1/2010 13:24,1.45,12433,Norway
536532,84375,SET OF 20 KIDS COOKIE CUTTERS,24,12/1/2010 13:24,2.1,12433,Norway
536403,POST,POSTAGE,1,12/1/2010 11:27,15,12791,Netherlands
536378,84997C,BLUE 3 PIECE POLKADOT CUTLERY SET,6,12/1/2010 9:37,3.75,14688,United Kingdom
536378,21094,SET/6 RED SPOTTY PAPER PLATES,12,12/1/2010 9:37,0.85,14688,United Kingdom
Seriously, Thank you to whoever can help. You all are amazing!
edit: I think I used the wrong field separator... but not sure how it is supposed to look. still tinkering...
edit2: okay, I "fixed?" the delimiter and changed it from awk -F\\t
to awk -F\\','
and now it runs, however, the report data is all incorrect. sigh... I will trudge on.