I receive an input file, vendor.csv, which has a column called retailer. I have a predefined list of valid retailer values: a, b, c. If 'd' appears in the retailer column, I have to take some action: mostly echo it to a log, stop the processing, and notify the user.
I have done the following so far:
f1=/stage/Scripts/ecommerce/vendor/final*.csv
k=$(cut -d, -f1 $f1 | sort -u)
echo $k
This gives me:
a b c d
The above output is not comma-separated.
For the above case, I can store the valid values a b c in a file or a string.
How do I make the check now? Is this the best way to go about this?
The valid values are:
ALB/SFY Total Ecom TA
Peapod Total Ecom TA
Target Total Ecom TA
The existing data contains the following unique values:
ALB/SFY Total Ecom TA
Hy-Vee Total Ecom TA
Peapod Total Ecom TA
Target Total Ecom TA
So "Hy-Vee Total Ecom TA" is an invalid value.
Here is my attempt with grep:
$ echo $s
ALB/SFY Total Ecom TA Peapod Total Ecom TA Target Total Ecom TA
$ echo $k
ALB/SFY Total Ecom TA Hy-Vee Total Ecom TA Peapod Total Ecom TA Target Total Ecom TA
$ grep -v "$s" "$k"
It gave me an error:
grep: ALB/SFY Total Ecom TA
Hy-Vee Total Ecom TA
Peapod Total Ecom TA
Target Total Ecom TA: No such file or directory
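For reference, grep treats its second argument as a file name, which is why the contents of $k show up inside the "No such file or directory" message. A minimal sketch of the same comparison with the data fed on standard input instead, assuming both variables hold one value per line (the sample strings below are copied from the values listed above; the variable name `invalid` is only illustrative):

```shell
#!/bin/sh
# Valid retailer names, one per line.
s='ALB/SFY Total Ecom TA
Peapod Total Ecom TA
Target Total Ecom TA'

# Unique values found in the data (stand-in for the cut | sort -u output).
k='ALB/SFY Total Ecom TA
Hy-Vee Total Ecom TA
Peapod Total Ecom TA
Target Total Ecom TA'

# -F: treat patterns as fixed strings, not regular expressions
# -x: a pattern must match the whole line
# -v: print only the lines that match none of the patterns
# -e "$s": newline-separated pattern list
invalid=$(printf '%s\n' "$k" | grep -vxF -e "$s")

printf '%s\n' "$invalid"
```

With this sample data, the only line that survives the filter is the Hy-Vee entry.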
Some of the solutions have pointed me in the right direction. In R, I would go about the above task as:
valid_values <- c('a', 'b', 'c')
invalid_retailer <- setdiff(unique(vendorfile$retailer), valid_values)
I was trying to replicate the same process in shell, and hence my usage of cut and grep.
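A rough shell counterpart of that set difference, as a sketch only. It assumes the valid values are kept one per line in a file, that the retailer is the first comma-separated column, and that the CSV has no header row and no quoted fields containing commas. The function name `check_retailers`, the file names, and the log format are all hypothetical.

```shell
#!/bin/sh
# check_retailers CSV VALID_FILE LOG_FILE   (hypothetical helper)
# Returns 0 if every value in column 1 of CSV is listed in VALID_FILE.
# Otherwise appends the offending values to LOG_FILE and returns 1.
check_retailers() {
    # Unique retailers from the data, minus the exact-match valid ones.
    invalid=$(cut -d, -f1 "$1" | sort -u | grep -vxF -f "$2")
    if [ -n "$invalid" ]; then
        printf 'Invalid retailer(s) in %s:\n%s\n' "$1" "$invalid" >> "$3"
        return 1
    fi
    return 0
}

# Demo with throwaway sample data in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'ALB/SFY Total Ecom TA' 'Peapod Total Ecom TA' \
    'Target Total Ecom TA' > "$tmp/valid.txt"
printf '%s\n' 'ALB/SFY Total Ecom TA,1' 'Hy-Vee Total Ecom TA,2' \
    'Target Total Ecom TA,3' > "$tmp/vendor.csv"

status=0
check_retailers "$tmp/vendor.csv" "$tmp/valid.txt" "$tmp/vendor.log" || status=$?
echo "exit status: $status"
cat "$tmp/vendor.log"
```

In a real script the call would typically be followed by `|| exit 1` so that processing stops, with the notification step (mail or otherwise) placed just before the exit.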