I am new to shell scripting. In a shell script that will be invoked later, I am trying to extract the unique values of a particular column to check whether they are valid.

From my searches, I think cut along with sort and uniq is a good way to do this.

My attempt is:

file=/filepath/*vendor.csv
file_categories = `cut -d, -f1 $file |sort |unique`

$file should hold the file that has vendor in its filename,

but even with command substitution (`), $file is not replaced with the actual filename; it just inserts the literal contents of file.

Another example of what I am attempting:

a=/stage/scripts/vendor/*.out
echo $a
/stage/Scripts/ecommerce/oneclick/nohup.out /stage/Scripts/ecommerce/Vendor/Vendor_Automate_Ingestion_Process.out

wc-l

wc: /stage/Scripts/ecommerce/vendor/*.out:

$(wc -l "$a")
wc: /stage/Scripts/ecommerce/vendor/*.out:No such file or directory

I want to understand how to pass wildcards to a command substitution, and what I can do to fix this.

kvantour
av abhishiek
  • Possible duplicate of [How to assign a glob expression to a variable in a Bash script?](https://stackoverflow.com/questions/369145/how-to-assign-a-glob-expression-to-a-variable-in-a-bash-script) – kvantour Jan 19 '18 at 11:38
  • Please be sure to accept the answer that best solves your problem, if any, by pressing the [checkmark sign](https://i.stack.imgur.com/uqJeW.png). This gives the respondent with the best answer 15 points of reputation. You can also upvote any and all answers that help your understanding of your problem. Note that rep points are not subtracted (as some people seem to think) from the original poster's reputation points ;-) . – shellter Jan 19 '18 at 16:25

2 Answers

No, file will contain the literal string with the wildcard. When you interpolate the value $file without quotes around it, that's when the shell evaluates it as a wildcard. echo "$file" with proper quoting shows you the actual value of the variable.
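The difference between the stored value and its unquoted expansion can be seen in a minimal sketch like the following. The scratch directory and the two file names are hypothetical and are created only for the demonstration.

```shell
# Hypothetical scratch directory and file names, for illustration only.
dir=$(mktemp -d)
touch "$dir/a_vendor.csv" "$dir/b_vendor.csv"

file=$dir/*vendor.csv      # no expansion here: the pattern is stored as-is

quoted=$(echo "$file")     # quoted: still the literal pattern
unquoted=$(echo $file)     # unquoted: the shell expands the wildcard now

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

The first line printed still contains the `*`, while the second lists the two matching files.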

There is no good way to store a list of file names in a regular shell variable. Ksh and some other shells have arrays for this purpose, but it's not portable back to generic sh and may be something else than what you actually need, depending on what end goal you are trying to accomplish. If you want to extract unique values from a field in the files matching the wildcard into a string, just make sure you don't have spaces around the equals sign in the assignment and you're done.

file_categories=$(cut -d, -f1 $file | sort -u)
#              ^ no spaces around the equals sign!

Storing the wildcard in a variable is dubious here; probably simply use the wildcard directly if this is the problem you want to solve.
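If a list of matching files does need to be kept around, one portable option in plain sh is to use the positional parameters, which behave like a single array; in Bash or Ksh an array assignment such as `files=(/filepath/*vendor.csv)` would play the same role. The following is only a sketch, and the directory and file contents are made up for the demonstration.

```shell
# Hypothetical scratch files; one name contains a space on purpose.
dir=$(mktemp -d)
printf 'x,1\n' > "$dir/a_vendor.csv"
printf 'y,2\n' > "$dir/b vendor.csv"

# The wildcard is expanded here; each matching file becomes one parameter.
# Note that this overwrites any arguments the script itself received.
set -- "$dir"/*vendor.csv

count=$#
# "$@" passes every file name as a separate, intact argument.
first_fields=$(cut -d, -f1 "$@" | sort -u)

echo "$count files"
echo "$first_fields"
```

Because `"$@"` is quoted, the file name with a space is passed to cut as a single argument.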

Everywhere you don't specifically want the shell to expand wildcards and tokenize a string, you need to put double quotes around it.

echo "$file_categories"

This string isn't properly machine readable, and so it's of limited use to capture it in a variable at all. I'll wager a small sum of money that you actually simply want to display the output directly instead of storing it in a variable so that you can then echo its value:

cut -d, -f1 /filepath/*vendor.csv | sort -u

If you want to loop over the values, pipe this further to while read -r ...
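A loop of that kind might look like the sketch below. The sample files and the `category` variable name are assumptions made for illustration; the `echo` stands in for whatever per-value processing is actually needed.

```shell
# Hypothetical sample data, for illustration only.
dir=$(mktemp -d)
printf 'books,10\ntoys,3\nbooks,7\n' > "$dir/a_vendor.csv"
printf 'toys,1\ngames,5\n' > "$dir/b_vendor.csv"

# Read the unique first-column values one line at a time.
cut -d, -f1 "$dir"/*vendor.csv | sort -u |
while IFS= read -r category; do
    echo "category: $category"
done > "$dir/seen.txt"

cat "$dir/seen.txt"
```

In most shells the loop at the end of a pipeline runs in a subshell, so variables assigned inside it are not visible after `done`; writing the results to a file, as here, sidesteps that.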

tripleee
  • So if I pass the $ variable in a command substitution, the wildcards are evaluated. My end goal is to check whether a particular column contains only values from a predefined valid set: I wanted to extract the unique values and check whether all of them are valid. Any suggestions on how that could be accomplished? – av abhishiek Jan 19 '18 at 11:47
  • Post a new question with your *actual* requirements. There are too many things to guess here (what's valid, what does the data look like, etc) – tripleee Jan 19 '18 at 11:50
  • If you have a file containing the valid labels, `cut -f1 -d, *vendor.csv | grep -Fvxf valid.txt` will print the ones which are not in `valid.txt` ... but more usefully you probably want an Awk script to print the entire line which contains an invalid label. This sort of question gets asked dozens of times per week I think; look for duplicates. – tripleee Jan 19 '18 at 12:14
  • As per the policy, I have to wait 90 minutes before posting another question. I don't want to get the records with duplicate values; I just want to verify that the input data is valid. By the way, when I use `cut -d, -f1 /filepath/*vendor.csv | sort -u`, it includes the header as well for some reason – av abhishiek Jan 19 '18 at 12:20
  • If there is a header line then you want to exclude that, obviously. Trivially, `sed 1d` removes the first line. You still haven't defined how to determine that something is "valid" so I speculated; the code above looks for "valid" in a file (and doesn't try to do anything about duplicates) but again, look for duplicates before asking. This doesn't sound particularly unique or complicated if you can just define it somewhat more concretely. – tripleee Jan 19 '18 at 12:23
  • Thanks for your time. I have posted a separate question at (https://stackoverflow.com/questions/48341823/validating-unique-values-of-a-column-in-shell). If you get some time, please take a look – av abhishiek Jan 19 '18 at 13:08
To make your wc -l command work, call it like this:

wc -l $a

Do not quote the a variable; the shell needs to expand it unquoted so that the * wildcard is matched against file names.
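The contrast between the quoted and unquoted calls can be reproduced with a small sketch. The scratch directory and the `.out` files below are hypothetical. One caveat worth keeping in mind: an unquoted expansion is also split on whitespace, so this approach assumes the directory and file names contain no spaces.

```shell
# Hypothetical scratch files, for illustration only.
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/first.out"
printf 'three\n'    > "$dir/second.out"

a=$dir/*.out

# Quoted: wc receives the literal pattern as a file name and fails.
if wc -l "$a" >/dev/null 2>&1; then quoted_ok=yes; else quoted_ok=no; fi
echo "quoted call succeeded: $quoted_ok"

# Unquoted: the shell expands the wildcard before wc runs.
wc -l $a

# Total line count across the matched files (tr strips any padding from wc).
total=$(cat $a | wc -l | tr -d ' ')
echo "total lines: $total"
```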

AnythingIsFine