1

Is it possible to get the last content of a wildcard?

Let's say I have a list of files:

foo-15.csv
foo-32.csv
foo-65.csv

etc.

Is it possible to access the wildcard content when using:

for file in foo-*.csv; do
    echo wildcard_content
done

So that it prints

 15
 32
 65

Note: I don't want to use string manipulation on $file. The above is merely an example; the glob pattern could be anything. Is it possible to access the wildcard content? Is it stored in a bash-defined variable that I can read, or something like that?

kvantour
Argent
  • While you are asking for what the asterisk matched, it is clear that you are interested in the filename without the extension (hence the duplicate). If you are really interested in what the wildcards matched, I suggest rewriting the glob as a regex and using the `=~` test operator in combination with `BASH_REMATCH` – kvantour Sep 26 '19 at 10:21
  • @kvantour I am not interested in the duplicate you indicated. The filename part is just an example. And what is a "glob" ? – Argent Sep 26 '19 at 11:45
  • A glob (glob pattern) is what you call a wildcard. It is different from a regular expression. So if I understand you correctly, if you have a file `foobar.5` which is matched with 'f*b?r.[0-9]' you would like to be able to know that `*` is matching `oo`, `?` is matching `a` and `[0-9]` is matching `5`? – kvantour Sep 26 '19 at 12:33
  • @kvantour Oh, ok, thanks. Yes, exactly. I was wondering if those values were stored somewhere, and accessible directly... I never saw that so I suppose it's impossible? – Argent Sep 26 '19 at 12:51
  • It is not impossible, it is just not straightforward. – kvantour Sep 26 '19 at 12:56
  • So your question is, is there an internal list from which `file` gets assigned on each iteration, which you can access? – tripleee Sep 26 '19 at 13:24
  • @tripleee Not using `file`. Is there an internal variable to which values are assigned each time I call `*`, and can I access it? – Argent Sep 26 '19 at 13:35
  • There is no "each time", the glob gets expanded before the loop starts executing, while the command is parsed. – tripleee Sep 26 '19 at 13:38
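
The point made in the last comment can be checked with a small experiment. This is only a sketch: the temporary directory and the file names are made up for illustration. A file created while the loop is running is not visited, because the word list was already fixed before the first iteration.

```shell
#!/usr/bin/env bash
# Throwaway directory (hypothetical contents) so the sketch is self-contained.
tmpdir=$(mktemp -d)
touch "$tmpdir/foo-15.csv" "$tmpdir/foo-32.csv"

count=0
for file in "$tmpdir"/foo-*.csv; do
    # Created mid-loop: it matches the glob but is never iterated over,
    # since the glob was expanded once, before the loop body first ran.
    touch "$tmpdir/foo-99.csv"
    count=$((count + 1))
    echo "${file##*/}"
done
echo "iterations: $count"   # prints "iterations: 2"

rm -rf "$tmpdir"
```

After expansion the loop only sees a plain list of file names; nothing in that list records which characters each wildcard matched.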

3 Answers

3
#! /bin/bash

shopt -s nullglob

for file in *.csv; do
    echo "${file%.csv}"
done

You can remove the .csv suffix as shown above.

The docs describe how % works:

${parameter%word}
${parameter%%word}
The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the ‘%’ case) or the longest matching pattern (the ‘%%’ case) deleted. If parameter is ‘@’ or ‘*’, the pattern removal operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with ‘@’ or ‘*’, the pattern removal operation is applied to each member of the array in turn, and the expansion is the resultant list.
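
As a quick illustration of the quoted passage, the sketch below contrasts the shortest (%) and longest (%%) suffix removal, and adds the leading-side counterpart # to isolate the number from the question's file names. The values are made up for the example.

```shell
#!/usr/bin/env bash
file='archive.tar.gz'          # hypothetical name with two dots
short=${file%.*}               # shortest trailing match of ".*" removed
long=${file%%.*}               # longest trailing match of ".*" removed
echo "$short"                  # prints "archive.tar"
echo "$long"                   # prints "archive"

name='foo-15.csv'
stem=${name%.csv}              # strip the suffix: foo-15
number=${stem#foo-}            # strip the prefix: 15
echo "$number"                 # prints "15"

# With "@", the removal is applied to each positional parameter in turn.
set -- a.csv b.csv
echo "${@%.csv}"               # prints "a b"
```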

@Sorin explained in the comments why to use shopt -s nullglob. For further explanation, refer to the docs.
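
To see what nullglob changes, here is a minimal sketch run against an empty temporary directory (an assumption made so that nothing matches). With bash's default settings an unmatched pattern is left in place as a literal word; with nullglob it expands to nothing, so a loop over it runs zero times.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)            # empty, so *.csv matches nothing

shopt -u nullglob
default=("$tmpdir"/*.csv)      # the pattern itself becomes the only element
echo "without nullglob: ${#default[@]} element(s): ${default[0]##*/}"

shopt -s nullglob
empty=("$tmpdir"/*.csv)        # expands to an empty list
echo "with nullglob: ${#empty[@]} element(s)"

rmdir "$tmpdir"
```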

tripleee
Mihir Luthra
  • Note: if there is no *.csv file in the directory, you will end up listing all the files in that directory. Look into the nullglob shell option – Sorin Sep 26 '19 at 10:07
0

I believe what you are after is something which resembles BASH_REMATCH.

An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)). The return value is 0 if the string matches the pattern, and 1 otherwise. If the regular expression is syntactically incorrect, the conditional expression's return value is 2. If the shell option nocasematch is enabled, the match is performed without regard to the case of alphabetic characters. Any part of the pattern may be quoted to force it to be matched as a string. Substrings matched by parenthesized subexpressions within the regular expression are saved in the array variable BASH_REMATCH. The element of BASH_REMATCH with index 0 is the portion of the string matching the entire regular expression. The element of BASH_REMATCH with index n is the portion of the string matching the nth parenthesized subexpression.
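
A minimal sketch of the indexing described in that passage, using a fixed string rather than a real file (the name and the csv/tsv alternation are made up for the example):

```shell
#!/usr/bin/env bash
name='foo-15.csv'              # hypothetical input string
if [[ $name =~ ^foo-([0-9]+)[.](csv|tsv)$ ]]; then
    echo "whole match: ${BASH_REMATCH[0]}"   # prints "whole match: foo-15.csv"
    echo "group 1:     ${BASH_REMATCH[1]}"   # prints "group 1:     15"
    echo "group 2:     ${BASH_REMATCH[2]}"   # prints "group 2:     csv"
else
    echo "no match"
fi
```

Checking the exit status of the test before reading BASH_REMATCH matters: after a failed match the array does not hold anything useful for the current string.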

Unfortunately, glob patterns and regular expressions are not the same (see this question and this Linux Journal article). Nonetheless, simple globs can be translated piece by piece:

|-------+-------+---------------------------------------|
| glob  | regex | remark                                |
|-------+-------+---------------------------------------|
| *     | [^/]* | filenames don't have a /              |
| **/   | .*/   | ** represents full paths (end with /) |
| ?     | [^/]  | filenames don't have a /              |
| [...] | [...] | the character groups are the same     |
|-------+-------+---------------------------------------|

You have to be careful though, as some characters have a special meaning in regular expressions but not in glob patterns (e.g. . and +), so they must be escaped or bracketed when translating.
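
The translation table and the caveat about special characters can be combined into a small helper. The function below, glob_to_regex, is a hypothetical name and only a sketch under stated assumptions: it handles *, ? and simple bracket expressions, wraps each of them in a capture group, and backslash-escapes regex metacharacters. It does not handle **, extglob patterns, backslash escapes in the glob, or a ] inside brackets.

```shell
#!/usr/bin/env bash

# glob_to_regex (hypothetical helper): print an anchored ERE for a simple glob,
# with one capture group per *, ? and [...] in the pattern.
glob_to_regex() {
    local glob=$1 regex='^' specials='.+()[]{}|^$\' ch rest expr i
    for ((i = 0; i < ${#glob}; i++)); do
        ch=${glob:i:1}
        rest=${glob:i}
        if [[ $ch == '*' ]]; then
            regex+='([^/]*)'
        elif [[ $ch == '?' ]]; then
            regex+='([^/])'
        elif [[ $ch == '[' && $rest == *']'* ]]; then
            expr=${rest%%\]*}]                  # bracket expression up to the first ]
            (( i += ${#expr} - 1 ))             # skip what was just copied
            [[ ${expr:1:1} == '!' ]] && expr="[^${expr:2}"   # glob [!x] is regex [^x]
            regex+="($expr)"
        elif [[ $specials == *"$ch"* ]]; then
            regex+="\\$ch"                      # escape regex metacharacters
        else
            regex+=$ch
        fi
    done
    printf '%s\n' "${regex}\$"
}

# Usage with the example from the comments: foobar.5 against f*b?r.[0-9]
re=$(glob_to_regex 'f*b?r.[0-9]')
if [[ foobar.5 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[@]:1}"        # prints "oo", "a" and "5" on separate lines
fi
```

The regex is kept in a variable and used unquoted on the right-hand side of =~, since quoting it would force a literal string comparison.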

So in the example of the OP, you can do this:

for file in foo-*.csv; do
    # Re-match the name with an anchored regex; group 1 is what * matched.
    [[ "${file}" =~ ^foo-([^/]*)[.]csv$ ]] || continue
    echo "${BASH_REMATCH[1]}"
done

Or a more complicated example:

for file in *-substring-[0-3a-9]-foo?.file; do
    # Groups 1 to 3 hold what *, [0-3a-9] and ? matched, in that order.
    [[ "${file}" =~ ^([^/]*)-substring-([0-3a-9])-foo([^/])[.]file$ ]] || continue
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]} ${BASH_REMATCH[3]}"
done
kvantour
  • 1
    This sort of reverse engineers what the glob did, but if this is what the OP is trying to ask, I guess there is no more straightforward way (and the answer to "is there an internal data structure" is, emphatically, no). – tripleee Sep 26 '19 at 13:45
  • @tripleee I agree. There is probably a Perl way to do it cleanly. – kvantour Sep 26 '19 at 14:21
-2

Similar to @Mihir's answer, but if you want to access the entire list of files on each iteration:

files=(*.csv)
for file in "${files[@]}"; do
    # Prints the whole list with .csv removed; quoted to avoid word splitting.
    echo "${files[@]//.csv/}"
done
tomgalpin
  • There is no need to do an `ls`. Let bash take care of the globbing, it will not screw it up. – kvantour Sep 26 '19 at 10:04
  • @kvantour - If you just do `files=*.csv`, it will not glob; it stores the literal `*.csv` as the value of the variable. This is apparent when you do the string replacement. Obviously, if you just `echo $files`, it prints the files, because the glob is expanded at that point. – tomgalpin Sep 26 '19 at 10:08
  • @kvantour actually, in this case ls is beneficial if there is no *.csv in that directory. But as a general rule, ls should not be used for getting file names. – Sorin Sep 26 '19 at 10:09
  • 1
    It would be better to do `files=( *.csv )`, and then do `for file in "${files[@]}"` – kvantour Sep 26 '19 at 10:14
  • @kvantour - Answer updated following your suggestion, thank you – tomgalpin Sep 26 '19 at 10:18
  • As in the other answer: if no *.csv files are present, `files=( *.csv )` results in an array with the single element '*.csv', and the loop then fails. Also, `${files[@]//.csv/}` should be `${file//.csv/}`, since you want to access the loop variable, not the entire array. – Sorin Sep 26 '19 at 10:24
  • @sorin - Please read the explanation of my answer - From the OP's question it isn't clear if he wants the entire list of files per iteration, or just the filename of the current iteration; this is how my answer differs from the others given – tomgalpin Sep 26 '19 at 10:27
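
The difference discussed in these comments, expanding the loop variable versus expanding the whole array, can be seen side by side. This is only a sketch: the array is filled with made-up names standing in for the result of (*.csv).

```shell
#!/usr/bin/env bash
files=(foo-15.csv foo-32.csv)   # hypothetical names standing in for (*.csv)
for file in "${files[@]}"; do
    # ${file%.csv} strips the current element; ${files[*]%.csv} strips every element.
    echo "current: ${file%.csv} | all: ${files[*]%.csv}"
done
# prints:
#   current: foo-15 | all: foo-15 foo-32
#   current: foo-32 | all: foo-15 foo-32
```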