I have a list of files in a folder, and I want to extract the filenames that match the following pattern and insert them into an array.

The pattern is that the file name always begins with either "MCABC_" or "MCBBC_", followed by a date, and ends with ".csv".

Examples would be "MCABC_20110101.csv" and "MCBBC_20110304.csv".

Right now, I can only come up with the following solution, which works but is not ideal.

ls | grep -E "MCABC_[ A-Za-z0-9]*|MC221_[ A-Za-z0-9]*"

I read that it is bad to use ls and that I should use a glob instead.

I am completely new to bash scripting. How can I extract the filenames matching the pattern above and insert them into an array? Thanks.

Update: Thanks for the answers, I really appreciate them. I now have the following code:

#!/bin/bash
shopt -s nullglob
files=(MC[1-2]21_All_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv)
echo ${#files[*]}
echo ${files[0]}

And this is the result I got when I ran bash testing.sh:

: invalid shell option namesh: line 2: shopt: nullglob 1 (MC[1-2]21_All_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv)

However, if I just run files=(MC[1-2]21_All_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv) on the command line and then echo ${files[*]}, I get the expected output:

MC121_All_20180301.csv MC121_All_20180302.csv MC121_All_20180305.csv MC221_All_20180301.csv MC221_All_20180302.csv MC221_All_20180305.csv

I am very confused. Why is this happening? (Please note that I am running this on Ubuntu within Windows 10.)

mynameisJEFF
  • Consider, instead, using [find with regex](https://stackoverflow.com/questions/6844785/how-to-use-regex-with-find-command?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa) to do your file search. – JNevill Apr 11 '18 at 16:15
  • Yes. I did try playing with find i.e. `find . -regextype sed -regex 'MC121'` but it returns nothing. I am really new to bash and regex. I am not too sure what was the mistake there. – mynameisJEFF Apr 11 '18 at 16:21

2 Answers

2

I think you can just populate the array directly using a glob:

files=( MC[AB]BC_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv )

The "date" part can certainly be improved, since it matches completely invalid dates like 98765432, but maybe that's not a problem.
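As a rough illustration of the glob above, here is a minimal sketch that can be run anywhere. The scratch directory and the sample file names (including `notes.txt`) are assumptions made up for the demo, not files from the question.

```shell
#!/bin/bash
# Sketch: populate an array directly from a glob.
# demo_dir and the sample files are hypothetical, created only for this demo.
demo_dir=$(mktemp -d)
touch "$demo_dir/MCABC_20110101.csv" "$demo_dir/MCBBC_20110304.csv" "$demo_dir/notes.txt"
cd "$demo_dir" || exit 1

# Each file matching the pattern becomes one array element.
files=( MC[AB]BC_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv )

echo "${#files[@]}"          # number of matching files
printf '%s\n' "${files[@]}"  # one filename per line
```

Quoting `"${files[@]}"` keeps each filename as a single word, which matters if a name ever contains spaces.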

Tom Fenech
  • I think OP needs the real filenames, not make them up. – Matias Barrios Apr 11 '18 at 16:31
  • Sorry, I don't understand what you mean. If I have `MCABC_20110101.csv` and `MCBBC_20110304.csv` in my directory, this creates an array containing both of those filenames. – Tom Fenech Apr 11 '18 at 16:46
  • I just tested your line of code on the command line and it worked. But when I put the same line in a shell script, it failed: `echo ${#files[*]}` returns 1, while `echo ${files[0]}` returns `( MC[AB]BC_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv )`. What did I do wrong? – mynameisJEFF Apr 12 '18 at 15:23
  • That's because the glob failed to match any files, so you just end up with one element in the array: the glob itself. You can prevent this from happening using a shell option `shopt -s nullglob` (that would result in an empty array) or `shopt -s failglob` (that would result in an error message). – Tom Fenech Apr 12 '18 at 15:25
  • But I tested the same line of code on the command line and it managed to output the file names that I wanted. It only fails when I put it into the bash script. I also added the option you mentioned and it returned `: invalid shell option nameg: line 4: shopt: nullglob ` – mynameisJEFF Apr 12 '18 at 15:37
  • Sounds like you're running your script in a shell other than bash. Add `#!/bin/bash` to the top and/or run it with `bash script.sh` instead of `sh script.sh`. – Tom Fenech Apr 12 '18 at 15:41
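
The unmatched-glob behaviour described in the comments can be sketched as follows. The empty scratch directory and the shortened pattern `MC[AB]BC_*.csv` are assumptions for the demo; it only shows the default and `nullglob` cases and has to be run with bash.

```shell
#!/bin/bash
# Sketch: what an unmatched glob expands to, with and without nullglob.
# empty_dir is a hypothetical scratch directory containing no files.
empty_dir=$(mktemp -d)
cd "$empty_dir" || exit 1

# Default behaviour: the unmatched pattern is kept as a literal string.
shopt -u nullglob
default_files=( MC[AB]BC_*.csv )
echo "default:  ${#default_files[@]} element(s)"

# With nullglob: the unmatched pattern expands to nothing.
shopt -s nullglob
nullglob_files=( MC[AB]BC_*.csv )
echo "nullglob: ${#nullglob_files[@]} element(s)"
```

In the default case the single element is the pattern text itself, which is why a count of 1 can be misleading.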

This will work in BASH.

#!/bin/bash
for file_name in M*
do

    line="$line $( printf "${file_name%_*}")"
done
array=( $line )
echo "${array[2]}"

Another way:

#!/bin/bash

declare -a files_array
i=0
for file_name in M*
do
    files_array[$i]="$( printf "${file_name%_*}")"
    (( i++ ))
done

echo "${files_array[2]}"

Regards!

Matias Barrios
  • I don't understand why you're using `printf` at all, but you should always use a format specifier e.g. `%s` before your variables. I'm not sure why you're removing everything after the last `_` in the filenames, and I would recommend against practices that rely on word splitting like `array=( $line )`. Note that if you want to build an array in a loop, you can use `files_array+=( "$file_name" )`. – Tom Fenech Apr 11 '18 at 16:44
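
The loop-based approach suggested in the comment above might look something like this. It is only a sketch: the scratch directory and the sample file names (one deliberately containing a space) are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: build an array in a loop without relying on word splitting.
shopt -s nullglob   # skip the loop body entirely if nothing matches

# demo_dir and the sample files are hypothetical, created only for this demo.
demo_dir=$(mktemp -d)
touch "$demo_dir/MCABC_20110101.csv" "$demo_dir/MC report_20110102.csv"
cd "$demo_dir" || exit 1

files_array=()
for file_name in M*; do
    # Append the full filename as a single element, spaces included.
    files_array+=( "$file_name" )
done

# Print with an explicit format specifier, one filename per line.
printf '%s\n' "${files_array[@]}"
```

Because each name is appended quoted, the file with a space in its name stays one element rather than being split in two.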