Bash script to iterate files in directory and pattern match filenames

Question

I need to process a large number of files in a directory. The files can be partitioned into several groups based on their names. That is to say, the file names can be pattern matched to determine which 'group' they belong to. For instance, the names look like this:

YYYYMMDD_*_bulk_import.csv
YYYYMMDD_*_genstats_import.csv
YYYYMMDD_*allstats.csv

etc ...

Each 'group' has a different processing methodology (i.e. a different command is called for processing).

I want to write a bash script to:

Iterate through all CSV files in the directory
Determine which 'group' a file belongs to by pattern matching its name to known patterns (like the examples I gave above)
Call a command based on the determined grouping.

I am running on Ubuntu 10.0.4. I am new to bash, and would appreciate a skeleton code snippet to help me get started in writing this script.

score 87 · Accepted Answer · answered Jun 25 '12 at 08:29

The easiest way is probably just to iterate over each group separately. This sidesteps the parsing issue entirely.

DIRECTORY=.

# "$DIRECTORY" is quoted so that paths containing spaces are not split.
for i in "$DIRECTORY"/YYYYMMDD_*_bulk_import.csv; do
    # Process "$i"
    :
done

for i in "$DIRECTORY"/YYYYMMDD_*_genstats_import.csv; do
    # Process "$i"
    :
done

for i in "$DIRECTORY"/YYYYMMDD_*allstats.csv; do
    # Process "$i"
    :
done

Set DIRECTORY to whatever directory you want to search. The default . will search the current working directory.
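As a rough sketch of how the per-group loops might look in practice, the example below handles two edge cases: a directory name containing spaces, and a group with no matching files (in which case the unexpanded pattern itself is passed to the loop unless it is skipped). It also pulls out the part of the name matched by `*` using parameter expansion. The temporary directory, the sample file names, and the variables `bulk_labels` and `genstats_count` are made up for illustration, and `[0-9]*` stands in for the YYYYMMDD date prefix.

```shell
#!/bin/bash
# Hypothetical demo directory (with a space in its name) and sample files.
DIRECTORY=$(mktemp -d "${TMPDIR:-/tmp}/demo dir.XXXXXX")
touch "$DIRECTORY/20120625_north_bulk_import.csv" \
      "$DIRECTORY/20120625_south_bulk_import.csv"

bulk_labels=""
for i in "$DIRECTORY"/[0-9]*_bulk_import.csv; do
    [ -e "$i" ] || continue        # no match: skip the unexpanded pattern
    name=${i##*/}                  # strip the directory part
    tmp=${name#*_}                 # drop the leading date and underscore
    label=${tmp%_bulk_import.csv}  # drop the group suffix
    bulk_labels="$bulk_labels $label"
    echo "bulk import: $name (label: $label)"
done

# This group has no files in the demo directory, so the body never runs.
genstats_count=0
for i in "$DIRECTORY"/[0-9]*_genstats_import.csv; do
    [ -e "$i" ] || continue
    genstats_count=$((genstats_count + 1))
done
echo "genstats files: $genstats_count"

rm -rf "$DIRECTORY"
```

In bash specifically, `shopt -s nullglob` is an alternative to the `[ -e "$i" ] || continue` guard: with it set, a pattern that matches nothing expands to an empty list.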

cdhowie

Would there be a simple way to get the "value" of `*` in the loop? – luckydonald Mar 07 '18 at 12:08
I'm new to BASH, but I think you can get the "value" using `tmp=${i#*_}; value=${tmp%_bulk_import.csv}`, and similarly for the other groups. I am basing this on https://stackoverflow.com/a/428580/6394617, and I've tried it on my files. – Joe Apr 10 '21 at 14:45
What happens when $DIRECTORY expands to something with spaces in it? – phreed Oct 01 '21 at 16:27
@phreed Are you asking because you don't know, or for some other reason? – cdhowie Oct 03 '21 at 07:15
@cdhowie I am asking because I think it will cause a problem if $DIRECTORY contains spaces. I think it should be quoted, like "$DIRECTORY"/YYYYMMDD... I asked because I am not sure. – phreed Oct 07 '21 at 22:24

score 11 · Answer 2 · answered Jun 25 '12 at 08:46

Here is a basic iteration over files, with a case block to determine the file type.

#!/bin/bash
for f in *; do
        case $f in
                [0-9]*_bulk_import.csv)
                        echo "$f case 1"
                        ;;
                [0-9]*_genstats_import.csv)
                        echo "$f case 2"
                        ;;
                [0-9]*allstats.csv)
                        echo "$f case 3"
                        ;;
        esac
done
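To go from echoing a case number to actually calling a different command per group, one option is to have the case block return a group name and dispatch on it. The following is only a sketch: `classify` and the `process_*` functions are hypothetical names, and their bodies are placeholders for the real commands. The directory is taken from the first argument and defaults to the current directory.

```shell
#!/bin/bash
# Hypothetical handlers; replace the bodies with the real per-group commands.
process_bulk()     { echo "bulk: $1"; }
process_genstats() { echo "genstats: $1"; }
process_allstats() { echo "allstats: $1"; }

# classify prints the group name for a bare file name, or "unknown".
classify() {
    case $1 in
        [0-9]*_bulk_import.csv)     echo bulk ;;
        [0-9]*_genstats_import.csv) echo genstats ;;
        [0-9]*allstats.csv)         echo allstats ;;
        *)                          echo unknown ;;
    esac
}

dir=${1:-.}
for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue   # no CSV files: skip the unexpanded pattern
    name=${f##*/}             # match against the name without the directory
    group=$(classify "$name")
    case $group in
        unknown) echo "skipping unrecognised file: $name" >&2 ;;
        *)       "process_$group" "$f" ;;
    esac
done
```

Keeping the patterns in one function means a new group only needs one extra pattern line and one extra handler.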
