49

I need to process a large number of files in a directory. The files can be partitioned into several groups based on their names. That is to say, each file name can be pattern matched to determine which 'group' it belongs to. For instance, the names are like this:

  • YYYYMMDD_*_bulk_import.csv
  • YYYYMMDD_*_genstats_import.csv
  • YYYYMMDD_*allstats.csv

etc ...

Each 'group' has a different processing methodology (i.e. a different command is called for processing).

I want to write a bash script to:

  1. Iterate through all CSV files in the directory
  2. Determine which 'group' a file belongs to by pattern matching its name to known patterns (like the examples I gave above)
  3. Call a command based on the determined grouping.

I am running on Ubuntu 10.0.4. I am new to bash, and would appreciate a skeleton code snippet to help me get started writing this script.

Homunculus Reticulli

2 Answers

87

The easiest way is probably just to iterate each group separately. This side-steps the parsing issue entirely.

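DIRECTORY=.

for i in "$DIRECTORY"/YYYYMMDD_*_bulk_import.csv; do
    # Process "$i" (replace echo with the real command)
    echo "bulk: $i"
done

for i in "$DIRECTORY"/YYYYMMDD_*_genstats_import.csv; do
    # Process "$i" (replace echo with the real command)
    echo "genstats: $i"
done

for i in "$DIRECTORY"/YYYYMMDD_*allstats.csv; do
    # Process "$i" (replace echo with the real command)
    echo "allstats: $i"
done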

Set DIRECTORY to whatever directory you want to search. The default . will search the current working directory.
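
One thing the per-group loops do not cover is a group with no matching files: by default bash leaves an unmatched glob as a literal string, so the loop body would run once on a name that does not exist. Below is a minimal sketch of one way to guard against that. The function name process_groups, the echo placeholders, and the use of [0-9]* for the date prefix are assumptions for illustration, not part of the answer.

```shell
#!/bin/bash
# Sketch only: process_groups is a hypothetical name, and the echo lines
# stand in for whatever command each group really needs.
process_groups() {
    local dir=${1:-.} f

    for f in "$dir"/[0-9]*_bulk_import.csv; do
        # An unmatched glob stays literal, so skip paths that do not exist.
        [ -e "$f" ] || continue
        echo "bulk: $f"
    done

    for f in "$dir"/[0-9]*_genstats_import.csv; do
        [ -e "$f" ] || continue
        echo "genstats: $f"
    done

    for f in "$dir"/[0-9]*allstats.csv; do
        [ -e "$f" ] || continue
        echo "allstats: $f"
    done
}
```

Calling `process_groups /path/to/csvs` would then print one line per matching file and nothing for empty groups. Setting `shopt -s nullglob` before the loops is an alternative that makes unmatched patterns expand to nothing.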

cdhowie
  • Would there be a simple way to get the "value" of `*` in the loop? – luckydonald Mar 07 '18 at 12:08
  • I'm new to Bash, but I think you can get the "value" using `tmp=${i#*_}; value=${tmp%_bulk_import.csv}`, and similarly for the other groups. I am basing this on https://stackoverflow.com/a/428580/6394617, and I've tried it on my files. – Joe Apr 10 '21 at 14:45
  • What happens when $DIRECTORY expands to something with spaces in it? – phreed Oct 01 '21 at 16:27
  • @phreed Are you asking because you don't know, or for some other reason? – cdhowie Oct 03 '21 at 07:15
  • @cdhowie I am asking because I think it will cause a problem if $DIRECTORY contains spaces. I think it should be quoted, like "$DIRECTORY"/YYYYMMDD... I asked because I am not sure. – phreed Oct 07 '21 at 22:24
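
The two points raised in the comments can be sketched together: quoting the directory so that spaces survive, and using parameter expansion to recover the part of the name matched by `*`. This is only an illustration; the helper names wildcard_part and print_bulk_values are hypothetical, and the sketch assumes the date prefix contains no underscore.

```shell
#!/bin/bash
# Sketch only: both function names are hypothetical.

# Print the part of a file name between the leading "DATE_" and a known suffix.
wildcard_part() {
    local path=$1 suffix=$2
    local name=${path##*/}     # strip the directory part
    name=${name#*_}            # drop everything up to the first underscore
    name=${name%"$suffix"}     # drop the known group suffix
    printf '%s\n' "$name"
}

# Loop over one group; "$dir" is quoted so spaces in the path are preserved.
print_bulk_values() {
    local dir=$1 f
    for f in "$dir"/[0-9]*_bulk_import.csv; do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
        wildcard_part "$f" "_bulk_import.csv"
    done
}
```

For example, `wildcard_part "/my dir/20240101_north_bulk_import.csv" "_bulk_import.csv"` prints `north`.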
11

Here is a basic iteration over the files, with a case block to determine each file's type.

#!/bin/bash
# Classify each CSV file in the current directory by its name.
for f in *.csv; do
    case "$f" in
        [0-9]*_bulk_import.csv)
            echo "$f case 1"
            ;;
        [0-9]*_genstats_import.csv)
            echo "$f case 2"
            ;;
        [0-9]*allstats.csv)
            echo "$f case 3"
            ;;
    esac
done
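
To go from printing a case number to actually calling a different command per group, one option is to put each group's processing in its own function and dispatch from the case block. The sketch below assumes hypothetical handler names (handle_bulk, handle_genstats, handle_allstats) whose bodies are placeholders for the real commands.

```shell
#!/bin/bash
# Sketch only: the handlers are hypothetical placeholders for real commands.
handle_bulk()     { echo "bulk import: $1"; }
handle_genstats() { echo "genstats import: $1"; }
handle_allstats() { echo "allstats: $1"; }

# Match on the file name only, so a directory prefix does not affect the pattern.
dispatch_file() {
    local f=$1
    case ${f##*/} in
        [0-9]*_bulk_import.csv)     handle_bulk "$f" ;;
        [0-9]*_genstats_import.csv) handle_genstats "$f" ;;
        [0-9]*allstats.csv)         handle_allstats "$f" ;;
        *) echo "skipping unrecognised file: $f" >&2 ;;
    esac
}
```

A driver loop such as `for f in *.csv; do dispatch_file "$f"; done` would then process every CSV file in the current directory. The final `*)` branch reports files that match none of the known patterns instead of silently ignoring them.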
jazgot