Iterate through several files in bash

Question

I have a folder with several files that are named like this:

file.001.txt.gz, file.002.txt.gz, ... , file.150.txt.gz

What I want to do is use a loop to run a program on each file. I was thinking of something like this (just a sketch):

for i in {1:150}
  gunzip file.$i.txt.gz
  ./my_program file.$i.txt output.$1.txt
  gzip file.$1.txt

First of all, I don't know whether something like this will work, and second, I can't figure out how to keep the three-digit numbering the files have ('001' instead of just '1').

Thanks a lot

You need to change 'output.$1.txt' to 'output.$i.txt' as a first step — Horia Coman, Oct 20 '16 at 11:21

score 1 · Accepted Answer · answered Oct 20 '16 at 11:26

The syntax for ranges in bash is

{1..150}

not {1:150}.

Moreover, if your bash is recent enough (zero-padded ranges need bash 4.0 or later), you can add the leading zeroes:

{001..150}

The correct syntax of the for loop needs do and done.

for i in {001..150} ; do
    # ...
done

It's unclear what $1 contains in your script.
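
Putting these pieces together, a minimal sketch of the whole loop might look like the following. It assumes bash 4.0 or later for the zero-padded range. The `process` function is a hypothetical stand-in for ./my_program, and the three sample files in a scratch directory exist only so the sketch runs on its own.

```bash
#!/usr/bin/env bash
# Scratch directory with three sample inputs (illustration only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for n in 001 002 003; do
    echo "sample $n" > "file.$n.txt"
    gzip "file.$n.txt"
done

# Hypothetical stand-in for ./my_program: copies input to output.
process() { cp "$1" "$2"; }

for i in {001..003}; do              # use {001..150} for the real data set
    gunzip "file.$i.txt.gz"          # leaves file.$i.txt
    process "file.$i.txt" "output.$i.txt"
    gzip "file.$i.txt"               # restores file.$i.txt.gz
done

ls
```

Note that the loop variable is $i throughout; $1 would be the first argument passed to the script.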

score 1 · Answer 2 · edited Oct 20 '16 at 12:03

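To iterate over the files, I believe a glob is simpler (assuming there are no files named 'file.*.txt' already in the directory and that your output file can have a different name). After gunzip the file no longer carries the .gz suffix, so strip it with ${i%.gz} before passing the name on:

for i in file.*.txt.gz; do
    gunzip "$i"
    txt=${i%.gz}                          # name of the decompressed file
    ./my_program "$txt" "$txt-output.txt"
    gzip "$txt"                           # recompress only this file
done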
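
One detail worth spelling out when looping over the .gz names: gunzip removes the suffix, so the decompressed name has to be derived from the loop variable. A small sketch using bash parameter expansion follows; the variable names gz, txt and out are illustrative, and the output.NNN.txt naming is the one from the question.

```bash
gz=file.001.txt.gz
txt=${gz%.gz}              # strip the shortest trailing match of ".gz"
out=output${txt#file}      # strip the leading "file", then prepend "output"
echo "$gz -> $txt -> $out"
# prints: file.001.txt.gz -> file.001.txt -> output.001.txt
```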

edited Oct 20 '16 at 12:03 · chepner
answered Oct 20 '16 at 11:28 · Ariel Nigri

Ruslan Osmanov · Answer 3 · 2016-10-20T12:05:27.070

Using the find command (the -regextype option requires GNU find):

# Path to the source directory
dir="./"

while IFS= read -r file   # IFS= and -r keep whitespace and backslashes intact
do
  output="$(basename "$file")"
  output="$(dirname "$file")/${output/#file/output}"
  echo "$file ==> $output"
done < <(find "$dir" \
  -regextype 'posix-egrep' \
  -regex '.*file\.[0-9]{3}\.txt\.gz$')

The same via pipe:

find "$dir" \
  -regextype 'posix-egrep' \
  -regex '.*file\.[0-9]{3}\.txt\.gz$' | \
  while IFS= read -r file   # IFS= and -r keep whitespace and backslashes intact
  do
    output="$(basename "$file")"
    output="$(dirname "$file")/${output/#file/output}"
    echo "$file ==> $output"
  done

Sample output

/home/ruslan/tmp/file.001.txt.gz ==> /home/ruslan/tmp/output.001.txt.gz
/home/ruslan/tmp/file.002.txt.gz ==> /home/ruslan/tmp/output.002.txt.gz

(for $dir=/home/ruslan/tmp/).

Description

The scripts iterate over the files in the $dir directory. On each pass, the $file variable is set to the next line read from the output of the find command, which prints the paths matching the regular expression '.*file\.[0-9]{3}\.txt\.gz$'.

The $output variable is built from two parts: the basename (the file name without its directory) and the dirname (the path to the file's directory).

The ${output/#file/output} expansion replaces file with output at the start of the value of $output (see Manipulating Strings).
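
The two forms above differ in one practical respect: in the pipe version the loop runs in a subshell, so variables set inside it are gone once the loop ends, whereas the process-substitution version keeps them. The sketch below illustrates this with an array, and also switches to null-delimited paths so that unusual file names survive. It assumes bash, GNU find and a sort that supports -z; the scratch directory and the results array are illustrative only.

```bash
#!/usr/bin/env bash
# Scratch directory with sample files (illustration only).
dir=$(mktemp -d)
touch "$dir/file.001.txt.gz" "$dir/file.002.txt.gz" "$dir/notes.txt"

results=()
while IFS= read -r -d '' file; do          # read up to each NUL byte
    name=$(basename "$file")
    output="$(dirname "$file")/${name/#file/output}"
    echo "$file ==> $output"
    results+=("$output")                   # still set after the loop
done < <(find "$dir" -regextype posix-egrep \
              -regex '.*file\.[0-9]{3}\.txt\.gz$' -print0 | sort -z)

echo "collected ${#results[@]} output names"
```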

score 0 · Answer 4 · answered Oct 20 '16 at 11:50

Try:

for i in $(seq -w 1 150)     # -w pads with leading zeroes to equal width
do
  gunzip file."$i".txt.gz
  ./my_program file."$i".txt output."$i".txt
  gzip file."$i".txt
done
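
As a hedged alternative for systems without seq, or when a fixed width is wanted regardless of the upper bound, the padding can be done with bash's own printf. This sketch only builds the file names; the names array is illustrative and is not part of the original answer.

```bash
#!/usr/bin/env bash
names=()
for ((n = 1; n <= 150; n++)); do
    printf -v i '%03d' "$n"          # 1 -> 001, 42 -> 042, 150 -> 150
    names+=("file.$i.txt.gz")
done
echo "${names[0]} ... ${names[149]}"
# prints: file.001.txt.gz ... file.150.txt.gz
```

Inside the loop, "$i" can be used exactly as in the seq version.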

answered Oct 20 '16 at 11:50 · Chem-man17

Although this code may help to solve the problem, it doesn't explain _why_ and/or _how_ it answers the question. Providing this additional context would significantly improve its long-term educational value. Please [edit] your answer to add explanation, including what limitations and assumptions apply. – Toby Speight Oct 20 '16 at 16:27

score 0 · Answer 5 · edited May 23 '17 at 12:33

The syntax for ranges is as choroba said, but when iterating over files you usually want to use a glob. If you know all the files have three digits in their names you can match on digits:

shopt -s nullglob   # patterns that match nothing expand to nothing
for gz in file.0[0-9][0-9].txt.gz file.1[0-4][0-9].txt.gz file.150.txt.gz; do
  txt=${gz%.gz}                             # file.NNN.txt
  gunzip "$gz"
  ./my_program "$txt" "output${txt#file}"   # output.NNN.txt
  gzip "$txt"
done

This will only iterate through files that exist. If you use the range expression, you have to take extra care not to try to operate on files that don't exist.

for i in file.{000..150}.txt.gz; do
    [[ -e "$i" ]] || continue
    ...otherstuff
done
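
To see what nullglob changes, the following sketch compares an unmatched pattern with and without the option. It runs in a scratch directory with two made-up files; the array names are illustrative.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
cd "$dir" || exit 1
touch file.007.txt.gz file.120.txt.gz

shopt -u nullglob
without=(file.2[0-9][0-9].txt.gz)     # no match: the pattern stays literal
shopt -s nullglob
with=(file.2[0-9][0-9].txt.gz)        # no match: expands to nothing
matched=(file.[0-9][0-9][0-9].txt.gz) # only names that actually exist

echo "without nullglob: ${#without[@]} word(s): ${without[*]}"
echo "with nullglob:    ${#with[@]} word(s)"
echo "existing files:   ${matched[*]}"
```

Without nullglob, a loop over the unmatched pattern would run once with the literal pattern as its "file name", which is why the option is set before the loop.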
