I have a folder with several files that are named like this:

file.001.txt.gz, file.002.txt.gz, ... , file.150.txt.gz

What I want to do is use a loop to run a program on each file. I was thinking of something like this (just a sketch):

for i in {1:150}
  gunzip file.$i.txt.gz
  ./my_program file.$i.txt output.$1.txt
  gzip file.$1.txt

First of all, I don't know if something like this will work, and second, I can't figure out how to keep the three-digit numbering the files have ('001' instead of just '1').

Thanks a lot

5 Answers

The syntax for ranges in bash is

{1..150}

not {1:150}.

Moreover, if your bash is recent enough (version 4.0 or later), brace expansion can add the leading zeroes:

{001..150}

The correct syntax of the for loop needs do and done.

for i in {001..150} ; do
    # ...
done

It's unclear what $1 contains in your script. It expands to the script's first positional parameter, so you probably meant $i.
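
Putting these pieces together, here is a minimal sketch of the whole loop for a bash that may be too old for {001..150}. It assumes the file naming from the question; process_range is a hypothetical helper name, and the padding is done with printf instead of brace expansion.

```shell
#!/usr/bin/env bash
# Sketch only: process_range is a hypothetical helper, not from the question.
# Usage: process_range PROGRAM FIRST LAST
process_range() {
    local prog=$1 first=$2 last=$3 n i
    for ((n = first; n <= last; n++)); do
        printf -v i '%03d' "$n"               # 1 -> 001, 42 -> 042
        [[ -e file.$i.txt.gz ]] || continue   # skip numbers with no file
        gunzip "file.$i.txt.gz"
        "$prog" "file.$i.txt" "output.$i.txt"
        gzip "file.$i.txt"
    done
}

# Example call, with the program name taken from the question:
# process_range ./my_program 1 150
```

Passing the program as an argument is only there so the sketch can be tried with any command that takes an input and an output file.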

choroba

To iterate over the files, I believe the simpler way is to use a glob (assuming no uncompressed file.*.txt files already exist in the directory and that your output file can have a different name):

for i in file.*.txt.gz; do
    f=${i%.gz}                   # name of the uncompressed file
    gunzip "$i"
    ./my_program "$f" "$f-output.txt"
    gzip "$f"
done
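
A possible variant, sketched under the assumption that my_program only needs to read the uncompressed file: decompress to a temporary copy with gunzip -c and delete it afterwards, so the .gz file is never modified and nothing has to be recompressed. run_on_gz is a hypothetical helper name, not something from the question.

```shell
#!/usr/bin/env bash
# Sketch only: run_on_gz is a hypothetical helper.
# Usage: run_on_gz PROGRAM FILE.gz...
run_on_gz() {
    local prog=$1 gz plain
    shift
    for gz in "$@"; do
        plain=${gz%.gz}                          # file.001.txt.gz -> file.001.txt
        gunzip -c "$gz" > "$plain" || continue   # write a copy, keep the .gz
        "$prog" "$plain" "$plain-output.txt"
        rm -f -- "$plain"                        # remove the temporary copy
    done
}

# Example call, with the program name taken from the question:
# run_on_gz ./my_program file.*.txt.gz
```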
chepner

Using the find command:

# Path to the source directory
dir="./"

while IFS= read -r file
do
  output="$(basename "$file")"
  output="$(dirname "$file")/"${output/#file/output}
  echo "$file ==> $output"
done < <(find "$dir" \
  -regextype 'posix-egrep' \
  -regex '.*file\.[0-9]{3}\.txt\.gz$')

The same via pipe:

find "$dir" \
  -regextype 'posix-egrep' \
  -regex '.*file\.[0-9]{3}\.txt\.gz$' | \
  while IFS= read -r file
  do
    output="$(basename "$file")"
    output="$(dirname "$file")/"${output/#file/output}
    echo "$file ==> $output"
  done

Sample output

/home/ruslan/tmp/file.001.txt.gz ==> /home/ruslan/tmp/output.001.txt.gz
/home/ruslan/tmp/file.002.txt.gz ==> /home/ruslan/tmp/output.002.txt.gz

(for $dir=/home/ruslan/tmp/).

Description

The scripts iterate over the files in the $dir directory. On each iteration, the $file variable holds the next line printed by the find command, which lists the paths matching the regular expression '.*file\.[0-9]{3}\.txt\.gz$'. Note that -regextype is a GNU find extension.

The $output variable is built from two parts: the basename (the file name without its directory) and the dirname (the path to the file's directory).

The ${output/#file/output} expression replaces file with output at the beginning of the value of $output (see Manipulating Strings).
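
The name manipulation described above can be tried on its own. This is a small sketch; output_path is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: output_path is a hypothetical helper name.
output_path() {
    local file=$1 dir base
    dir=$(dirname "$file")       # e.g. /home/ruslan/tmp
    base=$(basename "$file")     # e.g. file.001.txt.gz
    # ${base/#file/output} replaces "file" only when it starts the value
    printf '%s/%s\n' "$dir" "${base/#file/output}"
}

output_path /home/ruslan/tmp/file.001.txt.gz
# prints /home/ruslan/tmp/output.001.txt.gz

output_path ./profile.001.txt.gz
# prints ./profile.001.txt.gz (no leading "file", so nothing is replaced)
```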

Ruslan Osmanov

Try:

for i in $(seq -w 1 150)     # -w adds the leading zeroes
do
  gunzip file."$i".txt.gz
  ./my_program file."$i".txt output."$i".txt
  gzip file."$i".txt
done
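
As a quick check of what the loop iterates over, this sketch prints a few values from seq -w. Note that seq is an external command rather than a bash feature, so it is assumed to be installed.

```shell
#!/usr/bin/env bash
# seq -w pads every number with leading zeroes to the width of the largest one.
seq -w 8 10                        # prints 08, 09 and 10, one per line

first=$(seq -w 1 150 | head -n 1)
last=$(seq -w 1 150 | tail -n 1)
echo "$first $last"                # prints: 001 150
```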
Chem-man17
    Although this code may help to solve the problem, it doesn't explain _why_ and/or _how_ it answers the question. Providing this additional context would significantly improve its long-term educational value. Please [edit] your answer to add explanation, including what limitations and assumptions apply. – Toby Speight Oct 20 '16 at 16:27

The syntax for ranges is as choroba said, but when iterating over files you usually want to use a glob. If you know all the files have three digits in their names you can match on digits:

shopt -s nullglob
for i in file.0[0-9][0-9].txt.gz file.1[0-4][0-9].txt.gz file.150.txt.gz; do
  f=${i%.gz}                               # $i is already the full file name
  gunzip "$i"
  ./my_program "$f" "output.${f#file.}"    # e.g. output.001.txt
  gzip "$f"
done

This will only iterate through files that exist. If you use the range expression, you have to take extra care not to try to operate on files that don't exist.

for i in file.{001..150}.txt.gz; do
    [[ -e "$i" ]] || continue
    # ...other stuff
done
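
To see why nullglob matters for the glob version, here is a small sketch that counts loop iterations in an empty temporary directory. count_iterations is a hypothetical helper name; the pattern is a simplified three-digit glob.

```shell
#!/usr/bin/env bash
# Sketch only: count_iterations is a hypothetical helper.
# It runs in a subshell, so the caller's working directory is unchanged.
count_iterations() (
    cd "$1" || exit 1
    n=0
    for f in file.[0-9][0-9][0-9].txt.gz; do
        n=$((n + 1))
    done
    echo "$n"
)

empty_dir=$(mktemp -d)

shopt -u nullglob
n_without=$(count_iterations "$empty_dir")   # 1: the unmatched pattern stays literal
shopt -s nullglob
n_with=$(count_iterations "$empty_dir")      # 0: the unmatched pattern expands to nothing

echo "without nullglob: $n_without, with nullglob: $n_with"
rmdir "$empty_dir"
```

Without nullglob, the loop body would run once with the literal pattern as the file name, which is why either nullglob or an explicit existence test is needed.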
kojiro