
I have a directory that contains more than 330,000 files, and unfortunately I cannot use ls on it. To list them I used find and saved the output to a file (a list of files).

These files are named sequentially, so the list goes Blast0_1.txt.gz, Blast0_2.txt.gz, Blast0_3.txt.gz, ...

and these numbers go up to 587, hence the total number of files should be 588x588=345744 (because the numbering both before and after the underscore starts at 0).

Some combinations are missing: the total should be 345744, but it is 331357. Is there an easy way to find the missing combinations in bash? I saw that some solutions are available online, but they do not work for me and I cannot figure out how to adapt any of them to my dataset.

Any help is greatly appreciated.

Panos

1 Answer


You could iterate through all possible filenames and check whether the file exists. On my laptop, this took around 8 seconds for 588x588 combinations.

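# Indices run from 0 to 587 inclusive, i.e. 588 values each
for i in {0..587}; do
    for j in {0..587}; do
        file_name="Blast${i}_${j}.txt.gz"
        # Quote the variable so the test stays well-formed for any name
        [ ! -f "$file_name" ] && echo "$file_name"
    done
done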

This goes through all possible combinations, checks whether each file exists in the current directory and, if not, prints its name to the console.
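Since the question mentions that the find output is already saved in a file, the same check can also be run against that list instead of the filesystem. The following is only a sketch under a few assumptions: the function name `missing_from_list` is made up for illustration, the list is assumed to hold one path per line, and the names are assumed to follow `Blast<i>_<j>.txt.gz` without zero padding.

```bash
# Hypothetical helper: print every expected name that is absent from a listing.
# $1: listing file with one path per line (e.g. saved find output)
# $2: highest index on either side of the underscore (587 in the question)
missing_from_list() {
    awk -v max="$2" '
        # Strip any leading directory part such as "./" and remember the name
        { sub(/.*\//, ""); seen[$0] = 1 }
        END {
            # Walk all (max+1) x (max+1) combinations and report the unseen ones
            for (i = 0; i <= max; i++)
                for (j = 0; j <= max; j++) {
                    name = "Blast" i "_" j ".txt.gz"
                    if (!(name in seen)) print name
                }
        }' "$1"
}
```

With the listing saved as, say, `file_list.txt` (a placeholder name), `missing_from_list file_list.txt 587 > missing.txt` would write the missing names to `missing.txt`. This avoids one filesystem lookup per combination, at the cost of holding all listed names in memory.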

Depending on your naming scheme, you might have to zero-pad the numbers.
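If padding turns out to be needed, `printf` can build the name. This is a minimal sketch, assuming a width of three digits (enough for 587); the helper name `padded_name` is hypothetical, and the question's own examples show unpadded names.

```bash
# Hypothetical helper: build a file name with both indices zero-padded.
# $1, $2: the two indices; the width of 3 is an assumption, adjust as needed.
padded_name() {
    printf 'Blast%03d_%03d.txt.gz\n' "$1" "$2"
}

padded_name 7 42   # prints Blast007_042.txt.gz
```

Inside the loop, `file_name=$(padded_name "$i" "$j")` would replace the plain string assignment.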

ptts