TL;DR: How do I filter ls/find output using grep with an array as the pattern?
Background story: I have a pipeline that I have to rerun for the datasets that ran into an error. Which datasets ran into an error is recorded in a tab-separated file. I want to delete the files of the datasets where the pipeline failed.
To do so, I extracted the names of the finished datasets from another file and saved them in a bash array (ds1 ds2 ...), but now I am stuck because I cannot figure out how to exclude the datasets in the array from my deletion step.
This is the folder structure (X=1-30): datasets/dsX/results/dsX.tsv
Without excluding the finished datasets, i.e. deleting the folders of both the failed and the finished datasets, this works like a charm:
# 1. move content to a trash folder
ls /datasets/*/results/* | xargs -I '{}' mv '{}' ./trash/
# 2. delete the empty folders
find /datasets/*/. -type d -empty -delete
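As an aside, I read that parsing ls output can break on unusual filenames, so I also sketched a find-based variant of the two steps; it assumes GNU coreutils (mv -t) and is untested on my data:

# 1. move everything directly inside the results folders to the trash folder
#    (-mindepth/-maxdepth 1 leaves the results folders themselves in place)
find /datasets/*/results/ -mindepth 1 -maxdepth 1 -exec mv -t ./trash/ {} +
# 2. delete the now-empty folders
find /datasets/*/ -type d -empty -delete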
But since I want to exclude the finished datasets, I thought it would be clever to save them in an array:
# find finished datasets by extracting the dataset names from a tab-separated
# log file (-s 1 skips the header line)
mapfile -t -s 1 finished < <(awk '{print $2}' "$path/$log_pf")
echo "${finished[@]}"
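For reference, this is roughly how I assume the log file is laid out; the first row is a header (hence -s 1) and column 2 holds the dataset name, though the real column order may differ:

# hypothetical log layout (tab-separated, first row is a header skipped by -s 1):
#   status  dataset  finished_at
#   done    ds2      12:01
#   done    ds5      12:07
# with that layout the array ends up as: finished=(ds2 ds5)

# sanity check: print one name per line
printf '%s\n' "${finished[@]}"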
That part works as expected, but now I am stuck on filtering the ls output using that array (pseudocode):
# trying to ignore the datasets in the array - not working
ls -I${finished[@]} -d /datasets/*/
# trying to invert-match (grep -v) the finished datasets - not working
ls /datasets/*/ | grep -v {finished}
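One direction I considered, but have not fully validated, is flattening the array into something grep understands. Both sketches assume the dataset names contain no regex metacharacters; wrapping the names in slashes should keep ds1 from also matching ds10:

# build one extended regex like /(ds1|ds2|ds5)/ out of the array
pattern=$(IFS='|'; printf '%s' "${finished[*]}")
ls -d /datasets/*/ | grep -v -E "/($pattern)/"

# alternative: fixed-string patterns, one per line, read via process substitution
ls -d /datasets/*/ | grep -v -F -f <(printf '/%s/\n' "${finished[@]}")

The surviving paths could then be piped into the same xargs/mv step as above.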
What do you think of my current ideas? Is this possible using bash only? I guess I could do it easily in Python, but for training purposes I want to do it in bash.