I have 2987 files that I need to process in batches of at most 1000 files (our SLURM scheduler does not accept more than that per job array). I currently have the following bash code:
# collecting all the dataset files into an array called FILES
FILES=(*.fast5) # glob directly; parsing ls output breaks on unusual filenames
echo ${#FILES[@]}
# select only the first 1000 items in the array
SUBSET=(${FILES[@]:0:1000}) #selecting elements 0 to 1000 --> 1000 elements
SUBSET=(${FILES[@]:1000:2000}) #selecting elements 1000 to 2000 --> 1987 elements
SUBSET=(${FILES[@]:2000:2987}) #selecting elements 2000 to 2987 --> 987 elements
#determine length of array Subset
echo ${#SUBSET[@]}
## determine which dataset to analyze
MYFILE=${SUBSET[$SLURM_ARRAY_TASK_ID]} ## identify which dataset is analyzed
## starting analysis
echo current dataset is: $MYFILE
My problem is that selecting elements 1000 to 2000 gives me an array of length 1987. I do not understand why this array is so much longer than 1000 elements, or what is wrong in my code.
Any suggestions or pointers are welcome.
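For reference, here is a minimal sketch of how bash array slicing behaves, which seems to account for the 1987. In `${array[@]:offset:length}` the second number is a count of elements, not an end index, so `:1000:2000` asks for up to 2000 elements starting at index 1000, and only 1987 remain. The file names and the `BATCH_SIZE` variable below are assumptions made up for illustration; the array is filled with dummy names so the snippet runs without any real files.

```shell
#!/usr/bin/env bash
# Stand-in for the real file list: 2987 hypothetical names, no files needed.
FILES=()
for ((i = 0; i < 2987; i++)); do
    FILES+=("read_${i}.fast5")
done

BATCH_SIZE=1000   # hypothetical name for the per-job limit

# Syntax is ${array[@]:offset:length}: the second number is a count.
WRONG=("${FILES[@]:1000:2000}")   # up to 2000 items from index 1000

# Same offsets as before, but with a fixed length of BATCH_SIZE.
FIRST=("${FILES[@]:0:$BATCH_SIZE}")
SECOND=("${FILES[@]:$BATCH_SIZE:$BATCH_SIZE}")
THIRD=("${FILES[@]:$((2 * BATCH_SIZE)):$BATCH_SIZE}")

echo "${#WRONG[@]} ${#FIRST[@]} ${#SECOND[@]} ${#THIRD[@]}"
# prints: 1987 1000 1000 987
```

The last batch only looks correct in the original code because fewer than 2987 elements remain after index 2000, so bash simply returns what is left.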