Explanation wanted? Selecting 1000 elements from large array?

Question

I have 2987 files that I need to process in batches of 1000 files (our SLURM scheduler does not like more than that). Now I have the following bash code:

# collecting all the dataset files into an array called FILES
FILES=($(ls *.fast5))
echo ${#FILES[@]}

# select only the first 1000 items in the array
SUBSET=(${FILES[@]:0:1000})  #selecting elements 0 to 1000 --> 1000 elements
SUBSET=(${FILES[@]:1000:2000}) #selecting elements 1000 to 2000  --> 1987 elements
SUBSET=(${FILES[@]:2000:2987}) #selecting elements 2000 to 2987 --> 987 elements

#determine length of array Subset
echo ${#SUBSET[@]}

## determine which dataset to analyze
MYFILE=${SUBSET[$SLURM_ARRAY_TASK_ID]}  ## identify which dataset is analyzed

## starting analysis
echo current dataset is: $MYFILE

My problem is that selecting elements 1000 to 2000 gives me an array of length 1987. I have no idea why that is, or what in my code produces an array that is so much longer than 1000 elements.

Any suggestions, pointers, etc. are welcome.

`FILES=($(ls *.fast5))` Do __not__ parse `ls` output. Just `FILES=(*.fast5)`. Remember to quote variable expansions. Don't `SUBSET=(${FILES[@]:0:1000})` do `SUBSET=("${FILES[@]:0:1000}")`. — KamilCuk, Feb 26 '20 at 13:33
Okay thanks for the corrections of my code, and I did not understand the need for quoting. Thanks for pointing it out. This thread seems to be a good source for why you need to quote: https://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-shell-variable — Thomas Haverkamp, Feb 26 '20 at 13:53
You're overwriting `SUBSET` twice, it'll only contain what you assign in the third assignment, is that on purpose or just to show in the question? — Benjamin W., Feb 26 '20 at 14:25
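
The two points raised in the comments above (use a glob instead of parsing `ls`, and quote array expansions) can be illustrated with a small sketch. The temporary directory and the file names are invented for the demonstration; the only assumption is that a file name may contain a space.

```bash
#!/usr/bin/env bash
# Hypothetical demo data: the file names below are made up for illustration.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch "a b.fast5" "c.fast5"

# Unquoted command substitution is split on whitespace,
# so "a b.fast5" ends up as two separate array elements.
BAD=($(ls *.fast5))

# A glob assigned directly keeps every file name as exactly one element.
GOOD=(*.fast5)

# A quoted slice preserves the elements; an unquoted one would re-split them.
GOOD_SUBSET=("${GOOD[@]:0:1}")

echo "ls-based array:   ${#BAD[@]} elements"
echo "glob-based array: ${#GOOD[@]} elements"
echo "first element:    ${GOOD_SUBSET[0]}"

# Clean up the temporary directory.
cd / && rm -rf "$tmpdir"
```

With well-behaved file names both forms give the same result, which is why the problem often goes unnoticed until a name with whitespace or glob characters shows up.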

scragar · Accepted Answer · 2020-02-26T14:18:18.797

4

The last parameter of the slice isn't the index to stop at; it's the maximum number of elements to return. You're asking for 2,000 elements starting at index 1,000, not the elements between 1,000 and 2,000. Since only 1,987 elements remain from index 1,000 onward, that is how many you get.

SUBSET=("${FILES[@]:0:1000}")    # selecting elements 0 to 999
SUBSET=("${FILES[@]:1000:1000}") # selecting elements 1000 to 1999
SUBSET=("${FILES[@]:2000:1000}") # selecting elements 2000 to 2999 (here only up to 2986, since there are 2987 files)
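
A minimal sketch to check the `${array[@]:offset:length}` semantics described above. The array is filled with placeholder names (`file_0.fast5` and so on, purely illustrative) so that it has the same 2987 elements as in the question; `BATCH_SIZE` is a name introduced only for this example.

```bash
#!/usr/bin/env bash
# Build a stand-in array of 2987 hypothetical file names.
FILES=()
for ((i = 0; i < 2987; i++)); do
    FILES+=("file_$i.fast5")
done

BATCH_SIZE=1000

# ${array[@]:offset:length}: start at index 'offset', take at most 'length' elements.
FIRST=("${FILES[@]:0:$BATCH_SIZE}")
SECOND=("${FILES[@]:1000:$BATCH_SIZE}")
THIRD=("${FILES[@]:2000:$BATCH_SIZE}")   # fewer than 1000 elements remain here

# The form from the question: 2000 is read as a length, not as an end index.
WRONG=("${FILES[@]:1000:2000}")

echo "first:  ${#FIRST[@]}"
echo "second: ${#SECOND[@]}"
echo "third:  ${#THIRD[@]}"
echo "wrong:  ${#WRONG[@]}"
```

Keeping the three batches in separate variables, as here, also avoids overwriting `SUBSET` with each assignment.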

edited Feb 26 '20 at 14:18

answered Feb 26 '20 at 13:31


Brilliant, that explains it. Thanks @scragar – Thomas Haverkamp Feb 26 '20 at 13:39

I thought `${FILES[@]:0:1000}` would fetch elements from 0 to 999. – vdavid Feb 26 '20 at 14:09
@vdavid you're right, I copy/pasted the comments and updated them without thinking about that. I've updated the post. – scragar Feb 26 '20 at 14:18
