How to parse specific values from csv file to a for loop command?

Question

I am trying to write a for loop where I conditionally parse specific values from a csv file into the do command.

My situation is as follows: I have several directories containing genome sequences. The samples are numbered and the directories are named accordingly.

Dir 1 contains sample1_genome.fasta
Dir 2 contains sample2_genome.fasta
Dir 3 contains sample3_genome.fasta

The genome sequences have differing average read lengths, and it is important to account for this. Therefore, I created a csv file containing the sample number and the corresponding average read length of the genome sequence. Example csv file (first column = sample_no, second column = avg_read_length):

1,130
2,134
3,129

Now I want to loop through the directories, take the genome sequences as input, and pass the respective average read length to the process.

My code is as follows:

for f in *
do 
     shortbred_quantify.py --genome $f/sample${f%}.fasta --average_read_length *THE SAMPLE MATCHING VALUE FROM 2nd COLUMN* --results results/quantify_results_sample${f%}
done

Can you help me out with this?

Your example is not a csv file, and if it doesn't have headers then don't include it. So... are you passing the csv file as input to a script (i.e. what is *?). — Allan Wind, Dec 07 '21 at 14:14
I edited the table to csv format. I run the loop directly in the terminal. The asterisk stands for directories containing genome sequences of samples. The directories are named according to the samples, e. g. 1, 2, 3 — plicht, Dec 07 '21 at 14:18

Allan Wind · Answer 1 · 2021-12-07T21:39:45.317

I would structure it along these lines:

while IFS=, read -r sample read_length
do
    shortbred_quantify.py --genome "$sample/genome_sample.fasta" --avgreadBP "$read_length" --results "results/quantify_results_sample$sample"
done < ./input
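
To check how the loop pairs each sample number with its read length before running the real tool, here is a minimal dry-run sketch. It assumes the csv rows from the question and the `genome_sample.fasta` naming used in this answer. The `build_commands` function and the temporary directory are hypothetical helpers for illustration, and `echo` stands in for the actual `shortbred_quantify.py` call.

```shell
#!/usr/bin/env bash
# Dry run: print the command for each csv row instead of executing it.
# The sample data mirrors the csv shown in the question.
workdir=$(mktemp -d)
printf '1,130\n2,134\n3,129\n' > "$workdir/input"

build_commands() {
    # $1: path to the csv file (sample_no,avg_read_length)
    while IFS=, read -r sample read_length
    do
        # skip blank lines
        [ -n "$sample" ] || continue
        echo "shortbred_quantify.py --genome $sample/genome_sample.fasta --avgreadBP $read_length --results results/quantify_results_sample$sample"
    done < "$1"
}

commands=$(build_commands "$workdir/input")
printf '%s\n' "$commands"
rm -rf "$workdir"
```

Once the printed commands look right, the `echo` line can be swapped for the real call with quoted arguments, as in the loop above.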

edited Dec 07 '21 at 21:39

answered Dec 07 '21 at 14:19

Allan Wind

Thanks a lot for your help! How can I then loop through the directories to take the particular sample files as input? I need to bring the read_length together with the respective sample file – plicht Dec 07 '21 at 14:38
You can replace your.csv with a glob (*/*.csv) or whatever. Your question is not really clear, so I suggest you update it to be more precise. Like do you need to select by read_length then use sample to identify the file? – Allan Wind Dec 07 '21 at 14:42
I updated my initial post – plicht Dec 07 '21 at 14:58
@plicht good job on refining the question. Did you have a chance to give my updated answer a whirl? – Allan Wind Dec 07 '21 at 21:40

score 0 · Answer 2 · answered Dec 07 '21 at 14:47

Use awk. $1 is the first field and $2 is the second, e.g.:

$ cat input
1,130
2,134
3,129
$ awk '$2 == avgReadBP{ print $1 }' FS=, avgReadBP=134 input
2

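That example selects the sample by its read length. For your loop the lookup runs the other way (match the sample number in $1, print the read length in $2), and awk has to read the csv file (called input here), not the fasta file. So your command ends up looking like:

genome="$f"/genome_sample.fasta
# look up the read length for sample "$f" in the csv file
shortbred_quantify.py --genome "$genome" \
    --avgreadBP "$(awk '$1 == s { print $2 }' FS=, s="$f" input)" \
    --results results/quantify_results_sample"${f}"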

Don't forget to quote the filename.
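
To tie the awk lookup to a loop over the sample directories, a self-contained sketch might look like the following. It assumes the directories are named after the sample numbers and that the csv has no header row, as described in the question. The function name `lookup_read_length`, the `planned` variable, and the temporary directory layout are hypothetical and only serve the illustration. The real `shortbred_quantify.py` call is left out so the lookup logic can be tried on its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the read length (column 2) for a sample number (column 1).
lookup_read_length() {
    # $1: sample number, $2: csv file
    awk -F, -v s="$1" '$1 == s { print $2; exit }' "$2"
}

# Throwaway layout mirroring the question: directories 1, 2, 3 plus the csv file.
workdir=$(mktemp -d)
printf '1,130\n2,134\n3,129\n' > "$workdir/input"
mkdir "$workdir/1" "$workdir/2" "$workdir/3"

planned=""
for dir in "$workdir"/*/
do
    f=$(basename "$dir")    # directory name = sample number
    len=$(lookup_read_length "$f" "$workdir/input")
    # no csv row for this directory: report it and move on
    [ -n "$len" ] || { echo "no read length for sample $f" >&2; continue; }
    planned="$planned$f:$len "
done
echo "$planned"
rm -rf "$workdir"
```

The trailing slash in the glob restricts the loop to directories, so the csv file itself is never treated as a sample. The empty-result check avoids passing a blank read length to the tool when a directory has no matching row.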
