.mcool files contain matrices for multiple resolutions.
Running cooler ls on one .mcool file lists them:
cooler ls ./../input/A001C007.hg38.nodups.pairs.mcool
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/200
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/500
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/1000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/2000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/5000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/10000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/20000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/50000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/100000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/250000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/500000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/1000000
For every file in ./input/*.mcool, I want to run predictSV from EagleC, passing it the entries printed by cooler ls "$mcool_file" whose last path component (the part after the final /) is 5000, 10000, or 50000.
As shown in the EagleC repo, a single .mcool file is processed like this:
predictSV --hic-5k SKNAS-MboI-allReps-filtered.mcool::/resolutions/5000 \
--hic-10k SKNAS-MboI-allReps-filtered.mcool::/resolutions/10000 \
--hic-50k SKNAS-MboI-allReps-filtered.mcool::/resolutions/50000 \
-O SK-N-AS -g hg38 --balance-type CNV --output-format full \
--prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999
For a list of files, however, I need a for loop that runs predictSV on each one.
My attempt:
#!/usr/bin/env bash
for mcool_file in input/*.mcool; do
    hic5k_num=; hic10k_num=; hic50k_num=
    # iterate over ids emitted from cooler ls for this file
    while IFS= read -r id; do
        id_suffix=${id##*/}
        case $id_suffix in
            5000) hic5k_num=$id_suffix;;
            10000) hic10k_num=$id_suffix;;
            50000) hic50k_num=$id_suffix;;
        esac
    done < <(cooler ls "$mcool_file")
    echo predictSV \
        ${hic5k_num:+ --hic-5k "$hic5k_num"} \
        ${hic10k_num:+ --hic-10k "$hic10k_num"} \
        ${hic50k_num:+ --hic-50k "$hic50k_num"} \
        -g hg38 \
        -O "${mcool_file%%.*}" \
        --balance-type CNV \
        --output-format full \
        --prob-cutoff-5k 0.8 \
        --prob-cutoff-10k 0.8 \
        --prob-cutoff-50k 0.99999
done
However, my output contains only the resolution numbers instead of the full paths:
predictSV --hic-5k 5000 --hic-10k 10000 --hic-50k 50000 -g hg38 -O input/A001C007 --balance-type CNV --output-format full --prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999
predictSV --hic-5k 5000 --hic-10k 10000 --hic-50k 50000 -g hg38 -O input/A001C008 --balance-type CNV --output-format full --prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999
Expected output:
predictSV --hic-5k ./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/5000 --hic-10k ./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/10000 --hic-50k ./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/50000 -g hg38 -O input/A001C007 --balance-type CNV --output-format full --prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999
predictSV --hic-5k ./../input/A001C008.hg38.nodups.pairs.mcool::/resolutions/5000 --hic-10k ./../input/A001C008.hg38.nodups.pairs.mcool::/resolutions/10000 --hic-50k ./../input/A001C008.hg38.nodups.pairs.mcool::/resolutions/50000 -g hg38 -O input/A001C008 --balance-type CNV --output-format full --prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999
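The gap between the two outputs seems to come from storing id_suffix (the text after the last /) rather than the whole line read from cooler ls. A minimal sketch of one way to keep the full URI follows. It assumes cooler ls prints one URI per line as in the listing above; build_predictsv_cmd is a hypothetical helper name, not part of cooler or EagleC, and the command is only echoed, as in my attempt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `cooler ls` URIs on stdin and prints the
# predictSV command for one sample. $1 is the value to pass to -O.
build_predictsv_cmd() {
    local out_prefix=$1 id
    local hic5k='' hic10k='' hic50k=''
    while IFS= read -r id; do
        case ${id##*/} in          # match on the text after the last "/"
            5000)  hic5k=$id ;;    # ...but store the full URI
            10000) hic10k=$id ;;
            50000) hic50k=$id ;;
        esac
    done
    # Collect the optional flags in an array so each value stays one word.
    local args=()
    if [[ -n $hic5k ]];  then args+=(--hic-5k "$hic5k");   fi
    if [[ -n $hic10k ]]; then args+=(--hic-10k "$hic10k"); fi
    if [[ -n $hic50k ]]; then args+=(--hic-50k "$hic50k"); fi
    echo predictSV "${args[@]}" \
        -g hg38 -O "$out_prefix" \
        --balance-type CNV --output-format full \
        --prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999
}

# Only loop over real files, and only when cooler is installed.
if command -v cooler >/dev/null 2>&1; then
    for mcool_file in input/*.mcool; do
        [[ -e $mcool_file ]] || continue   # glob matched nothing
        build_predictsv_cmd "${mcool_file%%.*}" < <(cooler ls "$mcool_file")
    done
fi
```

The URIs printed this way start with whatever path was given to cooler ls, so looping over input/*.mcool yields input/A001C007... rather than the ./../input/A001C007... shown in the expected output. Removing the echo would run predictSV instead of printing the command.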
Edit:
Furthermore, I want to change the -O argument from "${mcool_file%%.*}" to EagleC_output/ followed by the part of the mcool_file name after the last slash and before the first dot. For example, EagleC_output/A001C007.
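A small sketch of how that prefix could be built with two parameter expansions, assuming the sample ID is everything in the file name before its first dot. The variable names sample and out_prefix are made up for illustration.

```shell
#!/usr/bin/env bash
mcool_file=input/A001C007.hg38.nodups.pairs.mcool

sample=${mcool_file##*/}          # drop the directory: A001C007.hg38.nodups.pairs.mcool
sample=${sample%%.*}              # drop everything from the first dot: A001C007
out_prefix=EagleC_output/$sample

echo "$out_prefix"
```

Stripping the directory first also matters for paths such as ./../input/A001C007.hg38.nodups.pairs.mcool, where "${mcool_file%%.*}" alone expands to an empty string because the first dot is the first character. I have not checked whether predictSV creates the EagleC_output directory itself, so running mkdir -p EagleC_output beforehand may be needed.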