I have a list of files in which each sample has two files: forward and reverse reads.
KIMS2021-01_R1.fastq.gz KIMS2021-05_R2.fastq.gz SRR1734377_1.fastq.gz SRR6006898_2.fastq.gz SRR6006903_1.fastq.gz
KIMS2021-01_R2.fastq.gz KIMS2021-06_R1.fastq.gz SRR1734377_2.fastq.gz SRR6006899_1.fastq.gz SRR6006903_2.fastq.gz
KIMS2021-02_R1.fastq.gz KIMS2021-06_R2.fastq.gz SRR6006895_1.fastq.gz SRR6006899_2.fastq.gz SRR6006904_1.fastq.gz
KIMS2021-02_R2.fastq.gz SRR1734374_1.fastq.gz SRR6006895_2.fastq.gz SRR6006900_1.fastq.gz SRR6006904_2.fastq.gz
KIMS2021-03_R1.fastq.gz SRR1734374_2.fastq.gz SRR6006896_1.fastq.gz SRR6006900_2.fastq.gz SRR6006905_1.fastq.gz
KIMS2021-03_R2.fastq.gz SRR1734375_1.fastq.gz SRR6006896_2.fastq.gz SRR6006901_1.fastq.gz SRR6006905_2.fastq.gz
KIMS2021-04_R1.fastq.gz SRR1734375_2.fastq.gz SRR6006897_1.fastq.gz SRR6006901_2.fastq.gz SRR6006906_1.fastq.gz
KIMS2021-04_R2.fastq.gz SRR1734376_1.fastq.gz SRR6006897_2.fastq.gz SRR6006902_1.fastq.gz SRR6006906_2.fastq.gz
KIMS2021-05_R1.fastq.gz SRR1734376_2.fastq.gz SRR6006898_1.fastq.gz SRR6006902_2.fastq.gz
My objective is to pass these files as input. That is simple when all the files follow the same naming pattern, but here the data comes from two different sources.
This is the command I run:
for i in $(ls *.fastq*.gz | sed 's/00[0-9]\.gz/.gz/' | rev | cut -c 17- | rev | uniq); do
    STAR --runMode alignReads --outSAMtype BAM SortedByCoordinate --runThreadN 30 \
        --genomeDir /run/media/punit/data3/Santosh_star_index \
        --readFilesIn <(gunzip -c ${i}_R1_001.fastq.gz ${i}_R2_001.fastq.gz) \
        --outFileNamePrefix ${i}
done
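A point worth checking in this command: `gunzip -c R1 R2` inside a single `<( )` concatenates both mates into one stream, so STAR sees one input and aligns the reads as single-end. A minimal sketch of a paired-end call for one sample, assuming the index path from the question, passes the two mate files as separate arguments to `--readFilesIn` and lets STAR decompress them with `--readFilesCommand zcat`. The `align_pair` and `run` functions, the `DRY_RUN` switch and the `${sample}_` output prefix are illustrative choices, not part of the original setup; by default the script only prints the command.

```bash
#!/usr/bin/env bash
# Sketch: align one paired-end sample with STAR.
# Assumption: DRY_RUN=1 (the default) only prints the command; DRY_RUN=0 runs STAR.
set -euo pipefail

GENOME_DIR=/run/media/punit/data3/Santosh_star_index

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [[ ${DRY_RUN:-1} == 1 ]]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# align_pair SAMPLE R1 R2 (hypothetical helper)
align_pair() {
    local sample=$1 r1=$2 r2=$3
    # The mates go in as two separate files; STAR decompresses each with zcat.
    run STAR --runMode alignReads \
        --outSAMtype BAM SortedByCoordinate \
        --runThreadN 30 \
        --genomeDir "$GENOME_DIR" \
        --readFilesCommand zcat \
        --readFilesIn "$r1" "$r2" \
        --outFileNamePrefix "${sample}_"
}

align_pair KIMS2021-01 KIMS2021-01_R1.fastq.gz KIMS2021-01_R2.fastq.gz
```

STAR appends its output names (for example `Aligned.sortedByCoord.out.bam`) directly to the prefix, which is why the sketch adds a trailing underscore.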
The idea is to get one name for each pair of files.
This works for the files that start with SRR IDs, as I tested with:
for i in $(ls *.fastq*.gz | sed 's/00[0-9]\.gz/.gz/' | rev | cut -c 12- | rev | uniq); do echo $i; done
The output is:
KIMS2021-01_
KIMS2021-02_
KIMS2021-03_
KIMS2021-04_
KIMS2021-05_
KIMS2021-06_
SRR1734374
SRR1734375
SRR1734376
SRR1734377
SRR6006895
SRR6006896
SRR6006897
SRR6006898
SRR6006899
SRR6006900
SRR6006901
SRR6006902
SRR6006903
SRR6006904
SRR6006905
Here I can see that the SRR IDs come out as clean names, whereas the KIMS names keep a trailing underscore. Any suggestions on how to bring them to a common pattern so I can run everything in one go?
The naive way is to run them as two separate sets rather than one, but I would like to learn how to handle naming schemes of different kinds or lengths.
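The fixed-width `cut` is what breaks here: `rev | cut -c 12- | rev` always removes the last 11 characters, and `_1.fastq.gz` is 11 characters long while `_R1.fastq.gz` is 12, so the KIMS names keep their underscore. (The `sed 's/00[0-9]\.gz/.gz/'` step does not match any of these names, so it leaves them unchanged.) A sketch that matches on the pattern rather than the length, assuming every file ends in `_1`/`_2` or `_R1`/`_R2` followed by `.fastq.gz`, uses a bash regular expression and `BASH_REMATCH` to pull out the sample name and mate number. `parse_fastq` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: split "<sample>_<mate>.fastq.gz" names by pattern, not by length.
# Assumption: mates are tagged _1/_2 (SRR files) or _R1/_R2 (KIMS files).
set -euo pipefail

# parse_fastq NAME -> prints "SAMPLE MATE"; returns 1 if NAME does not match.
parse_fastq() {
    local re='^(.+)_R?([12])\.fastq\.gz$'
    if [[ $1 =~ $re ]]; then
        printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        return 1
    fi
}

for f in KIMS2021-01_R1.fastq.gz KIMS2021-01_R2.fastq.gz \
         SRR1734374_1.fastq.gz SRR1734374_2.fastq.gz; do
    parse_fastq "$f"
done
# KIMS2021-01 1
# KIMS2021-01 2
# SRR1734374 1
# SRR1734374 2
```

Because the match is anchored on `_R?[12]` just before `.fastq.gz`, supporting another naming source means extending that one pattern instead of recounting characters.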
UPDATE
This code does what I want, i.e. it produces uniform names:
for i in *.fastq*.gz; do echo "${i%_*}"; done | uniq
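For reference, `${i%_*}` removes the shortest suffix matching `_*`, i.e. everything from the last underscore, which is why both naming styles collapse to the same form; `${i%%_*}` would instead cut at the first underscore. `uniq` only merges adjacent duplicates, which works here because bash expands a glob in sorted order, so each R1/R2 pair sits together. The sketch shows the difference on a hypothetical name with an extra underscore (`Sample_A_R1.fastq.gz`, not from the question) and wraps the listing in a hypothetical `sample_names` function that uses `sort -u`, so it does not depend on input order.

```bash
#!/usr/bin/env bash
# Sketch: how ${f%_*} and ${f%%_*} strip suffixes, plus a sample-name lister.
set -euo pipefail

# Sample_A_R1.fastq.gz is a hypothetical name with an extra underscore.
for f in KIMS2021-01_R1.fastq.gz SRR1734374_1.fastq.gz Sample_A_R1.fastq.gz; do
    # %  : remove the SHORTEST suffix matching "_*" (cuts at the last underscore)
    # %% : remove the LONGEST suffix matching "_*" (cuts at the first underscore)
    printf '%s -> %s | %s\n' "$f" "${f%_*}" "${f%%_*}"
done
# KIMS2021-01_R1.fastq.gz -> KIMS2021-01 | KIMS2021-01
# SRR1734374_1.fastq.gz -> SRR1734374 | SRR1734374
# Sample_A_R1.fastq.gz -> Sample_A | Sample

# sample_names FILE... (hypothetical helper): one name per sample.
# sort -u makes the result independent of argument order.
sample_names() {
    local f
    for f in "$@"; do
        printf '%s\n' "${f%_*}"
    done | sort -u
}
```

In the FASTQ directory it would be called as `sample_names *.fastq*.gz`.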
Now I want to combine it with the rest of my command:
do STAR --runMode alignReads --outSAMtype BAM SortedByCoordinate --runThreadN 30 --genomeDir /run/media/punit/data3/Santosh_star_index --readFilesIn <(gunzip -c ${i}_R1_001.fastq.gz ${i}_R2_001.fastq.gz ) --outFileNamePrefix ${i%};done
My issue is that I would then have two do keywords, which won't work. How do I pipe the names into the command?
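Two `do` keywords are not a problem in themselves: the name list can be nested as a command substitution, as in `for i in $(for f in *.fastq*.gz; do echo "${f%_*}"; done | uniq); do STAR ...; done`, or piped into a `while read` loop. A sketch of the `while read` variant, assuming mates are named `<sample>_R1/_R2.fastq.gz` or `<sample>_1/_2.fastq.gz` as in the listing above, looks up each sample's actual mate files (the fixed `_R1_001.fastq.gz` suffix in the original command matches neither source) and passes them to STAR as two separate inputs. `align_all`, `first_existing`, `run` and `DRY_RUN` are hypothetical names; by default the script only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: one loop over sample names, one STAR run per sample.
# Assumptions: mates are <sample>_R1/_R2.fastq.gz or <sample>_1/_2.fastq.gz;
# DRY_RUN=1 (default) prints the STAR commands, DRY_RUN=0 runs them.
set -euo pipefail
shopt -s nullglob   # no matching files -> no names, instead of a literal "*"

GENOME_DIR=/run/media/punit/data3/Santosh_star_index

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [[ ${DRY_RUN:-1} == 1 ]]; then printf '%s\n' "$*"; else "$@"; fi
}

# Print the first argument that names an existing file (hypothetical helper).
first_existing() {
    local f
    for f in "$@"; do
        if [[ -e $f ]]; then printf '%s\n' "$f"; return 0; fi
    done
    return 1
}

align_all() {
    local f
    # One name per file; the glob is sorted, so uniq sees each pair together.
    for f in *.fastq*.gz; do
        printf '%s\n' "${f%_*}"
    done | uniq | while IFS= read -r i; do
        r1=$(first_existing "${i}_R1.fastq.gz" "${i}_1.fastq.gz") \
            || { echo "skipping $i: no R1 file" >&2; continue; }
        r2=$(first_existing "${i}_R2.fastq.gz" "${i}_2.fastq.gz") \
            || { echo "skipping $i: no R2 file" >&2; continue; }
        # </dev/null stops STAR from consuming the name list on stdin.
        run STAR --runMode alignReads \
            --outSAMtype BAM SortedByCoordinate \
            --runThreadN 30 \
            --genomeDir "$GENOME_DIR" \
            --readFilesCommand zcat \
            --readFilesIn "$r1" "$r2" \
            --outFileNamePrefix "${i}_" </dev/null
    done
}

align_all
```

Run from the directory holding the FASTQ files with DRY_RUN=0, this would start STAR once per sample, whichever source the sample came from.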