First post and excited to be a part of this community.
I am a beginner and mainly use the command line for next-generation sequencing (NGS) analysis.
I have a list of files that contain data from a sequencer as follows:
[agh8423@quser12 all_fastq]$ ls Bio5* -al
-rw-r--r-- 1 agh8423 p30592 253029870 Jul 19 11:10 Bio5-H3K27ac-Dox-no_S5_L001_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 248177942 Jul 19 11:11 Bio5-H3K27ac-Dox-no_S5_L002_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 256860841 Jul 19 11:11 Bio5-H3K27ac-Dox-no_S5_L003_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 253399957 Jul 19 11:12 Bio5-H3K27ac-Dox-no_S5_L004_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 246636194 Jul 19 11:12 Bio5-H3K27ac-Dox-yes_S6_L001_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 242114964 Jul 19 11:13 Bio5-H3K27ac-Dox-yes_S6_L002_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 249862612 Jul 19 11:13 Bio5-H3K27ac-Dox-yes_S6_L003_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 247798281 Jul 19 11:14 Bio5-H3K27ac-Dox-yes_S6_L004_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 234917538 Jul 19 11:14 Bio5-H3K4me3-Dox-no_S3_L001_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 230571628 Jul 19 11:14 Bio5-H3K4me3-Dox-no_S3_L002_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 233025109 Jul 19 11:15 Bio5-H3K4me3-Dox-no_S3_L003_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 230268463 Jul 19 11:15 Bio5-H3K4me3-Dox-no_S3_L004_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 246254343 Jul 19 11:15 Bio5-H3K4me3-Dox-yes_S4_L001_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 241866406 Jul 19 11:16 Bio5-H3K4me3-Dox-yes_S4_L002_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 247044518 Jul 19 11:16 Bio5-H3K4me3-Dox-yes_S4_L003_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 243759599 Jul 19 11:17 Bio5-H3K4me3-Dox-yes_S4_L004_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 251009676 Jul 19 11:17 Bio5-Input-Dox-no_S1_L001_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 246054510 Jul 19 11:18 Bio5-Input-Dox-no_S1_L002_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 255798685 Jul 19 11:18 Bio5-Input-Dox-no_S1_L003_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 253896496 Jul 19 11:19 Bio5-Input-Dox-no_S1_L004_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 232179873 Jul 19 11:19 Bio5-Input-Dox-yes_S2_L001_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 227146014 Jul 19 11:19 Bio5-Input-Dox-yes_S2_L002_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 236543332 Jul 19 11:20 Bio5-Input-Dox-yes_S2_L003_R1_001.fastq.gz
-rw-r--r-- 1 agh8423 p30592 234698786 Jul 19 11:20 Bio5-Input-Dox-yes_S2_L004_R1_001.fastq.gz
Notice that some file names are identical except for the lane portion ("L001" through "L004"). These are essentially replicates of the same sample, and for downstream processing I want to concatenate them (though this may not be relevant to my question).
WHAT I WANT: to create a parent directory named after everything to the left of "_S(*)_L00(1/2/3/4)_R1_001.fastq.gz" (for example, the first file would get a directory named "Bio5-H3K27ac-Dox-no"). I then want to move all files sharing that prefix (all of L001/2/3/4 for Bio5-H3K27ac-Dox-no) into the new directory. The plan from there is to run zcat and concatenate the files into one file, which would be easier to analyze.
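For context, the concatenation step could be sketched roughly as follows. This assumes the lane files already sit in one directory per sample; merge_lanes and the "_R1_merged.fastq.gz" output name are made-up choices for illustration. It uses plain cat rather than zcat, relying on the fact that gzip files written back to back still form a valid gzip file, so nothing has to be decompressed and recompressed.

```shell
# Sketch: merge the lane files of one sample directory into a single gzip file.
# merge_lanes and the *_R1_merged.fastq.gz output name are hypothetical choices.
merge_lanes() (
    dir=${1%/}                 # sample directory, trailing slash removed
    sample=${dir##*/}          # the directory name doubles as the sample name
    out="$dir/${sample}_R1_merged.fastq.gz"
    # The glob expands in sorted order (L001, L002, ...), and the output
    # name does not match it, so the merged file is never fed back in.
    cat "$dir"/*_L00[1-4]_R1_001.fastq.gz > "$out"
)

# Demo on tiny placeholder reads in a temporary directory.
demo=$(mktemp -d)
mkdir "$demo/sampleA"
for lane in 1 2 3 4; do
    printf '@read%s\nACGT\n+\nIIII\n' "$lane" |
        gzip > "$demo/sampleA/sampleA_S1_L00${lane}_R1_001.fastq.gz"
done
merge_lanes "$demo/sampleA"
gzip -dc "$demo/sampleA/sampleA_R1_merged.fastq.gz" | head -n 4
```

If a single uncompressed file is preferred instead, replacing the cat line with gzip -dc (or zcat) over the same glob and redirecting to a .fastq file should give the same reads in the same order.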
Below is my attempt at creating the directories and moving the files:
for file in ./*_L001_R1_001.fastq.gz.txt; do
    dir=${file%_L001_R1_001.fastq.gz.txt}
    mkdir -p "./$dir" &&
    mv -iv "$file" "./$dir"
    mv -iv "$dir"_L00* "./$dir"
done
If I ls my directory afterwards, I get the following:
[agh8423@quser11 test]$ ls -al
total 36
drwxrwsr-x 8 agh8423 p30592 4096 Jul 22 18:27 .
drwxrwsr-x 3 agh8423 p30592 32768 Jul 22 17:27 ..
drwxrwsr-x 2 agh8423 p30592 4096 Jul 22 18:27 Bio1-Input-Dox-no_S12
drwxrwsr-x 2 agh8423 p30592 4096 Jul 22 18:27 Bio1-Input-Dox-yes_S11
drwxrwsr-x 2 agh8423 p30592 4096 Jul 22 18:27 Bio1-MYC-Dox-no_S2
drwxrwsr-x 2 agh8423 p30592 4096 Jul 22 18:27 Bio1-MYC-Dox-yes_S3
drwxrwsr-x 2 agh8423 p30592 4096 Jul 22 18:27 Bio1-WDR5-Dox-no_S5
drwxrwsr-x 2 agh8423 p30592 4096 Jul 22 18:27 Bio1-WDR5-Dox-yes_S10
-rwxrwxr-x 1 agh8423 p30592 178 Jul 22 18:29 test1.sh
The part I don't want is the "_S12" (etc.) suffix at the end of each directory name, although I do want it to remain in the names of the files that were moved into the new directories.
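One possible direction is to strip the whole "_S<n>_L00<k>_R1_001.fastq.gz" tail with a single parameter expansion, so the sample number never reaches the directory name. Below is a minimal sketch of that idea, not a tested solution for the real data. It assumes the files end in "_L00[1-4]_R1_001.fastq.gz" (without the ".txt" used in the attempt above); group_by_sample is a made-up helper name, and the demo runs on empty placeholder files in a temporary directory.

```shell
# Sketch: move each lane file into a directory named after its sample,
# dropping the _S<n> part from the directory name only.
# group_by_sample is a hypothetical helper name.
group_by_sample() (
    src=${1%/}
    for file in "$src"/*_L00[1-4]_R1_001.fastq.gz; do
        [ -e "$file" ] || continue     # the glob matched nothing
        name=${file##*/}               # file name without the leading path
        # Remove the shortest suffix of the form _S..._L00<k>_R1_001.fastq.gz,
        # e.g. Bio5-H3K27ac-Dox-no_S5_L001_R1_001.fastq.gz -> Bio5-H3K27ac-Dox-no
        dir=${name%_S*_L00[1-4]_R1_001.fastq.gz}
        mkdir -p "$src/$dir" && mv -- "$file" "$src/$dir/"
    done
)

# Demo on empty placeholder files in a temporary directory.
demo=$(mktemp -d)
for lane in 1 2 3 4; do
    touch "$demo/Bio5-H3K27ac-Dox-no_S5_L00${lane}_R1_001.fastq.gz" \
          "$demo/Bio5-Input-Dox-yes_S2_L00${lane}_R1_001.fastq.gz"
done
group_by_sample "$demo"
ls "$demo"    # lists the two sample directories
```

Because the loop matches every lane rather than only L001, each file is moved on its own and no second mv with a wildcard is needed. The file names themselves are left unchanged, so "_S5" and the like remain in them.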
-Austin