I have two files (recode.txt and reads.fastq), both created and saved with nano. I want to compare each entry in recode.txt against reads.fastq and extract the lines in reads.fastq that match. I have been trying to write a while loop with this logic in mind, but without success so far: the output does not match the pattern that grep receives from recode.txt inside the loop.
The script is supposed to read each line in recode.txt, compare it to reads.fastq, extract each matching line plus one line before and 2 after in reads.fastq, and save the output in a separate file for each line of recode.txt (all matches for that line combined). Here are the files and the code:
File recode.txt:
GTGTCTTA+ATCACGAC
GTGTCTTA+ACAGTGGT
GTGTCTTA+CAGATCCA
GTGTCTTA+ACAAACGG
GTGTCTTA+ACCCAGCA
GTGTCTTA+AACCCCTC
GTGTCTTA+CCCAACCT
ATCACGAC+AAGGTTCA
GTGTCTTA+GAAACCCA
File reads.fastq:
###################################
@NB500931:113:HW53WBGX2:1:11101:11338:1049 1:N:0:ATCACGAC+AAGGTTCA
GTAGTNCCAGCTGCAGAGCTGGAAGGATCGCTTGAGCGCAGAGGTAGAGGCTACAGTGAGCCGTGATCATGCCAT
+
AAAAA#EAAEEEEE6EAEAEEEEEEEEEEEEEEEAEEEEEE/EEEEEEEEEE/EEEEEEEEEEEEEEEAEEEEEA
@NB500931:113:HW53WBGX2:1:11101:6116:1049 1:N:0:ACAAACGG+AAGGTTCA
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
###################################
@NB500931:113:HW53WBGX2:1:11101:6885:1049 1:N:0:ACCCAGCA+ACTTAGCA
GAGGGNGCTGTCCCAGTAATTGGGTTCAGATGACATTTGCTTGATTTTAGGGATGTACGAGATTTTCGTGGATC
+
AAA/A#EAEEEEEAEAEEA///EEAEEEEE///AEEAEE/AA//EAA<EEE/E//AEEEAAA//E/A<6//EEA
@NB500931:113:HW53WBGX2:1:11101:8246:1049 1:N:0:ATCACGAC+AAGGTTCA
CTTGTNAGACACGATGCAGAGAATTAGCTGTTTGATGCCTATCTTCCCAACTCAGAGGCAAGCTGCCCAAAGGC
+
Script:
#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=96:00:00
while read line
do
    echo "working on $line"
    grep -A3 "$line" reads.fastq | grep -v "^--$" >> "$line"_sorted.fastq
done < recode.txt
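For comparison, here is a more defensive variant of the loop above, as a minimal sketch rather than a confirmed fix. It assumes the cause could be invisible characters (a carriage return or trailing blanks) at the end of the lines in recode.txt, which has not been verified. The function name `extract_by_barcode` and the demo data are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): for every barcode
# listed in a file, write the matching FASTQ header lines plus the three
# lines that follow each of them to <barcode>_sorted.fastq.
extract_by_barcode() {
    local barcodes=$1 fastq=$2 outdir=$3
    local line
    # IFS= and -r keep each line verbatim; the || test also handles a last
    # line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Strip trailing whitespace, which includes a carriage return.
        line=${line%"${line##*[![:space:]]}"}
        # Skip empty lines: an empty pattern would match every line.
        [ -n "$line" ] || continue
        echo "working on $line" >&2
        # -F matches the barcode as a fixed string, not as a regex.
        # -A3 prints each matching line plus the next three lines.
        # > (instead of >>) avoids piling up results across reruns.
        grep -F -A3 -- "$line" "$fastq" | grep -v '^--$' \
            > "$outdir/${line}_sorted.fastq"
    done < "$barcodes"
}

# Demo on tiny made-up input; the barcode file deliberately has CRLF endings.
tmp=$(mktemp -d)
printf 'ATCACGAC+AAGGTTCA\r\nGTGTCTTA+ACAGTGGT\r\n' > "$tmp/recode.txt"
printf '%s\n' \
    '@read1 1:N:0:ATCACGAC+AAGGTTCA' 'GTAGT' '+' 'AAAAA' \
    '@read2 1:N:0:ACAAACGG+AAGGTTCA' 'NNNNN' '+' '#####' \
    '@read3 1:N:0:ATCACGAC+AAGGTTCA' 'CTTGT' '+' 'EEEEE' > "$tmp/reads.fastq"
extract_by_barcode "$tmp/recode.txt" "$tmp/reads.fastq" "$tmp"
wc -l "$tmp"/*_sorted.fastq
```

If this variant produces non-empty files where the original loop does not, that would point to hidden characters in recode.txt. If both behave the same, the cause is probably elsewhere.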
Both files are in UNIX format, and the following command (without a loop) works smoothly:
grep -A3 "ATCACGAC+AAGGTTCA" reads.fastq | grep -v "^--$" > sorted_file.fastq
My expected output is:
@NB500931:113:HW53WBGX2:1:11101:11338:1049 1:N:0:ATCACGAC+AAGGTTCA
GTAGTNCCAGCTGCAGAGCTGGAAGGATCGCTTGAGCGCAGAGGTAGAGGCTACAGTGAGCCGTGATCATGCCAT
+
@NB500931:113:HW53WBGX2:1:11101:8246:1049 1:N:0:ATCACGAC+AAGGTTCA
CTTGTNAGACACGATGCAGAGAATTAGCTGTTTGATGCCTATCTTCCCAACTCAGAGGCAAGCTGCCCAAAGGC
+
However, the while loop gives me an empty file with the correct name. Can you please help me?
UPDATE: I have tried converting my files with dos2unix, and it didn't work.
UPDATE: I edited the question to include my expected output.
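Since dos2unix did not help, one further check is to look at the raw bytes of recode.txt directly. The sketch below shows one way to do that. The helper names are hypothetical and the sample file is made up; it only assumes standard `grep`, `head` and `od`.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting a file for invisible characters.

# Print how many lines contain a carriage return (0 means none found).
count_cr_lines() {
    grep -c $'\r' -- "$1"
}

# Print how many lines end in a space or a tab.
count_trailing_blank_lines() {
    grep -c '[[:blank:]]$' -- "$1"
}

# Dump the first line byte by byte so \r, tabs or a BOM become visible.
show_first_line_bytes() {
    head -n 1 "$1" | od -c
}

# Demo on a made-up file: line 1 ends in CRLF, line 2 has a trailing space.
sample=$(mktemp)
printf 'GTGTCTTA+ATCACGAC\r\nGTGTCTTA+ACAGTGGT \n' > "$sample"
count_cr_lines "$sample"
count_trailing_blank_lines "$sample"
show_first_line_bytes "$sample"
```

Running the three helpers on the real recode.txt would show whether any line carries characters that are invisible in nano but still end up in the grep pattern.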