
Thank you for taking the time to look at this.

I have a FASTQ file and I want to convert each sequence to its complement, but not its reverse complement, something like this:

@Some header example:1:
ACTGAGACTCGATCA
+
S0m3_Qu4l1t13s&

Translated to

@Some header example:1:
TGACTCTGAGCTAGT
+
S0m3_Qu4l1t13s&

And the code I used is:

awk '{
  if(NR==100000){break} 
  else if((NR+2) % 4 ==0 ){ system("echo " $0 "| tr ATGC TACG") }
  else print $0}' MyFastqFyle.fastq > MyDesiredFile.fastq

And it works! But this approach is slooooow, even with small files (250M). Is there a faster way to get this done? It doesn't matter whether it is in R, bash or something similar.

(I looked at Biostrings, but I only found the reverse complement function, and there are some issues with the "@" in the header instead of the ">".)

Wintermute
Edahi

2 Answers


This is slow because you spawn a shell and a process in it for every changed line. Just do it with sed:

sed '2~4 y/ATGC/TACG/' MyFastqFyle.fastq > MyDesiredFile.fastq

The 2~4 address (line 2, then every 4th line after it) is a GNU extension, so this requires GNU sed; I hope you're not on Mac OS X. If you are,

sed 'n; y/ATGC/TACG/; n; n' MyFastqFyle.fastq > MyDesiredFile.fastq

should work.
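To sanity-check the command before running it on a real file, it can be fed the single record from the question. This is only a sketch: the function name complement_fastq is made up for illustration, and it assumes well-formed four-line records (no wrapped sequences) with uppercase A, C, G, T bases.

```shell
# Hypothetical helper wrapping the portable sed command from above.
# Per 4-line record: skip the header, transliterate the sequence line,
# then pass the "+" line and the quality line through unchanged.
complement_fastq() {
  sed 'n; y/ATGC/TACG/; n; n'
}

# Feed it the example record from the question.
printf '%s\n' \
  '@Some header example:1:' \
  'ACTGAGACTCGATCA' \
  '+' \
  'S0m3_Qu4l1t13s&' | complement_fastq
# The second line of the output is TGACTCTGAGCTAGT; the other three are untouched.
```

Because the function reads standard input, the real file can be processed the same way: complement_fastq < MyFastqFyle.fastq > MyDesiredFile.fastq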

Wintermute
  • Thanks! That's it. I wanted to accept this answer but I need to wait 6 more minutes, ha – Edahi Apr 08 '15 at 21:31

Here is the solution using Biostrings (and ShortRead):

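library(ShortRead)
fq <- readFastq("MyFastqFyle.fastq")
# complement() keeps the base order (no reversal), so the qualities still line up
complemented <- complement(sread(fq))
# Rebuild the records with the original qualities and ids, then write plain FASTQ
writeFastq(ShortReadQ(complemented, quality(fq), id(fq)), "MyDesiredFile.fastq", compress = FALSE)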
Michael Lawrence