
I am trying to edit a FASTQ file, which is simply a text file that stores DNA or RNA reads.

In the file, I replace '@' with 'A', 'B' with 'C', and so on, as shown in the code, and I write the changed sequence to a new file.

But in the new file, some non-printable characters such as '^F' and '^B' appear in place of the newline character. This happens only in a few places, not everywhere, so I am not sure what causes it.

#include <bits/stdc++.h>
#include <fstream>
using namespace std;

int main()
{
    ifstream in;
    ofstream out;
    in.open("file1.fq");
    out.open("newfile1.fq",ios::out|ios::app|ios::ate);
    while(!in.eof())
    {
        string head,plus,seq,qs;
        in>>head>>seq>>plus>>qs;
        if(head[0]!='@')
            continue;
        out<<head<<endl;
        for(int i=0;i<seq.size();i++)
        {
            if(seq[i]=='@')
                seq[i] = 'A';
            else if(seq[i]=='B')
                seq[i] = 'C';
            else if(seq[i] =='F')
                seq[i] = 'G';
            else if(seq[i]=='S')
                seq[i] = 'T';
        }
        out<<seq<<endl;
        out<<"+"<<endl;
        out<<qs<<endl;
    }
    in.close();
    out.close();
}

These non-printable characters ('^B', '^F', etc.) appear at scattered places in the new file and are not present in the input file.

Remy Lebeau
Amit
  • Your while loop condition is incorrect: you go round the loop one too many times and operate on garbage strings. Change your loop to the correct `while (in >> head >> seq >> plus >> qs) { if (head[0] == '@') ...` and I expect the problem will go away. – john Jun 12 '19 at 07:40
  • This could explain unnecessary output, but not non-printable characters. – Radosław Cybulski Jun 12 '19 at 07:41
  • @RadosławCybulski You're right, I'll reopen. – john Jun 12 '19 at 07:43
  • Amit, you need to post the full source code. As far as I can see, the code you posted is fine (except for the while error @john posted about). – Radosław Cybulski Jun 12 '19 at 07:45
  • @Amit Fix the while loop first and then see what happens. – john Jun 12 '19 at 07:45
  • [Why should I not `#include <bits/stdc++.h>`?](https://stackoverflow.com/q/31816095) [Why is `using namespace std;` considered bad practice?](https://stackoverflow.com/q/1452721) – L. F. Jun 12 '19 at 10:39
  • @john Yes, I agree with you. I wrote it this way because the line count is a multiple of 4, but your method is more accurate. That is not the problem I am having, though: when I write to the file, some characters are replaced with seemingly random ones, e.g. 'HISEQ' becomes 'HIDEQ', and only at some random places, which I cannot explain. I am using CentOS 7, and the files I am working with are larger than 10 GB. – Amit Jun 14 '19 at 10:12

1 Answer


The question is old, but I still want to answer it. The problem was not with the code but with the hardware: the newly installed RAM was faulty, and as a result non-printable characters were introduced when processing large text files with this code.

Amit