I'm a beginner in C++, so I hope you'll bear with me.
I'm trying to read a file whose lines, in text format, look either like this (the first few lines, called header lines):
@HD VN:1.5 SO:queryname
or like this
read.1 4 * 0 0 * * 0 0 CAACCNNTACCACAGCCCGANGCATTAACAACTTAANNNCNNNTNNANNNNNNNNNNNNTTGAAAAAAAAAAAAAAAAAA A<.AA##F..<F)<)FF))<#A<7<F.)FA.FAA.)###.###F##)############)FF)A<..A..7A....<F.A XC:Z:CAACCNNTACCA RG:Z:A XQ:i:2
Both are tab delimited.
The file is very large and is therefore stored in binary format. I'm wondering whether it is possible to read each line from the binary file, do some processing on that line, and then write it to a binary output file.
I started with this code:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main(int argc, char* argv[])
{
    // Both paths are required; reading argv[1] or argv[2] when they are missing is undefined behavior
    if (argc < 3) {
        cerr << "usage: " << argv[0] << " <input_file> <output_file>" << endl;
        return 1;
    }
    string input_file = argv[1];
    string output_file = argv[2];
    string line;
    ifstream istream;
    istream.open(input_file.c_str(), ios::binary | ios::in);
    ofstream ostream;
    ostream.open(output_file.c_str(), ios::binary | ios::out);
    while (getline(istream, line, '\n')) {
        if (line.empty()) continue;
        // process line assuming it is read as a string
        ostream << line << endl;
    }
    istream.close();
    ostream.close();
}
But it crashes with "Segmentation fault (core dumped)" in the part where I'm trying to parse line into a string vector.
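To make the parsing step concrete, here is a minimal sketch of splitting one line into a string vector, assuming fields are separated by single tab characters. The name split_tabs is a hypothetical helper chosen for illustration, not the code that actually crashes.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line into its tab-separated fields.
// An empty line yields an empty vector, and a trailing tab does not
// produce a trailing empty field.
std::vector<std::string> split_tabs(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '\t')) {
        fields.push_back(field);
    }
    return fields;
}
```

With this sketch, a header line such as the @HD example yields three fields while a read line yields many more, so any code that indexes into the vector would need to check fields.size() first; indexing past the end with operator[] is undefined behavior and can show up as a segmentation fault.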
Is there a way to read the binary format and split it by lines, do string processing on each such line, and then write them to a binary output?
BTW, I'm running this on Linux.