I'm new to the forum, but not to this website. I've been searching for weeks for a way to process a large data file quickly in C++11. I'm trying to write a function that takes the trace file name, opens the file, and processes the data. The trace file contains 2 million lines, and each line consists of a read/write operation and a hex address:
r abcdef123456
With that much data, I need to read in and parse those two values quickly. My first attempt to read the file was the following:
void getTraceData(string filename)
{
    ifstream inputfile;
    string file_str;
    vector<string> op, addr;

    // Open input file
    inputfile.open(filename.c_str());
    cout << "Opening file for reading: " << filename << endl;

    // Determine if file opened successfully
    if(inputfile.fail())
    {
        cout << "Text file failed to open." << endl;
        cout << "Please check file name and path." << endl;
        exit(1);
    }

    // Retrieve and store address values and operations
    if(inputfile.is_open())
    {
        cout << "Text file opened successfully." << endl;
        while(inputfile >> file_str)
        {
            if((file_str == "r") || (file_str == "w"))
            {
                op.push_back(file_str);
            }
            else
            {
                addr.push_back(file_str);
            }
        }
    }

    inputfile.close();
    cout << "File closed." << endl;
}
It worked: the program ran and read in the whole file. Unfortunately, it took 8 minutes to do so. I then wrote a second version to try to read the file faster. It does, reading the file into a buffer in a fraction of a second, versus 8 minutes for the operator>> loop in the first version:
void getTraceData()
{
    // Setup variables
    char* fbuffer;
    ifstream ifs("text.txt");
    long int length;
    clock_t start, end;

    // Start timer + get file length
    start = clock();
    ifs.seekg(0, ifs.end);
    length = ifs.tellg();
    ifs.seekg(0, ifs.beg);

    // Setup buffer to read & store file data
    fbuffer = new char[length];
    ifs.read(fbuffer, length);
    ifs.close();

    end = clock();
    float diff((float)end - (float)start);
    float seconds = diff / CLOCKS_PER_SEC;
    cout << "Run time: " << seconds << " seconds" << endl;

    delete[] fbuffer;
}
But when I added the parsing code, which is meant to walk the buffer line by line and store the two values in two separate variables, the program silently exits at the while loop that calls getline:
void getTraceData(string filename)
{
    // Setup variables
    char* fbuffer;
    ifstream ifs("text.txt");
    long int length;
    string op, addr, line;
    clock_t start, end;

    // Start timer + get file length
    start = clock();
    ifs.seekg(0, ifs.end);
    length = ifs.tellg();
    ifs.seekg(0, ifs.beg);

    // Setup buffer to read & store file data
    fbuffer = new char[length];
    ifs.read(fbuffer, length);
    ifs.close();

    // Setup stream buffer
    const int maxline = 20;
    char* lbuffer;
    stringstream ss;

    // Parse buffer data line-by-line
    while(ss.getline(lbuffer, length))
    {
        while(getline(ss, line))
        {
            ss >> op >> addr;
        }
        ss.ignore( strlen(lbuffer));
    }

    end = clock();
    float diff((float)end - (float)start);
    float seconds = diff / CLOCKS_PER_SEC;
    cout << "Run time: " << seconds << " seconds" << endl;

    delete[] fbuffer;
    delete[] lbuffer;
}
My question: once the file has been read into a buffer, how do I retrieve its contents and store them in variables? For context, my target is under 2 minutes to read and process the data file. Right now I'm focused only on the input file, not on the rest of my program or the machine it runs on (the code should be portable to other machines). The language is C++11 and the OS is Linux. Sorry for the long post.
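For reference, here is a minimal sketch of one way the in-memory buffer could be parsed. Two things in the third program look like probable causes of the silent exit: ss is never given the contents of fbuffer, so it is an empty stream, and lbuffer is never allocated, so ss.getline writes through an uninitialized pointer (and delete[] lbuffer frees one). The sketch avoids streams entirely and scans the buffer directly. The names TraceEntry, hexDigit, and parseTrace are hypothetical, and it assumes every valid line has the form "<r|w> <hex address>" with an address that fits in 64 bits; lines that do not match are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record for one trace line such as "r abcdef123456".
struct TraceEntry {
    char op;             // 'r' or 'w'
    std::uint64_t addr;  // value of the hex address field
};

// Return the value of one hex digit, or -1 if c is not a hex digit.
inline int hexDigit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parse a buffer holding the whole file. The buffer does not need to be
// null-terminated, which matches what ifstream::read produces.
std::vector<TraceEntry> parseTrace(const char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);

    std::vector<TraceEntry> entries;
    entries.reserve(size / 15 + 1);  // assumption: roughly 15 bytes per line

    const char* p = data;
    const char* const end = data + size;
    while (p < end) {
        // Locate the end of the current line (or the end of the buffer).
        const char* eol = static_cast<const char*>(
            std::memchr(p, '\n', static_cast<std::size_t>(end - p)));
        if (eol == nullptr) eol = end;

        // Expect: one op character, one space, at least one hex digit.
        if (eol - p >= 3 && (p[0] == 'r' || p[0] == 'w') && p[1] == ' ') {
            std::uint64_t value = 0;
            bool anyDigit = false;
            for (const char* q = p + 2; q < eol; ++q) {
                const int d = hexDigit(*q);
                if (d < 0) break;  // stops at '\r' or any other non-hex byte
                value = (value << 4) | static_cast<std::uint64_t>(d);
                anyDigit = true;
            }
            if (anyDigit) entries.push_back(TraceEntry{p[0], value});
        }

        p = (eol < end) ? eol + 1 : end;  // step past the newline
    }
    return entries;
}
```

In the second program above, this would be called as parseTrace(fbuffer, length) right after ifs.read(fbuffer, length) and before delete[] fbuffer. Storing the operation as a char and the address as an integer, rather than as two strings, also avoids millions of small string allocations, which may matter at this file size.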