I have a very large text file with over 11 million entries/lines. Each line contains 35 values, delimited by a "|".
For each line I read in, I create an object, "Record", and store it in a vector of Records, because I need to be able to sort the records based on the values in a given field. (Please suggest a better approach if there is one.)
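For reference, here is a stripped-down sketch of the approach I'm describing (the field names, the loanAmount name, and the loadAndSort wrapper are just placeholders; my real Record has 35 members):

#include <algorithm>
#include <string>
#include <vector>

struct Record {
    std::string firstField;
    double loanAmount;   // placeholder name for one of the numeric fields
    // ...33 more fields
};

void loadAndSort() {
    std::vector<Record> records;
    records.reserve(11000000);   // rough line count, known up front

    // ...fill records from the file...

    // Sort on whichever field I need, e.g. the numeric one:
    std::sort(records.begin(), records.end(),
              [](const Record& a, const Record& b) {
                  return a.loanAmount < b.loanAmount;
              });
}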
I know how to overload the istream >> operator, but I have never had to do it for an object this large, and I'm not sure what the best approach is. I tried to extract a token before each delimiter, i.e.:
using namespace std;

inline istream& operator>>(istream& is, Record& r) {
    string line_of_text;
    string token;
    char delim = '|';
    getline(is, line_of_text);  // read the whole line first
    token = line_of_text.substr(0, line_of_text.find(delim));
    r.firstField = token;
    line_of_text.erase(0, line_of_text.find(delim) + 1);  // drop the token just consumed
    // ...and so on for each field in Record
    return is;
}
but this is very impractical and inefficient.
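The most workable version I can picture reads the whole line and then pulls tokens off a stringstream with getline and '|' as the delimiter, roughly like the sketch below (field names are placeholders), but I don't know whether that is efficient enough for 11 million lines:

#include <sstream>
#include <string>

inline std::istream& operator>>(std::istream& is, Record& r) {
    std::string line_of_text;
    if (std::getline(is, line_of_text)) {
        std::istringstream line(line_of_text);
        std::getline(line, r.firstField, '|');   // an empty field just yields an empty string
        std::getline(line, r.secondField, '|');
        // ...and so on for the remaining 33 fields
    }
    return is;
}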
Is there a reasonable way of doing this for such a large object? What is the best way to parse text like this without wasting so much memory?
Example line of input:
xx|0000|0| 0.00| 3.00|111|111| 5.70| 136000.00| 620.23| 80.00| 47.00| 0.000|FIX |P|C| 80.00|Full|SF|1.|P|convention|ME| 3| | |UnReported |WFHM |2 |N| |1|0|0|0|0|0| 126162.03| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00
I also tried just doing
inline istream& operator>>(istream& is, Record& r) {
    return is >> r.fieldOne >> r.fieldTwo; // ...etc.
}
but this does not work because many fields are separated only by a '|', not by a space. Is there a graceful way to have >> skip the '|' the same way it skips blank spaces? Keep in mind that fields can be empty.
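To be concrete about what I mean by skipping the '|': I know a stream can be imbued with a custom ctype facet that classifies '|' as whitespace, something like the sketch below, but I suspect operator>> would then collapse consecutive delimiters and silently skip empty fields, which is exactly what worries me:

#include <locale>
#include <vector>

// A ctype<char> facet that also treats '|' as whitespace, so operator>>
// skips it the same way it skips spaces.
struct pipe_is_space : std::ctype<char> {
    pipe_is_space() : std::ctype<char>(make_table()) {}
    static const mask* make_table() {
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table['|'] |= space;
        return table.data();
    }
};

// Usage (assuming 'file' is the ifstream being read from):
// file.imbue(std::locale(file.getloc(), new pipe_is_space));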