-2

I have a data file "records.txt" that has the following form:

2 100 119 107 89 125 112 121 99 124 126 123 103 128 77 85 86 115 66 117 106 75 74 76 96 93 73 109 127 110 67 65 80 
1 8 5 23 19 2 36 13 16 24 59 15 22 48 49 57 46 47 27 51 6 30 7 31 41 17 43 53 34 37 42 61 54 
2 70 122 81 83 72 82 105 88 95 108 94 114 98 102 71 104 68 113 78 120 84 97 92 116 101 90 111 91 69 118 87 79 
1 35 14 12 52 58 56 38 45 26 32 39 9 21 11 40 55 50 44 18 20 63 10 60 28 1 64 4 33 3 25 62 29 

Each line begins with either one or two, denoting which batch it belongs to. I am trying to use string stream to read in each line and store the results in a struct, with the first number corresponding to the batch number and the following 32 integers corresponding to the content, which belongs in the struct vector. I have been struggling mightily with this and I followed a solution found here: How to read line by line

The resulting program is the following:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

const string record1("records.txt");

// declaring a struct for each record
struct record
{
    int number;             // number of record
    vector<int> content;    // content of record    
};

int main()
{
    record batch_1;         // stores integers from 1 - 64
    record batch_2;         // stores integers from 65 - 128
    record temp;
    string line;

    // read the data file
    ifstream read_record1(record1.c_str());
    if (read_record1.fail()) 
    {
        cerr << "Cannot open " << record1 << endl;
        exit(EXIT_FAILURE);
    } 
    else
        cout << "Reading data file: " << record1 << endl;

    cout << "Starting Batch 1..." << endl;
    read_record1.open(record1.c_str());
    while(getline(read_record1, line))
    {       
        stringstream S;
        S << line;              // store the line just read into the string stream
        vector<int> thisLine;   // save the numbers read into a vector
        for (int c = 0; c < 33; c++)    // WE KNOW THERE WILL BE 33 ENTRIES
        {
            S >> thisLine[c];
            cout << thisLine[c] << " ";
        }
        for (int d = 0; d < thisLine.size(); d++) 
        {
            if (d == 0)
                temp.number = thisLine[d];
            else
                temp.content.push_back(thisLine[d]);
            cout << temp.content[d] << " ";
        }   

        if (temp.number == 1) 
        {
            batch_1.content = temp.content;
            temp.content.clear();
        }

        thisLine.clear();
    }

    // DUPLICATE ABOVE FOR BATCH TWO

    return 0;
}   

The program compiles and runs with a return value of 0, but the cout statements within the loops do not execute since the only console output is:

Starting Batch 1...

In addition, if the code is duplicated for batch two I get a segmentation fault. So clearly this is not working properly. I'm not well versed with reading strings, so any help would be appreciated. Also, what would I do if the lines do not have an equivalent amount of entries (e.g. one line has 33 entries, another has 15)?

Community
  • 1
  • 1
Leigh K
  • 561
  • 6
  • 20
  • 2
    Here `S >> thisLine[c];` the value `thisline[c]` doesn't exist - you need to add elements to the vector using the `push_back` member function. –  Mar 23 '17 at 17:13
  • Or at least initialize it with `vector thisLine(33,0)` –  Mar 23 '17 at 17:21
  • I had already tried that with a negative result. Just did it again and I still end up with the same result as above....? – Leigh K Mar 23 '17 at 17:25
  • Try storing it into a separate int and then pushing that into the vector. `int temp;` / `S >> temp;` / `thisline.push_back(temp);` – picklechips Mar 23 '17 at 18:10
  • Another problem is here: `if (d == 0) temp.number = thisLine[d]; else temp.content.push_back(thisLine[d]); cout << temp.content[d] << " ";` When `d` is 0, `temp.content` is potentially still empty, so accessing `temp.content[d]` can fail. Had you used `temp.content.at(d)` instead, you would have gotten a `std::out_of_range` exception instead. – Remy Lebeau Mar 23 '17 at 18:18

1 Answers1

1

There are many problems with your code:

  1. you are opening the input file twice. Not a big deal, but not desirable, either. If you pass a filename to the std::ifstream constructor, it opens the file immediately, so there is no need to call open() afterwards.

  2. inside the first for loop of your while loop, you are trying to read integers directly into the local thisLine vector using operator>>, but that will not work correctly because you have not allocated any memory for thisLine's array yet. Since you are expecting 33 integers, you can pre-allocate the array before reading:

    vector<int> thisLine(33);
    

    Or:

    vector<int> thisLine;
    thisLine.resize(33);
    

    However, since you also ask about the possibility of separate lines having different numbers of integers, you should not pre-size the vector at all, since you don't know the number of integers yet (though you can pre-allocate the vector's capacity if you have an idea of the max number of integers you are likely to expect). You can use a while loop instead of a for loop, that way you read through the entire std::stringstream regardless of how many integers it actually holds:

    thisLine.reserve(33); // optional
    
    int c;
    while (S >> c) {
        thisLine.push_back(c);
    }
    
  3. inside the second for loop, you are accessing temp.content[d], but if d is 0 then temp.content has potentially not been populated yet, so accessing temp.content[0] will not work (and had you used temp.content.at(d) instead, you would have gotten a std::out_of_range exception). You probably meant to do something more like this instead:

    for (int d = 0; d < thisLine.size(); d++) 
    {
        if (d == 0)
            temp.number = thisLine[d];
        else {
            temp.content.push_back(thisLine[d]);
            cout << thisLine[d] << " ";
        }
    }   
    

    But even that can be simplified by removing the push_back() loop altogether:

    if (thisLine.size() > 0)
    {
        temp.number = thisLine[0];
        thisLine.erase(thisLine.begin());
    }
    temp.content = thisLine;
    
    for (int d = 0; d < thisLine.size(); d++) 
        cout << thisLine[d] << " ";
    
  4. you are looping through the entire file once, reading all records but only processing the batch 1 records. You say you have a duplicate set of loops for processing batch 2 records. That means you are going to re-read the entire file again, re-reading all of the records but ignore the batch 1 records. That is a lot of wasted overhead. You should read through the file one time, separating the batches as needed, and then you can process them when the reading loop is finished, eg:

    vector<record> batch_1;         // stores integers from 1 - 64
    vector<record> batch_2;         // stores integers from 65 - 128
    record temp;
    
    ...
    
    while(getline(read_record1, line))
    {       
        ...
        if (temp.number == 1) {
            batch_1.push_back(temp);
        } else {
            batch_2.push_back(temp);
        }
    }
    
    // process batch_1 and batch_2 as needed...
    

So, with that said, the corrected code should look more like this:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

const string records_file("records.txt");

// declaring a struct for each record
struct record
{
    int number;             // number of record
    vector<int> content;    // content of record    
};

int main()
{
    vector<record> batch_1;         // stores integers from 1 - 64
    vector<record> batch_2;         // stores integers from 65 - 128
    record temp;
    string line;

    // read the data file
    ifstream read_records(records_file.c_str());
    if (read_records.fail()) 
    {
        cerr << "Cannot open " << records_file << endl;
        exit(EXIT_FAILURE);
    } 

    cout << "Reading data file: " << records_file << endl;

    cout << "Starting Batch 1..." << endl;

    while (getline(read_records, line))
    {       
        istringstream S(line);  // store the line just read into the string stream
        vector<int> thisLine;   // save the numbers read into a vector
        thisLine.reserve(33);   // WE KNOW THERE WILL BE 33 ENTRIES

        int c;
        while (S >> c) {
            thisLine.push_back(c);
            cout << c << " ";
        }

        temp.number = 0;
        temp.content.reserve(thisLine.size());

        for (int d = 0; d < thisLine.size(); d++) 
        {
            if (d == 0)
                temp.number = thisLine[d];
            else
                temp.content.push_back(thisLine[d]);
        }   

        /* alternatively:
        if (thisLine.size() > 0) {
            temp.number = thisLine[0];
            thisLine.erase(thisLine.begin());
        }
        temp.content = thisLine;
        */

        if (temp.number == 1) {
            batch_1.push_back(temp);
        }

        temp.content.clear();
    }

    read_records.seekg(0);

    cout << "Starting Batch 2..." << endl;

    // DUPLICATE ABOVE FOR BATCH TWO

    read_records.close();

    // process batch_1 qand batch_2 as needed...

    return 0;
}

Then you can simplify your reading loop(s) a bit by getting rid of the thisLine vector altogether:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

const string records_file("records.txt");

// declaring a struct for each record
struct record
{
    int number;             // number of record
    vector<int> content;    // content of record    
};

int main()
{
    vector<record> batch_1;         // stores integers from 1 - 64
    vector<record> batch_2;         // stores integers from 65 - 128
    record temp;
    string line;

    // read the data file
    ifstream read_records(records_file.c_str());
    if (read_records.fail()) 
    {
        cerr << "Cannot open " << records_file << endl;
        exit(EXIT_FAILURE);
    } 

    cout << "Reading data file: " << records_file << endl;

    cout << "Starting Batch 1..." << endl;

    while (getline(read_records, line))
    {       
        istringstream S(line);  // store the line just read into the string stream
        if (S >> temp.number)
        {
            cout << temp.number << " ";

            temp.content.reserve(32);   // WE KNOW THERE WILL BE 32 ENTRIES

            int c;
            while (S >> c) {
                temp.content.push_back(c);
                cout << c << " ";
            }

            if (temp.number == 1) {
                batch_1.push_back(temp);
            }

            temp.content.clear();
        }
    }

    read_records.seekg(0);

    cout << "Starting Batch 2..." << endl;

    // DUPLICATE ABOVE FOR BATCH TWO

    read_records.close();

    // process batch_1 qand batch_2 as needed...

    return 0;
}

And then, if you are so inclined, you can greatly simplify the code even further by using std::copy() with std::istream_iterator and std::back_insertor instead:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <iterator>
#include <algorithm>

using namespace std;

const string records_file("records.txt");

// declaring a struct for each record
struct record
{
    int number;             // number of record
    vector<int> content;    // content of record    
};

// declaring an input operator to read a single record from a stream
istream& operator>>(istream &in, record &out)
{
    out.number = 0;
    out.content.clear();

    string line;
    if (getline(in, line))
    {
        istringstream iss(line);
        if (iss >> out.number) {
            cout << out.number << " ";

            out.content.reserve(32); // WE KNOW THERE WILL BE 32 ENTRIES
            copy(istream_iterator<int>(iss), istream_iterator<int>(), back_inserter(out.content));

            for (int d = 0; d < out.content.size(); d++) 
                cout << out.content[d] << " ";
        }
    }

    return in;
}

int main()
{
    vector<record> batch_1;         // stores integers from 1 - 64
    vector<record> batch_2;         // stores integers from 65 - 128
    record temp;

    // read the data file
    ifstream read_records(records_file.c_str());
    if (!read_records) 
    {
        cerr << "Cannot open " << records_file << endl;
        exit(EXIT_FAILURE);
    } 

    cout << "Reading data file: " << records_file << endl;

    while (read_records >> temp)
    {       
        switch (temp.number)
        {
            case 1:
                batch_1.push_back(temp);
                break;

            case 2:
                batch_2.push_back(temp);
                break;
        }
    }

    read_records.close();

    // process batch_1 and batch_2 as needed...

    return 0;
}
Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
  • Remy, a million thanks for the very thorough response. I did a test run with your code and it definitely works, but this is going to take some time for me to digest! – Leigh K Mar 23 '17 at 18:56