I have a file that I need to read in. Each line of the file is exceedingly long, so I'd rather not read each line into a temporary string and then manipulate those strings (unless this isn't actually inefficient; I could be wrong). Each line contains a sequence of triplets, each made up of two integers and a complex number. The fields are separated by colons (rather than commas, since a comma is already used inside the complex number). My current code goes something like this:

 while (states.eof() == 0)
  {
  std::istringstream complexString;

  getline(states, tmp_str, ':');
  tmp_triplet.row() = stoi(tmp_str);
  getline(states, tmp_str, ':');
  tmp_triplet.col() = stoi(tmp_str);
  getline(states, tmp_str, ':');
  complexString.str (tmp_str);
  complexString >> tmp_triplet.value();
  // Then something useful done with the triplet before moving onto the next one
  }

tmp_triplet is a variable that stores these three numbers. I want some way to run a function at the end of every line (specifically, the triplets in each line are pushed into a vector, and each line in the file denotes a different vector). I'm sure there's an easy way to go about this; I just need a way to check whether the end of the line has been reached, and to run a function when it has.

Henry Shackleton
  • Possible duplicate of [Read file line by line](https://stackoverflow.com/questions/7868936/read-file-line-by-line) – Marc Jan 03 '18 at 17:01
  • You don't have to read the entire line. What you can do is read a chunk, say 4 KB, and then parse it, i.e. search for `:`. Once you are done with the chunk, you keep whatever is left over and append the next chunk to the end of it. If you encounter EOL, then you do your processing on the vector. Also, whether reading a whole line is inefficient depends on how much RAM you have and how long the lines are. If each line is up to, say, 10 MB, then I wouldn't bother with chunking algorithms and would read the entire line at once. So how big are these lines? – freakish Jan 03 '18 at 17:02
  • @freakish, each line is about 1Mb. So it sounds like it's fine to just read the whole thing to a string? – Henry Shackleton Jan 03 '18 at 17:20
  • @wayward_vagabound Well, I assume we are talking about gigabytes of RAM available? If that's the case then yeah, it's fine. 1 MB is like nothing to modern computers. – freakish Jan 03 '18 at 17:22
  • Also, here's some more general advice: go for correctness first, then simplicity, and finally performance. Premature optimization is the root of all evil. – freakish Jan 03 '18 at 17:24

1 Answer


When trying to plan stuff out, abstraction can be your best friend. If you break down what you want to do by abstract functionality, you can more easily decide what data types should be used and how different data types should be planned out, and often you can find some functions almost write themselves. And typically, your code will be more modular (almost by definition), which will make it easy to reuse, maintain, and adapt if future changes are needed.

For example, it sounds like you want to parse a file. So that should be a function.

To implement that function, you want to read in the file's lines and then process them. So you can make two functions, one for each of those actions, and just call them.

To read in the file's lines, you just want to take a file stream and return a collection of strings, one per line.

To process the lines, you want to take a collection of strings and parse each one into a triplet value. So you can create a function that takes a string and breaks it into a triplet, and just use that function here.

To process a single string, you just need to assign its first part as the row, its second part as the column, and its third part as the value.

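#include <complex>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct TripletValue
{
    int Row;
    int Col;
    std::complex<double> Val; // the question describes the third field as a complex number
};

// Declared up front so the functions can be defined top-down, in the order described above.
std::vector<std::string> ReadFileLines(std::istream& inputStream);
std::vector<TripletValue> GetValuesFromData(const std::vector<std::string>& data);
TripletValue ParseLine(const std::string& fileLine);

std::vector<TripletValue> ParseFile(std::istream& inputStream)
{
    std::vector<std::string> fileLines = ReadFileLines(inputStream);
    std::vector<TripletValue> parsedValues = GetValuesFromData(fileLines);
    return parsedValues;
}

std::vector<std::string> ReadFileLines(std::istream& inputStream)
{
    std::vector<std::string> fileLines;
    std::string fileLine;
    // Test the result of getline itself: looping on !eof() would push one
    // extra, empty line after the last successful read.
    while (std::getline(inputStream, fileLine))
    {
        fileLines.push_back(fileLine);
    }
    return fileLines;
}

std::vector<TripletValue> GetValuesFromData(const std::vector<std::string>& data)
{
    std::vector<TripletValue> values;

    for (const std::string& fileLine : data)
    {
        TripletValue parsedValue = ParseLine(fileLine);
        values.push_back(parsedValue);
    }

    return values;
}

// Parses a single "row:col:value" string. The fields are split on ':'
// because the question's data is colon-separated, not whitespace-separated.
TripletValue ParseLine(const std::string& fileLine)
{
    std::istringstream sstream(fileLine);

    TripletValue parsedValue;

    std::string strValue;
    std::getline(sstream, strValue, ':');
    parsedValue.Row = std::stoi(strValue);

    std::getline(sstream, strValue, ':');
    parsedValue.Col = std::stoi(strValue);

    // operator>> for std::complex accepts "re", "(re)" and "(re,im)".
    sstream >> parsedValue.Val;

    return parsedValue;
}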
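
The same decomposition can be adapted to the layout described in the question, where one line holds many triplets and each line becomes its own vector. The following is only a sketch under assumptions: the line format is taken to be `row:col:(re,im):row:col:(re,im):...`, and the names `Triplet`, `ParseTripletLine` and `ParseTripletStream` are hypothetical, not taken from the question's code. Reading a whole line with `std::getline` and then splitting it gives a natural place to do per-line work, so no explicit end-of-line check is needed.

```cpp
#include <complex>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical triplet type; the question's actual type is not shown.
struct Triplet
{
    int row;
    int col;
    std::complex<double> value;
};

// Parses one line of the assumed form "row:col:(re,im):row:col:(re,im):...".
std::vector<Triplet> ParseTripletLine(const std::string& line)
{
    std::vector<Triplet> triplets;
    std::istringstream fields(line);
    std::string rowText, colText, valueText;

    // Each iteration consumes three colon-separated fields; the loop stops
    // as soon as a complete triplet can no longer be read.
    while (std::getline(fields, rowText, ':') &&
           std::getline(fields, colText, ':') &&
           std::getline(fields, valueText, ':'))
    {
        Triplet t;
        t.row = std::stoi(rowText);
        t.col = std::stoi(colText);

        // operator>> for std::complex understands the "(re,im)" form.
        std::istringstream valueStream(valueText);
        valueStream >> t.value;

        triplets.push_back(t);
    }
    return triplets;
}

// Builds one vector of triplets per non-empty line of the input.
std::vector<std::vector<Triplet>> ParseTripletStream(std::istream& input)
{
    std::vector<std::vector<Triplet>> rows;
    std::string line;
    while (std::getline(input, line))
    {
        if (line.empty())
        {
            continue; // skip blank lines instead of passing them to stoi
        }
        // This is the "end of line reached" point: any per-line function
        // can be called here with the finished vector.
        rows.push_back(ParseTripletLine(line));
    }
    return rows;
}
```

Passing an open `std::ifstream` to `ParseTripletStream` would yield one inner vector per line of the file. The sketch does no error handling: `std::stoi` throws on a malformed field, which may or may not be the behaviour you want.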
  • std::streams should be passed by reference, otherwise you are calling a deleted function. – Costantino Grana Jan 03 '18 at 17:36
  • I was just trying to show the high level thought process of breaking bigger problems into smaller pieces using abstraction, and how doing so can help piece together how to implement the small details. –  Jan 03 '18 at 17:49