C++ Reading population records from a file

Question

I am trying to write a C++ program that reads a list of population records from a file. An excerpt is given below:

Jackson 49292
Levy 40156
Indian River 138894
Liberty 8314
Holmes 19873
Madison 19115

Is there a good way to handle the Indian River case? This is the code I have written so far:

 ifstream file("county_data-5.txt");
  if(!file)
  {
    cout<<"\nError: File not found!\n";
  }
  else
  {
    string name ,string;
    double pop;
    while(!file.eof())
    {
      if(!file.eof())
      {
        //file>>name;
        //file>>pop;
        getline(file, string);
        stringstream ss(string);
        ss >> name >> pop;
        insert(root,name, pop);
      }
    }
  }
  file.close();

To start with, your usage of `!file.eof()` is [wrong](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) and you should check if reading is successful **before** trying to use what is read. — MikeCAT, Aug 02 '20 at 03:03
I recommend that you read full lines, and then attempt to parse them. For example you could find the last space, and divide the line into two sub-strings at that position. — Some programmer dude, Aug 02 '20 at 03:10
If possible can you use a different delimiter in the text file for separating columns, like comma? — kiner_shah, Aug 02 '20 at 04:18
You should explain what the current code does and whether that is right, and if not, why not. — underscore_d, Aug 03 '20 at 09:25

David C. Rankin · Answer 1 · 2020-08-02T04:31:07.940

There are many ways to read a name that has an unknown number of whitespace-separated parts followed by a trailing number. One option uses cstdio: read each line with getline(), call sscanf() on line.c_str() with the format string " %[^0-9] %zu", and then trim the trailing whitespace from the temporary name before assigning it to a string.

Staying with current-era C++, you can read the line with getline and then use the .find_first_of() member function to locate the first digit in the string. For example, you can keep a list of digits, e.g. const char *digits = "0123456789"; and then locate the first digit with line.find_first_of(digits);. Knowing where the first digit is, you can then use the .substr() member function to copy the name and strip the trailing whitespace from its end.

The larger consideration is how to store all of the values read. If you create a simple struct that has members std::string name; and size_t pop; you can then create a std::vector of that struct and add each record read from the file to the vector with the .push_back() member function.

A simple implementation of the struct could be:

struct population
{
    std::string name;
    size_t pop;
    
    /* constructors */
    population() { name = ""; pop = 0; }
    population(const std::string& n, const size_t p) : name(n), pop(p) {}
};

To simplify the read from the file, you can create an overload of >> that will read a line of data from the open file stream and do the separation into name and pop for you. A second overload of << will allow you to output the struct in a sane format of your choosing. Adding the overloads you would have:

/* struct to hold name and population,
 * and overloads of operators >> and << to facilitate splitting name/pop.
 */
struct population
{
    std::string name;
    size_t pop;
    
    /* constructors */
    population() { name = ""; pop = 0; }
    population(const std::string& n, const size_t p) : name(n), pop(p) {}
    
    /* overloads of >> (separates name/pop) and << (outputs name/pop) */
    friend std::istream& operator >> (std::istream& is, population& p) {
        const char *digits = "0123456789";
        
        std::string line {};
        if (getline (is, line)) {                               /* read line */
            size_t popbegin = line.find_first_of(digits);       /* find 1st [0-9] */
            if (popbegin != std::string::npos) {                /* validate found */
                std::string tmp = line.substr(0, popbegin);     /* get name */
                /* remove trailing spaces (guard against an empty name) */
                while (!tmp.empty() && isspace(static_cast<unsigned char>(tmp.back())))
                    tmp.pop_back();
                p.name = tmp;                                   /* assign to name */
                p.pop = stoul(line.substr(popbegin));           /* assign to pop */
            }
        }
        return is;
    }
    friend std::ostream& operator << (std::ostream& os, const population& p) {
        os << std::left << std::setw(32) << p.name << "  " << p.pop << '\n';
        return os;
    }
};

Then all you need in main() is to validate you have a filename passed as an argument, open the file and validate it is open for reading (say std::ifstream f) and then your read and separation of values is reduced to a single trivial loop:

    population p {};    /* instance of population struct to facilitate read from file */
    std::vector<population> records {};         /* vector of population */
    
    while (f >> p) {                            /* read population data from file */
        records.push_back(p);                   /* add to population vector */
    }

Now you have all locations and their populations stored in the vector of struct, records. Putting it all together, you could do:

#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <vector>
#include <cctype>   /* isspace */

/* struct to hold name and population,
 * and overloads of operators >> and << to facilitate splitting name/pop.
 */
struct population
{
    std::string name;
    size_t pop;
    
    /* constructors */
    population() { name = ""; pop = 0; }
    population(const std::string& n, const size_t p) : name(n), pop(p) {}
    
    /* overloads of >> (separates name/pop) and << (outputs name/pop) */
    friend std::istream& operator >> (std::istream& is, population& p) {
        const char *digits = "0123456789";
        
        std::string line {};
        if (getline (is, line)) {                               /* read line */
            size_t popbegin = line.find_first_of(digits);       /* find 1st [0-9] */
            if (popbegin != std::string::npos) {                /* validate found */
                std::string tmp = line.substr(0, popbegin);     /* get name */
                /* remove trailing spaces (guard against an empty name) */
                while (!tmp.empty() && isspace(static_cast<unsigned char>(tmp.back())))
                    tmp.pop_back();
                p.name = tmp;                                   /* assign to name */
                p.pop = stoul(line.substr(popbegin));           /* assign to pop */
            }
        }
        return is;
    }
    friend std::ostream& operator << (std::ostream& os, const population& p) {
        os << std::left << std::setw(32) << p.name << "  " << p.pop << '\n';
        return os;
    }
};

int main (int argc, char **argv) {
    
    if (argc < 2) { /* validate 1 argument given for filename */
        std::cerr << "error: filename required as 1st argument.\n";
        return 1;
    }
    
    std::ifstream f (argv[1]);  /* open filename provided as 1st argument */
    
    if (!f.is_open()) { /* validate file is open for reading */
        std::cerr << "file open failed: " << argv[1] << '\n';
        return 1;
    }
    
    population p {};    /* instance of population struct to facilitate read from file */
    std::vector<population> records {};         /* vector of population */
    
    while (f >> p) {                            /* read population data from file */
        records.push_back(p);                   /* add to population vector */
    }
    
    for (const auto& loc : records)             /* output results */
        std::cout << std::left << std::setw(32) << loc.name << loc.pop << '\n';
}

Example Use/Output

With your data in the file dat/population.txt, the use and results would be:

$ ./bin/poprecords dat/population.txt
Jackson                         49292
Levy                            40156
Indian River                    138894
Liberty                         8314
Holmes                          19873
Madison                         19115

And since you have the data stored in a vector of struct, you can sort the vector any way you like to analyze your data.

This is just one of many ways to approach the problem. Look things over and let me know if you have further questions.

score 1 · Answer 2 · answered Aug 03 '20 at 09:24

I would like to show an additional solution that uses more modern C++ elements. I will use a regex to describe what is valid input and what is not.

With a regex you can define in detail what is allowed and what is not. We can be very strict, or allow leading and trailing spaces, more than one space, any whitespace character, or whatever we wish. So, even if you have a county name like Holmes Region 1 19873, we can treat it as valid and extract the correct data.

I am not sure whether you are familiar with regular expressions. In any case, I will now define a regex for your data. The whole regex is:

^\s*(\w+(\s+\w+)*)\s+(\d+)\s*$
^      Beginning of line
\s*    Zero or more whitespace characters
(      Start of a group. We will extract this group's data later (the county name)
\w+    One or more word characters: a-z, A-Z, 0-9 and _ (first part of the county name)
(      Start of an optional group for county names with more than one part
\s+    One or more whitespace characters between parts of the county name
\w+    One or more word characters: a-z, A-Z, 0-9 and _ (additional part of the county name)
)      End of the group for additional parts (each preceded by whitespace)
*      There may be zero or more additional parts in the county name
)      End of the group for the county name
\s+    One or more whitespace characters (in front of the population count)
(      Start of the group for the population count. It will be extracted later
\d+    One or more digits (so we make sure that this is a valid number)
)      End of the group for the digits
\s*    Zero or more whitespace characters
$      End of line

So, you see that we can define a regex for our specific purpose.

As for the rest of the program structure, everything is more or less a standard approach.

Important: in C++ we put data and the corresponding methods in a class. This includes the IO functions, that is, the extractor operator and the inserter operator. Only the class should know how to read and write its data.

Therefore we will simply define a class "CountyPopulation" with just 2 data members and overload the extractor and inserter operators.

In the extractor, we will read a complete line and match it against our regex. If it matches, then we can extract our needed 2 groups. Simple.

For the driver code, we open the source file and check whether it could be opened. Then we define a std::vector using CTAD and use its range constructor to fill it. The range constructor expects 2 iterators, and for this we use the std::istream_iterator. The whole construct will simply call the extractor operator of our class for all lines in the source file.

This leads to a one-liner for reading the complete file into our std::vector.

Please see:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <regex>
#include <algorithm>
#include <iomanip>

struct CountyPopulation {
    // Our Data
    std::string county{};
    unsigned long population{};

    // Overload extractor
    friend std::istream& operator >> (std::istream& is, CountyPopulation& cp) {
        // Read a complete line
        if (std::string line{}; std::getline(is, line)) {

            // We want to evaluate the string using a regular expression.
            // The regex is static so that it is compiled only once
            std::smatch sm; static const std::regex re{ R"(^\s*(\w+(\s+\w+)*)\s+(\d+)\s*$)" };

            // If the string matches our pattern, then we can copy the data
            if (std::regex_match(line, sm, re)) {
                cp.county = sm[1];
                cp.population = std::stoul(sm[3]);
            }
            else std::cerr << "\n*** Error: Invalid Data in line:  '" << line << "'\n";
        }
        return is;
    }
    // Overload inserter
    friend std::ostream& operator << (std::ostream& os, const CountyPopulation& cp) {
        return os << std::left << std::setw(30) << cp.county << " --> " << cp.population << '\n';
    }
};

int main() {
    // Open file and check, if it could be opened
    if (std::ifstream countyFileStream{ "r:\\county_data-5.txt" }; countyFileStream) {

        // Define a vector and use its range constructor to read all values from the file
        std::vector population(std::istream_iterator<CountyPopulation>(countyFileStream), {});

        // Show all read data on screen
        std::copy(population.begin(), population.end(), std::ostream_iterator<CountyPopulation>(std::cout));
    }
    else std::cerr << "\n*** Error: Could not open source file\n";
    return 0;
}

Compiled and tested with C++17

score 0 · Answer 3 · answered Aug 02 '20 at 03:05


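As MikeCAT points out in his comment, while(!file.eof()) isn't right. Instead, you can loop on the stream state with while(file), or better, on the read itself with while(getline(file, line)), so that the body only runs after a successful read. If these are population records, you can use int instead of double. Also, the inner if statement is unnecessary, as the while loop already performs the same check. You should also rename the variable called string to prevent confusion with the type.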


An answer to the main problem (how to deal with the `Indian River 138894` case) doesn't seem to be in this answer. Also, using `int` may not be suitable. What if the unit of the population data is "thousand people"? – MikeCAT Aug 02 '20 at 03:08
I don't think it will be like that, and also I'm not sure how stringstream works, so you might want to traverse the string backwards until you hit a space. – Aug 02 '20 at 03:10
