0

I'm trying to read in a file with personal information. Each line contains the data of one person, let's say it looks like this:

First(s) Last ID SSN
Peter Barker 1234 5678
James Herbert Bond 007 999
Barack Hussein Obama 2007 14165

So I want to use std::copy to read each line and copy it into a std::vector<Person>, where Person looks like this:

struct Person
{
    std::string firstName_s;
    std::string lastName;
    int ID;
    int SSN;
};

I thought that it would be handy to overload the extraction operator for this:

std::istringstream& operator>>(std::istringstream& in, struct Person& person)
{
    struct Person tmp;
    in  >> tmp.firstName_s 
        >> tmp.lastName 
        >> tmp.ID 
        >> tmp.SSN;
    person = std::move(tmp);
    return in;
}

However, the problem I'm having is that I do not know how many first names the person will have.

I thought about reading the full name into one string until I encounter a number and then splitting the last name off the string containing the first name(s). This worked fine but looks 'ugly'. It would be great if someone had a better suggestion, or a link I could look at; I can't seem to find anything on my own! Thank you.

picklepick
  • FWIW, ID and SSN should also be strings which makes this problem a lot easier. An id like `007` stored in an int would just be `7`. If you want to keep the leading zeros you need a string. – NathanOliver Jan 10 '20 at 13:30
  • 2
    You could read the whole string, split them by spaces, and parse them in reverse. Whatever is left after filling out all other fields will be the first name(s) in this case. – GoodDeeds Jan 10 '20 at 13:31
  • @NathanOliver Good point, I do not have leading 0s. Just like the example but I'll fix it. I can make everything a string. But that doesn't solve the initial problem – picklepick Jan 10 '20 at 13:33
  • 1
    There is no way to detect the name is the last without actually reading it, so all you can do is keep reading first names until you detect a not-name. In `C`, if your rule is "the first character is numeric" then you could use get and unget, but unget is not exposed in streams (it is effectively used internally) – Gem Taylor Jan 10 '20 at 13:34
  • @GoodDeeds Thanks for the suggestion, I'll try that. – picklepick Jan 10 '20 at 13:35
  • 1
    Another option, if you can change the input format, is to require that arguments with spaces in are quoted. This then allows you to handle German last names such as "von Richthofen". Creating a `>>` overload for optionally quoted strings is not very messy, and is encapsulated at the right level. Also consider supporting `\ ` or quoted space, i.e. `von\ Richthofen`. – Gem Taylor Jan 10 '20 at 13:42
  • Even more options: Have the user enter the first names and last names separated by `-`s. Then, you can simply `std::cin >> person.first_names;` and `std::cin >> person.last_name`. – Jaideep Shekhar Jan 10 '20 at 13:46
  • 1
    Please don't prefix `struct` to type names outside class definitions and forward-declarations. Using it in a variable declaration such as `struct Person tmp;` is redundant at best and leads to unexpected behavior at worst, because it will declare a class if a class with the name `Person` is not found by name lookup. – walnut Jan 10 '20 at 13:51
  • @walnut thanks, I didn't know that – picklepick Jan 10 '20 at 15:46
  • @JaideepShekhar I cannot change the file, unfortunately :( – picklepick Jan 10 '20 at 15:46

2 Answers

2

The OP does not want to change the file, so this answer is no longer suitable. Work in progress.

There are two broad strategies here.

We can choose to:

  • Have the user format the data for us.
  • Have the program format the data.

And then we parse it.

1) Fixed data format:

The easiest solution.
Make the user enter data in a format that can easily be parsed. Here, it is up to the user to enter the data correctly, and the program will check whether the data entered is valid. This can be done in many ways, including, but not limited to:

Peter-Richmond Barker 1234 5678        // hyphen(-) separated
"James Herbert" Bond 007 999           // enclosed in quotes("")
Barack_Hussein Obama 2007 14165        // underscore(_) separated

In the first and third cases, std::cin >> person.first_names is enough, because the first names contain no whitespace.
In the second case, first skip leading whitespace and check for the opening delimiter with (std::cin >> std::ws).get() == '\"', then read up to the closing delimiter with

std::getline(std::cin, person.first_names, '\"'); // any character delimiter
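For the quoted variant, the standard library also offers std::quoted (C++14, in <iomanip>), which deals with the opening and closing quote for you. A minimal sketch, assuming every field is kept as a string; QuotedPerson and read_quoted_person are names made up for this illustration:

```cpp
#include <iomanip>   // std::quoted
#include <istream>
#include <string>

// Hypothetical record type for this sketch; all fields are kept as strings.
struct QuotedPerson
{
    std::string first_names;
    std::string last_name;
    std::string id;
    std::string ssn;
};

// Reads one record such as:  "James Herbert" Bond 007 999
// If the first field does not start with a quote, std::quoted falls back
// to reading a single whitespace-delimited word.
std::istream& read_quoted_person(std::istream& in, QuotedPerson& person)
{
    return in >> std::quoted(person.first_names)
              >> person.last_name
              >> person.id
              >> person.ssn;
}
```

The same function should therefore accept both a quoted multi-word first name and a plain single-word one.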

2) Slow and Steady

Another very easy solution. Just make the user enter one thing at a time:

std::cout << "Enter some datum 1: ";
std::cin >> person.some_datum_1;
...

(Contrary to popular imagination, datum is singular and data is plural.)
When a single prompt may receive several values on one line, tokenise the line instead:

std::cout << "Enter some data 1: ";
// Grab the line and put it into a stream
std::string line;
std::getline(std::cin, line);
std::stringstream line_buffer(line);

// Prepare to iterate over the stream
std::istream_iterator<std::string> it(line_buffer);
std::istream_iterator<std::string> end;

// The vector is a temporary (already an rvalue), so it is move-assigned
// without needing std::move
person.first_names = std::vector<std::string>(it, end);
...

Note that this method will require person.first_names to be a std::vector<std::string>.

3) Start from The End

Here, the data of undetermined size comes first on the line, and we parse the fixed fields starting from the end.

Warning: This works only if a single field has an undetermined size. If both the first and the last names can consist of more than one word, this won't work. I mention it only for completeness.

If you don't want to coerce the user into this and spoil their experience, you would have to parse the input yourself. Input the whole line with the good old std::getline(std::cin, line);.

Initialise std::size_t read_from = std::string::npos; (an int cannot hold npos).
Now, find the last space with read_from = line.rfind(' ', read_from);. A read_from == std::string::npos tells you that no space is left, so either the remaining text is the final field or there is an error.

A line.substr(read_from + 1) fetches the last input without the space in front of it. Convert it to the appropriate type and store it. You will also have to erase the parsed input with line.resize(read_from);

Rinse and repeat for the other inputs.

Note: It is suggested to store the undetermined data in a std::vector of the appropriate type.
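The steps above can be sketched as follows. This is only an illustration under stated assumptions: fields are separated by single spaces, there is no trailing space, and everything is kept as a string. ParsedPerson, pop_last_token and parse_from_end are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical record type for this sketch; all fields are kept as strings.
struct ParsedPerson
{
    std::string first_names;
    std::string last_name;
    std::string id;
    std::string ssn;
};

// Removes the last space-separated token from `line` and returns it.
// When no space is left, the whole remaining line is returned.
std::string pop_last_token(std::string& line)
{
    const std::size_t read_from = line.rfind(' ');
    if (read_from == std::string::npos)
    {
        std::string token = line;
        line.clear();
        return token;
    }
    std::string token = line.substr(read_from + 1);
    line.resize(read_from); // drop the token and the space before it
    return token;
}

// Assumes single spaces between fields and no trailing space.
ParsedPerson parse_from_end(std::string line)
{
    ParsedPerson person;
    person.ssn = pop_last_token(line);
    person.id = pop_last_token(line);
    person.last_name = pop_last_token(line);
    person.first_names = line; // whatever is left are the first names
    return person;
}
```

Taking the line by value lets the function consume its own copy, so the caller's string stays untouched.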

4) March of Bytes

I know you would say that we have not addressed the OP's question,

... read in a file with personal information...

Now that we have discussed taking the input from the user, we can also choose how to store it (and fetch it).

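The easiest way, provided Person is trivially copyable (fixed-size char arrays and ints, no std::string or std::vector members), is to:

personal_data_file.write(reinterpret_cast<const char*>(&person_list[i]), sizeof(Person));  // Write it...
personal_data_file.read(reinterpret_cast<char*>(&person_list[i]), sizeof(Person));         // ...Now read it.

in a loop, where person_list is a std::vector of Persons.

Note: Remember to open the file in std::ios::binary mode!

Caution: A std::string only holds a pointer to its characters, so writing its raw bytes does not save the text, and reading them back is undefined behaviour.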


In case you are not familiar with the classes and features used in the examples above, here are some links:

std::getline https://www.geeksforgeeks.org/how-to-use-getline-in-c-when-there-are-black-lines-in-input/

std::istream::read http://www.cplusplus.com/reference/istream/istream/read/

std::ostream::write http://www.cplusplus.com/reference/ostream/ostream/write/

std::vector https://www.geeksforgeeks.org/vector-in-cpp-stl/

std::istream_iterator<T> http://www.cplusplus.com/reference/iterator/istream_iterator/

Jaideep Shekhar
  • 1
    Holy.. Thank you for your detailed answer. I'm still relatively new to cpp so it'll take me a while to completely understand & test this. And even with my restriction on not changing the file, I think the first part is very informative for someone else. – picklepick Jan 10 '20 at 16:06
  • 1
    You can’t just blit a `std::string`! – Davis Herring Jan 10 '20 at 17:10
  • Also, `std::move` does nothing good and sometimes does harm when applied to a prvalue like that. – Davis Herring Jan 12 '20 at 03:20
1

If a line has a variable number of words, you can simply read the entire line and either process it from the right or cache all the words and work with offsets. The example below does the latter.

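#include <charconv>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// convert a token to int; the result stays 0 if the token is not numeric
int to_int(std::string_view str)
{
    int val = 0;
    std::from_chars(str.data(), str.data() + str.size(), val);
    return val;
}

std::istream& operator>>(std::istream& in, Person& person)
{
    std::string line;

    // read whole line
    if (std::getline(in, line))
    {
        // split line into words
        std::vector<std::string> words;
        std::stringstream tmp_stream(line);
        for (std::string word; tmp_stream >> word; )
            words.push_back(word);

        // need at least one first name, a last name, an ID and an SSN
        if (words.size() < 4)
        {
            in.setstate(std::ios::failbit);
            return in;
        }

        // join first names (everything before the last three words);
        // a plain string is used because tmp_stream is at EOF by now
        std::string first_names = words[0];
        for (std::size_t i = 1; i < words.size() - 3; i++)
            first_names += ' ' + words[i];

        person.firstName_s = first_names;
        person.lastName = words[words.size() - 3];
        person.ID = to_int(words[words.size() - 2]);
        person.SSN = to_int(words[words.size() - 1]);
    }

    return in;
}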

I think the code is self-explanatory. Here is a full example.
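To tie this back to the std::copy usage from the question, here is a self-contained sketch (C++17, because of std::from_chars). It assumes the Person layout from the question with ID and SSN as ints; read_people is a hypothetical helper name, and a line with fewer than four words sets failbit, which ends the copy.

```cpp
#include <algorithm>
#include <charconv>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Person
{
    std::string firstName_s;
    std::string lastName;
    int ID = 0;
    int SSN = 0;
};

// Converts a token to int; the result stays 0 if the token is not numeric.
int to_int(const std::string& str)
{
    int val = 0;
    std::from_chars(str.data(), str.data() + str.size(), val);
    return val;
}

std::istream& operator>>(std::istream& in, Person& person)
{
    std::string line;
    if (!std::getline(in, line))
        return in;

    // Split the line into whitespace-separated words.
    std::istringstream tokens(line);
    std::vector<std::string> words;
    for (std::string word; tokens >> word; )
        words.push_back(word);

    // Need at least one first name, a last name, an ID and an SSN.
    if (words.size() < 4)
    {
        in.setstate(std::ios::failbit);
        return in;
    }

    const std::size_t last = words.size() - 3; // index of the last name
    Person tmp;
    tmp.firstName_s = words[0];
    for (std::size_t i = 1; i < last; ++i)
        tmp.firstName_s += ' ' + words[i];
    tmp.lastName = words[last];
    tmp.ID = to_int(words[last + 1]);
    tmp.SSN = to_int(words[last + 2]);
    person = std::move(tmp);
    return in;
}

// Hypothetical helper: reads every remaining line of `in` as a Person.
std::vector<Person> read_people(std::istream& in)
{
    std::vector<Person> people;
    std::copy(std::istream_iterator<Person>(in),
              std::istream_iterator<Person>(),
              std::back_inserter(people));
    return people;
}
```

It could be called as read_people(file) on an opened std::ifstream. If the file starts with a header line, skip it first with one std::getline call, otherwise it is parsed as a person with ID and SSN 0.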

Timo
  • Why use `std::string_view` in the first part for a small boost, if you gratuitously use `std::stringstream` afterwards, more than negating that? – Deduplicator Jan 10 '20 at 15:20
  • This looks interesting, maybe some other more experienced developers could take a look at this? I'm gonna try it in the meantime:) – picklepick Jan 10 '20 at 15:48
  • @rezi take a look at the edited version. Perhaps this is easier to read. – Timo Jan 10 '20 at 15:49
  • @Deduplicator true. Couldn't come up with a nice way to join strings... – Timo Jan 10 '20 at 15:50