
I've got a file structured like this:

A 123456 0
G 123456 5
A 235334 0
B 123456 2

Each piece of information is being stored like so:

temp.code >> temp.personid >> temp.data

I've stored this information in a vector:

 ifstream fin("test.txt");
 vector<TestClass> test;
 TestClass temp;
 string line;
 while (getline(fin, line)) {
     istringstream iss(line);
     iss >> temp.code >> temp.personid >> temp.data;
     test.push_back(temp);
 }
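
(TestClass itself isn't shown in the question; for the snippet above, something along these lines is assumed:)

 // assumed definition, matching the extraction "temp.code >> temp.personid >> temp.data"
 struct TestClass {
     string code;       // e.g. "A", "G", "B"
     string personid;   // e.g. "123456" (an int would also work)
     int data;          // e.g. 0, 5, 2
 };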

A given personid can come up numerous times within the file. What I want to do is iterate through the vector, group the repetitions into a single class object per personid, and sum the data for each object, so that the output for the file above would be:

123456 : 7
235334 : 0

What would be an elegant way to approach this?

Thanks

Courtney White

2 Answers


The following code uses a std::unordered_map, as already suggested in the comments. It reads your file line by line.
The code assumes that the person's id is of type int, the code of type std::string, and the data of type int.

It inserts every Person (here, as an example, a struct) into the map. If a person's id is already present, it sums up the data. That means this solution doesn't need a temporary std::vector, only the std::unordered_map.

See live example with your data on ideone.com.

Code:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <unordered_map>

struct Person
{
   std::string code;
   int         data;
};

typedef std::unordered_map<int, Person> PersonMap;

int main()
{
   std::ifstream fin("test.txt");

   PersonMap persons;

   /* line by line reading */
   for (std::string line; std::getline(fin, line); )
   {
      std::istringstream iss(line);

      int    personId;
      Person personData;

      /* parse line as std::string, int, int */
      iss >> personData.code >> personId >> personData.data;

      /* insert into map and save result */
      std::pair<PersonMap::iterator, bool> insertResult =
         persons.insert(std::pair<int, Person>(personId, personData));

      /* if personId is already there */
      if (!insertResult.second)
      {
         insertResult.first->second.data += personData.data;
      }
   }

   /* output whole map */
   for(auto const &person : persons)
   {
      std::cout << person.first << " : " << person.second.data << "\n";
   }
   std::cout << std::flush;
}

Output:

235334 : 0
123456 : 7
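
As a side note (not part of the original answer): with C++17, the insert-and-check step can be written a bit more compactly with try_emplace and structured bindings, as a drop-in replacement for the insert/if block inside the loop:

   /* C++17: try_emplace only constructs the element if personId is new;
      otherwise we add this line's data to the existing entry */
   auto [it, inserted] = persons.try_emplace(personId, personData);
   if (!inserted)
      it->second.data += personData.data;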
Andre Kampling

Use std::unordered_map. The lookup time in an unordered map is O(1) (constant) in the average case. I have used a vector for the sample data; you can load the data from a file instead (see the sketch after the output below).

#include <bits/stdc++.h>
using namespace std;

int main() {
    unordered_map<string, int> m;                  // personId -> summed data
    unordered_map<string, int>::iterator itr;      // Iterator to walk the unordered map
    vector<pair<string, int>> person_details;      // vector of pairs as sample data; you can load it from the file instead
    person_details.push_back(make_pair("123456", 0));
    person_details.push_back(make_pair("123456", 5));
    person_details.push_back(make_pair("235334", 0));
    person_details.push_back(make_pair("123456", 2));
    for (size_t i = 0; i < person_details.size(); i++)
    {
        if (m.find(person_details[i].first) == m.end())               // If personId is not yet in the map, insert it
            m[person_details[i].first] = person_details[i].second;
        else
            m[person_details[i].first] += person_details[i].second;   // If personId is already in the map, add to its sum
    }
    for (itr = m.begin(); itr != m.end(); itr++)
        cout << itr->first << " " << itr->second << endl;             // Display each personId with its summed data
    return 0;
}

Output:
235334 0
123456 7
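
If you do want to fill person_details from test.txt instead of hard-coding it, a minimal sketch (following the question's "code personid data" line format; not part of the original answer) could look like:

    ifstream fin("test.txt");
    string line;
    while (getline(fin, line)) {
        istringstream iss(line);           // split the line into its three fields
        string code, person_id;
        int data;
        if (iss >> code >> person_id >> data)
            person_details.push_back(make_pair(person_id, data));   // keep only (personId, data)
    }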

Note: You can use std::map instead, which has O(log N) lookup time, where N is the size of the container, and which keeps the ids sorted by key.
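
For illustration (a sketch, not code from the original answer), the same aggregation with std::map and string ids prints the ids in ascending key order, which here matches the order in the question's desired output:

#include <iostream>
#include <map>
#include <string>
using namespace std;

int main() {
    map<string, int> totals;              // ordered by key, O(log N) lookup/insert
    totals["123456"] += 0;
    totals["123456"] += 5;
    totals["235334"] += 0;
    totals["123456"] += 2;

    for (map<string, int>::iterator itr = totals.begin(); itr != totals.end(); itr++)
        cout << itr->first << " : " << itr->second << endl;
    // Prints:
    // 123456 : 7
    // 235334 : 0
    return 0;
}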

Ishpreet
  • `The complexity of lookup for std::unordered_map is O(1) (constant) in the average case, and O(N) (linear) in the worst case.` taken from here: https://stackoverflow.com/questions/16068151/c-stl-map-is-access-time-o1 – Andre Kampling Sep 12 '17 at 08:29
  • Keep in mind that O(1) does *not* equal faster execution. Sometimes iterating over a small vector is faster even though it is a linear search – Eyal K. Sep 12 '17 at 08:35
  • @AndreKampling Thanks, I have updated the answer. `Worst case only occurs in case of too many hash collisions`. – Ishpreet Sep 12 '17 at 09:07