0

My program is supposed to read numerical data (in pairs) from a text file, remove the duplicates and then tell me the size of the vector before moving on to other things. I tested the code below and it works with files that have about 10-20 lines of data, however when I try it on larger data sets (over 100,000 lines of numerical data), there is no output when I try to cout the size(); of the vector. Not sure what the problem is.

while( getline( fs1, instrng ) ) {

    istringstream s1(instrng);
    int a, b;
    s1 >> a >> b;
    pair<int,int> pair1 = make_pair(a,b);
    vec1.push_back( pair1 );

    sort( vec1.begin(), myvec1.end() );
    myvec.erase(unique(myvec.begin(),myvec.end()), myvec.end());
Peter O.
  • 32,158
  • 14
  • 82
  • 96
  • So are you on a mission to solve [each](http://stackoverflow.com/q/25820365/1870232) and [every](http://stackoverflow.com/q/25820586/1870232) scenario of this problem using SO ? – P0W Sep 13 '14 at 17:18
  • Use `std::set` instead. – Jarod42 Sep 13 '14 at 17:21
  • I have to use a vector for this program @Jarod42 –  Sep 13 '14 at 17:23
  • So look at `std::lower_bound` and `std::vector::insert`. You won't have to re-sort the vector every time, just insert element to the right place (if not already present). – Jarod42 Sep 13 '14 at 17:25
  • Seeing your exact similar post looks like you suffer from [_X-Y_](http://meta.stackexchange.com/q/66377) problem. You need to take [advice](http://stackoverflow.com/questions/25820586/how-to-remove-specific-pairs-from-a-vector#comment40393538_25820586) from people, given in those comments, also programming includes _debugging_ skills something simply doesn't go wrong. – P0W Sep 13 '14 at 17:30

1 Answers1

1

In your context the use of std::sort() and std::unique() is working. The problem is a performance issue. You need to do one of two things:

  1. You wait long enough to give the implementation a chance to do a lot of sorting.
  2. You move the use of std::sort(), std::unique(), and std::vector<...>::erase() out of the loop.

Put differently: you managed to create a O(n * n * log n) implementation for something which should be O(n * log n) where n is the number of elements.

BTW, you should always verify that input was successful: you failed to check that reading the two integers on the line was successful. Personally, I would just read the integers from the original stream and not use std::istringstream as creation and/or set up of these streams is also adding unnecessary costs. This cost is, admittedly, a lot less than sorting for each element.

Dietmar Kühl
  • 150,225
  • 13
  • 225
  • 380
  • Just for the records because it looks like abeginners question: He could also use std::lower_bound to find the position to insert the element and check if the element is already in the vector inside the loop. This may save memory, although I doubt that this will be an issue with pairs of integers since you can have ~130,000 pairs in 1MB of memory. And inserting at random positions can destroy performance because the loop is then O(n^2). – Jens Sep 13 '14 at 18:30