2

I have a class(object), User. This user has 2 private attributes, "name" and "popularity". I store the objects into a vector (container).

From the container, I need to find the top 5 most popular user, how do I do that? (I have an ugly code, I will post here, if you have a better approach, please let me know. Feel free to use other container, if you think vector is not a good choice, but please use only: map or multimap, list, vector or array, because I only know how to use these.) My current code is:

int top5 = 0, top4 = 0, top3 = 0, top2 = 0, top1 = 0;
vector<User>::iterator it;

for (it = user.begin(); it != user.end(); ++it) 
{
    if( it->getPopularity() > top5){
        if(it->getPopularity() > top4){
            if(it->getPopularity() > top3){
                if(it->getPopularity() > top2){
                    if(it->getPopularity() > top1){
                        top1 = it->getPopularity();
                        continue;
                    } else {
                        top2 = it->getPopularity();
                        continue;
                    }
                } else {
                    top3 = it->getPopularity();
                    continue;
                }
            }
        } else {
            top4 = it->getPopularity();
            continue;
        }
    } else {
        top5 = it->getPopularity();
        continue;
    }
}

I know the codes is ugly and might be prone to error, thus if you have better codes, please do share with us (us == cpp newbie). Thanks

Frerich Raabe
  • 90,689
  • 19
  • 115
  • 207
celta
  • 33
  • 3
  • If you don't know some container that would be good for this task, I think that's a good opportunity to learn it, not ignore it. – svick Aug 16 '11 at 07:29
  • 1
    How long is your list and how fast does it have to be? Is sorting the whole list fast enough for you? – svick Aug 16 '11 at 07:30
  • @svick: What would you recommend? I dont know how many objects it will be, but should be around 10000-40000. I posted about the container question few days ago, you can view it here: http://stackoverflow.com/questions/7021529/dynamic-size-of-array-in-c. James Kanze was said use vector until it does not work, thus i have chosen vector. – celta Aug 16 '11 at 07:30
  • Is this homework? If not (or if it is homework and you're permitted to use it), you should look at the routines available in the `` header. – Michael Burr Aug 16 '11 at 07:37
  • @ Michael: Yes it is a homework. But I does all the codes myself. Only this part of codes, I feel is wrong, or I should say, not very neat and tidy, and thus I want to explore a more elegant way. – celta Aug 16 '11 at 07:41

8 Answers8

9

You can use the std::partial_sort algorithm to sort your vector so that the first five elements are sorted and the rest remains unsorted. Something like this (untested code):

bool compareByPopularity( User a, User b ) {
    return a.GetPopularity() > b.GetPopularity();
}

vector<Users> getMostPopularUsers( const vector<User> &users, int num ) {
    if ( users.size() <= num ) {
        sort( users.begin(), users.end(), compareByPopularity );
    } else {
        partial_sort( users.begin(), users.begin() + num, users.end(), 
                      compareByPopularity );
    }
    return vector<Users>( users.begin(), users.begin() + num );
}
Frerich Raabe
  • 90,689
  • 19
  • 115
  • 207
  • +1 for using `std::partial_sort`. However, this can be further improved by considering whether `num` is greater than `users.size()/2` or not. I mean, just imagine, if `num = users.size()-1`, then `partial_sort` sorts `users.size()-1` items, though the result will be same if it sorts just one `1` item, but using different compare function. – Nawaz Aug 16 '11 at 07:46
  • @ Frerich: Hi, you were saying "first five elements are sorted and the rest remains unsorted"? I could not get the idea. Is it something like: compare all the objects values, sort top 5, then the first 5 objects might change, but the rest of the objects remain? – celta Aug 16 '11 at 07:48
  • @ Frerich: As compare to Itsik solution, which use sort to sort the whole vector, which one is better? In term of performances? – celta Aug 16 '11 at 07:55
  • @Frerich: +1 as soon as you do a full sort on the special case `users.size() <= num`. (There is no guarantee that `users[0]` is the #1 choice.) – David Hammen Aug 16 '11 at 08:04
  • 2
    @celta: For vector sizes like yours (10000-40000), any performance difference is probably insignificant (disclaimer: I didn't profile it). However, I believe that `partial_sort` is superior here on moral grounds: it conveys the intention of the code more clearly (at least to me, your mileage may vary). – Frerich Raabe Aug 16 '11 at 08:06
  • @David Hammen: Well, this function only intends to yield the five most popular users - but in no particular order. It's just a (nice?) effect that `std::partial_sort` happens to yield them in sorted order. If you want to send a gold badge to the top 5 StackOverflow users, you don't need to know which of them came out 1st. The idea was that if the caller needs to have the top five sorted, too then he can do that himself. – Frerich Raabe Aug 16 '11 at 08:10
  • @celta: The implementation of `std::partial_sort` in g++ is a very fast, heap-based approach. It is not insertion sort. It takes approximately N*log(num) comparisons on average; compare that to the N*log(N) compares needed for a full sort. – David Hammen Aug 16 '11 at 08:12
  • @Frerich: It's pretty clear that celta was trying to create the top five, sorted from #1 to #5. Perhaps celta could chime in regarding whether those top five need to be sorted, or if just finding the top five is good enough. – David Hammen Aug 16 '11 at 08:15
  • @David Hammen: Right. Just for the record, of course you're correct that if the top five entries should be in sorted order, then the special-case needs a `std::sort` call. – Frerich Raabe Aug 16 '11 at 08:17
  • @ David& Frerich: Yeah, like David said I need to display whos the most famous. Sorry for not making my question clear. It will be like: #1. User A, #2. User B.. Something like that. In order words, I need to combine Frerich and Itsik answer to get my answer? How should I pick the best answer now? – celta Aug 16 '11 at 08:22
  • @David Hammen, celta: I just updated the function sketch to take the clarified requirements into account. – Frerich Raabe Aug 16 '11 at 08:35
  • SO, +1 then. @celta: This is homework, so you need to think hard about whether this is the solution you want to turn in. Have you been taught anything about the C++ standard library? Be honest with yourself. Your instructor might well look askance at this solution and ask if you received outside help. – David Hammen Aug 16 '11 at 08:49
  • @Frerich: Do note that your functions *modifies* the input (contrary to the `const&` in the signature). +1 for `partial_sort`. – Matthieu M. Aug 16 '11 at 09:01
  • @ David: Noted, good question. I cant learn everything from my instructor. He has 40++ students in a classroom, he cant take care of every student. Hopefully, he wont mind I get outside (SO) help. And this top5 thing is an add-on thing, not a compulsory task from my homework. So yeah, I hope he wont mind lol – celta Aug 16 '11 at 09:06
2

Why don't you sort (std::sort or your own implementation of Quick Sort) the vector based on popularity and take the first 5 values ?

Example:

bool UserCompare(User a, User b) { return a.getPopularity() > b.getPopularity(); }
...
std::sort(user.begin(), user.end(), UserCompare);
// Print first 5 users
Itsik
  • 3,920
  • 1
  • 25
  • 37
  • Sorry, I am new, need to clarify few things. I read about the std::sort, but didnt know can be applied to object. Based on your code, my vector will be sorted descending, thus I only need to loop 5 times from the vector to get the first 5 values, and that first 5 values are the top 5 most popular users? – celta Aug 16 '11 at 07:36
  • @celta You are correct, the vector will be descending and the first 5 values are the 5 most popular – Itsik Aug 16 '11 at 07:44
2

First off, cache that it->getPopularity() so you don't have to keep repeating it.

Secondly (and this is much more important): Your algorithm is flawed. When you find a new top1 you have to push the old top1 down to the #2 slot before you save the new top1, but before you do that you have to push the old top2 down to the #3 slot, etc. And that is just for a new top1. You are going to have to do something similar for a new top2, a new top3, etc. The only one you can paste in without worrying about pushing things down the list is when you get a new top5. The correct algorithm is hairy. That said, the correct algorithm is much easier to implement when your topN is an array rather than a bunch of separate values.

Thirdly (and this is even more important than the second point): You shouldn't care about performance, at least not initially. The easy way to do this is to sort the entire list and pluck off the first five off the top. If this suboptimal but simple algorithm doesn't affect your performance, done. Don't bother with the ugly but fast first N algorithm unless performance mandates that you toss the simple solution out the window.

Finally (and this is the most important point of all): That fast first N algorithm is only fast when the number of elements in the list is much, much larger than five. The default sort algorithm is pretty dang fast. It has to be wasting a lot of time sorting the dozens / hundreds of items you don't care about before a pushdown first N algorithm becomes advantageous. In other words, that pushdown insertion sort algorithm may well be a case of premature disoptimization.

David Hammen
  • 32,454
  • 9
  • 60
  • 108
  • @ David: Thanks for the advise. For the #1 tips, "cache", how do I do that? The tip #2 & #3 are extremely important for me in future when I encountered similar situation. Learned something again, thanks. – celta Aug 16 '11 at 07:43
  • "Cache": Put it into a local variable. Also, the final point is very important. Avoid premature optimization. One last point, and this is more important than any of the points I already raised: It never hurts to see if someone hasn't already solved the exact problem for you. In this case, the correct solution is a part of the C++ library in `std::partial_sort`. – David Hammen Aug 16 '11 at 08:00
  • Opps, sorry I missed out that one. I will sure note that down sir. – celta Aug 16 '11 at 08:25
2

If you just want top 5 popular uses, then use std::partial_sort().

    class User
    {
    private:
        string name_m;
        int popularity_m;
    public:
        User(const string& name, int popularity) : name_m(name), popularity_m(popularity) { }
        friend ostream& operator<<(ostream& os, const User& user)
        {
            return os << "name:" << user.name_m << "|popularity:" << user.popularity_m << "\n";
            return os;
        }

        int Popularity() const 
        {
            return popularity_m;
        }

    };

    bool Compare(const User& lhs, const User& rhs)
    {
        return lhs.Popularity() > rhs.Popularity();
    }

    int main()
    {
        // c++0x. ignore if you don't want it.
        auto compare = [](const User& lhs, const User& rhs) -> bool 
                  { return lhs.Popularity() > rhs.Popularity(); };

        partial_sort(users.begin(), users.begin() + 5, users.end(), Compare);

        copy(users.begin(), users.begin() + 5, ostream_iterator<User>(std::cout, "\n"));
    }
Jagannath
  • 3,995
  • 26
  • 30
0

Sort your objects, maybe with the library if this is allowed, and then simply selecte the first 5 element. If your container gets too big you could probably use a std::list for the job.

Edit : @itsik you beat me to the sec :)

FailedDev
  • 26,680
  • 9
  • 53
  • 73
0

Do this pseudo code.

Declare top5 as an array of int[5] // or use a min-heap
Initialize top5 as 5 -INF

For each element A
   if A < top5[4]                  // or A < root-of-top5
      Remove top5[4] from top5     // or pop min element from heap
      Insert A to top              // or insert A to the heap
J-16 SDiZ
  • 26,473
  • 4
  • 65
  • 84
0

Well, I advise you improve your code by using an array or list or vector to store the top five, like this

struct TopRecord
{
    int index;
    int pop;
} Top5[5];

for(int i = 0; i<5; i++)
{
    Top5[i].index = -1;
    // Set pop to a value low enough
    Top5[i].pop = -1;
}

for(int i = 0; i< users.size(); i++)
{
    int currentpop = i->getPopularity()
    int currentindex = i;
    int j = 0;
    int temp;

    while(j < 5 && Top5[j].pop < currentpop)
    {
        temp = Top5[j].pop;
        Top[j].pop = currentpop;
        currentpop = temp;

        temp = Top5[j].index;
        Top[j].index = currentindex;
        currentindex = temp;

        j++;
    }
}
0

You also may consider using Randomized Select if Your aim is performance, since originally Randomized Select is good enough for ordered statistics and runs in linear time, You just need to run it 5 times. Or to use partial_sort solution provided above, either way counts, depends on Your aim.

  • @ Constantine: Hey thanks. I will check out the Randomized Select. Learned new thing again, thanks :D – celta Aug 16 '11 at 08:42