2

I have a operation that continuously generates random solutions (std::vector<float>). I evaluate the solutions against a mathematical function to see their usefulness (float). I would like to store the top 10 solutions all the time. What would be the most efficient way to do this in C++?

I need to store both the solutions(std::vector) and their usefulness (float). I am performing several hundred thousands of evaluations and hence I am in need of an efficient solution.

Edit: I am aware of sorting methods. I am looking for methods other than sorting and storing the values. Looking for better data structures if any.

  • Without knowing more, I'd start with `std::array>,10>` – user4581301 Aug 23 '20 at 22:56
  • 2
    This is a very basic `O(n)` algorithm that I expect to find, as a given example, in every computer science and theory of computation textbook. There's nothing complicated here: store 10 best solution (initialized by the first ten values), then starting with the 11th one, check if it's better than the worst one so far, and if so replace it with the new value. Fairly straightforward. – Sam Varshavchik Aug 23 '20 at 22:56
  • 1
    Scanning an array of ten elements for the lowest and replacing it is going to be cheaper than just about anything else you can do. Start stupid and complicate when you have to. It's stunning how often stupid is fast enough. – user4581301 Aug 23 '20 at 23:02
  • @user4581301 so this is the best possible way. I was about to do the same, but was curious if there is any other ways to do this. I wanted to make sure as I can benefit from even a small increase in performance – Anthony M Benedict Aug 23 '20 at 23:05
  • 1
    I can't guarantee it's the best possible way, but the spin-up time for anything that would perform the task smarter would probably take longer than looking for the worst value of 10 `float`s. There is almost always some inertia that needs to be overcome when you're being smart, and computers are really really good at doing stupid things very quickly. – user4581301 Aug 23 '20 at 23:11

2 Answers2

1
  1. You evaluate the float score() function for current std::vector<T> solution, store them in a std::pair<vector<T>, float>.
  2. You use a std::priority_queue< pair<vector<T>, float> > to store the 10 best solutions based on their score, and the score itself. std::priority_queue is a heap, so it allows you to extract its max value according to a compare function that you can set up to compare score_a < score_b.
  3. Store the first 10 pairs, then for each new one compare it with the top of the heap, if score(new) > score(10th) then insert(new) into the priority_queue p, and p.pop_back() to get rid of the old 10th element.
  4. You keep doing this inside a loop until you run out of vector<T> solutions.
Giogre
  • 1,444
  • 7
  • 19
  • 1
    Downside of a `priority_queue`: you can't interact with any element other than the output end. This complicates things if you want to look at the other items without dismantling the `priority_queue`. Plus, depending on what underlying container you use, the shuffling caused by insertion of a new element will be more expensive than simple array traversal to find and replace the worst element. – user4581301 Aug 23 '20 at 23:16
  • 1
    @Lingo This was what I was looking for. I remember using priority_queues before. This might work for me or just a simple traversal might work. I will check it out. – Anthony M Benedict Aug 23 '20 at 23:21
  • @user4581301: well you can always use an `array`, and then use the command `make_heap` to order it internally like a heap, and elements will still be accessible, because the array will still be left as an array. This is an alternative way to obtain a heap-like structure available in C++, other than the more literal heap in `priority_queue`. Probably there will be more internal laboring involved with both `priority_queue` and `array`+`make_heap` than with a simple `array[10]`, I will concede, but the code obtained with a `priority_queue` will be clearer, more streamlined and self-explanatory. – Giogre Aug 23 '20 at 23:30
  • @AnthonyMBenedict: glad to be of help. Whenever there is the need of repeatedly extracting min or max elements from a data structure, always go with heaps, I was told. – Giogre Aug 23 '20 at 23:35
  • @AnthonyMBenedict Best thing to do is try it out, profile it and A) see if it matters and if so, B) which is better. If queue behaviour is all you need, `priority_queue` is simple and easy to use. If the queue is better or as good, use it. – user4581301 Aug 24 '20 at 03:58
0

Have a vector of pair, where pair has 1 element as solution and other element as usefulness. Then write custom comparator to compare elements in the vector.

Add element at last, then sort this vector and remove last element.

As @user4581301 mentioned in comments, for 10 elements, you dont need to sort. Just traverse vector everytime, or you can also perform ordered insert in vector.

Here are some links to help you:

https://www.geeksforgeeks.org/sorting-vector-of-pairs-in-c-set-1-sort-by-first-and-second/

Comparator for vector<pair<int,int>>

another_CS_guy
  • 682
  • 6
  • 7
  • 1
    No need to sort or ordered insert. Probably cost more than traversal and replace in a container of 10 elements. Larger containers, use a sort, but a worst case traversal of ten elements is probably lower cost than the start-up costs of sorting. – user4581301 Aug 23 '20 at 22:59
  • Side note: That geek for geeks link [lived up to my expectations](https://godbolt.org/z/WKfq14). Prefer to direct askers to sites with code that compiles. – user4581301 Aug 23 '20 at 23:06