-1

I'm thinking about some different ways of erasing several pointers of a std::vector of pointers. I know that de erase/remove_if idiom is a good suit, but I'm thinking about a situation in which I have a container of pointers to remove from the std::vector that I have, something like this:

std::vector<Object*> elementsToRemove;
 ... // fill elementsToRemove
 
 RemoveObjects(elementsToRemove, myObjectsVector)

I'm asking about this because in my project we are facing some performance issues mostly because of item-by-item removal of vector containers, which imply in several reallocations.

One way that I'm considering to implement this is with the erase/remove_if idiom, in which the predicate of the remove_if is a function that checks if the Object* is in the elementsToRemove Container.

Anyone have a better sugestion of how to approach this problem?

I'm considering to implement this is with the erase/remove_if idiom, in which the predicate of the remove_if is a function that checks if the Object* is in the elementsToRemove Container. However, this approach will require n x r comparisons (n is the number of Objects, and r is the number of objects to remove).

genpfault
  • 51,148
  • 11
  • 85
  • 139
  • can you sort the vectors? (Note that comparing pointers via `<` is not OK but by via `std::less` it is ok) – 463035818_is_not_an_ai Oct 31 '22 at 17:20
  • 2
    you must have one hell of a lot of vector entries for this to be a performance problem. perhaps you could describe your domain in a bit more detail? – Neil Butterworth Oct 31 '22 at 17:36
  • Do the objects in question support ordered comparison (i.e., is `a < b` is meaningful for them)? If so, are they (or can they be) stored in sorted order in both the `elemntsToRemove` and `myObjectsVector`? – Jerry Coffin Oct 31 '22 at 18:15
  • Is the performance impact from the item-by-item _lookup_, the item-by-item _removal_, or the reallocation? – Useless Oct 31 '22 at 18:21
  • 2
    Just for what it's worth, reallocation only happens when adding elements to a vector, not when removing elements from the vector. – Jerry Coffin Oct 31 '22 at 18:30
  • Well, unless perhaps you explicitly shrink the capacity, but none of this is shown. – Useless Oct 31 '22 at 18:35
  • @Useless: Yes, calling `shrink_to_fit` can cause reallocation (but that's separate from actually removing elements). – Jerry Coffin Oct 31 '22 at 18:38
  • @JerryCoffin , I did a profiling in the application and I saw that the reallocation in fact happens in item erasing. The item addition can cause reallocation too, but the stl usually do this reallocation considering an additional capacity, that permits various additions without reallocation. On the other hand, the erase operation always require reallocation, think in the operation of removing an item in the middle of a vector, and having to keep it contiguous in memory, this will require a reallocation. The remove_if permits us to do just one erase operation after filtering the items to remove – K. Kapelinski Oct 31 '22 at 19:41
  • @Useless it's the reallocation. – K. Kapelinski Oct 31 '22 at 19:42
  • 1
    OK, so show your erase/remove_if code that results in reallocation, and let us know which compiler & standard library you use. This behaviour sounds odd, and your justification makes no sense (removing from the middle of a vector requires copying or moving, but not reallocation) – Useless Oct 31 '22 at 19:45
  • @JerryCoffin the sorting ideia is a good one. I saw that in this idea the stl has the std::set_difference, which is a good choice, but in my case the container from which I will remove the pointers is not ordered, and I can't sort this container in particular. Thanks for the tip anyway – K. Kapelinski Oct 31 '22 at 19:47
  • @Useless, I said that about the reallocation because of what I've read in the docs: https://cplusplus.com/reference/vector/vector/erase/ : "Because vectors use an array as their underlying storage, erasing elements in positions other than the vector end causes the container to relocate all the elements after the segment erased to their new positions. This is generally an inefficient operation compared to the one performed for the same operation by other kinds of sequence containers". – K. Kapelinski Oct 31 '22 at 19:58
  • 1
    @K.Kapelinski: Just to be clear: removing elements from the middle of a vector does require moving the following elements to be moved to fill in the spot of the removed element. But that's not the same as realloction. – Jerry Coffin Oct 31 '22 at 20:50
  • @JerryCoffin , in fact I misunderstood, I did a simple sample program in which I use the erase of the std::vector. At the end of the operation the size of the container decreases, but the capacity don't. You were right. So, in my case I know that the program is not using the shrink_to_fit, so there's no reallocation, in this case, the performance issue must be because of those moving operations, right? – K. Kapelinski Oct 31 '22 at 21:24
  • 1
    You should try using a profiler that _shows_ where the time is spent instead of guessing. – Useless Oct 31 '22 at 21:30
  • @K.Kapelinski: yes, probably. And yes, that's part of why the remove/erase idiom helps. Instead of doing those moves repeatedly, it coalesces things, so each element is only moved once, to the position it occupies after the operation is finsihed. – Jerry Coffin Oct 31 '22 at 22:12
  • @JerryCoffin got it, thank you for your tips and explanations, it was very helpful to me. – K. Kapelinski Oct 31 '22 at 22:23
  • An idea by the way (I did not test or benchmark it) and I think you will not be allowed to change the vector for a deque, but maybe a deque with not to big buckets could alleviate the burden by moving just the element inside one bucket (see [What really is a deque in STL?](https://stackoverflow.com/a/6292437/3972710) ) – NGI Nov 01 '22 at 16:24

1 Answers1

0

If you can sort both vectors then you need only a single pass through both for remove_if (erase is linear as well):

#include <vector>
#include <algorithm>
#include <iostream>

int main()
{
    std::vector<int>  X{1,2,3,4,5,6,7,8,9};
    std::vector<int>  remove{1,3,6,7};

    auto first = remove.begin();

    auto it = std::remove_if(X.begin(),X.end(),[&](int x) {
        while (first != remove.end() && *first < x) ++first;        
        return (first != remove.end() && *first == x);     
    });

    X.erase(it,X.end());
    for (const auto& x : X ) std::cout << x << " ";
}

Letting remove_if continue to scan the input vector X even though all elements from remove are already consumed is a little silly. Take the code with a grain of salt. It is just to demonstrate that also with the erase remove idiom it can be done with less than n x r comparisons.

Sorting is O(N logN) so you'd get that in total.

Alternatively you can sort only RemoveObjects then looking up an element in it can be done via binary search.

Note that comparing the pointers via < is unspecified. Comparing them via std::less on the other does yield a (implementation defined) strict total order. std::sort uses std::less by default, you should need to consider that the resulting ordering is not necessarily consisten with <.

463035818_is_not_an_ai
  • 109,796
  • 11
  • 89
  • 185
  • Thanks, this helped me a lot. In fact, the container from which I will remove the items can't be sorted, but as you said, I can sort the vector of removed objects, or use a std::set to make de searching operation better. – K. Kapelinski Oct 31 '22 at 20:00
  • @K. Kapelinski Just out of curiosity how can't a container be sorted. Elements may be identified by an id, either integer like ( even big ) or string like so that they can be compared. Are there cases that are completely outside this pattern ? Or just maybe you can't sort the vector as it would take too much time ? – NGI Oct 31 '22 at 22:27
  • 1
    @NGI in this case I'm not considering sorting the container from which I will remove the objects because this container stores objects that have dependencies between each other, so altering the order of objects in this container can cause impact in the application. Sorry, I expressed myself incorrectly, it's possible to sort the container, but I'm no considering this possibility for now, to start with a solution with minimal impact. – K. Kapelinski Nov 01 '22 at 12:46