Constraining remove_if on only part of a C++ list

Question

I have a C++11 list of complex elements that are defined by a structure node_info. A node_info element, in particular, contains a field time and is inserted into the list in an ordered fashion according to its time field value. That is, the list contains various node_info elements that are time ordered. I want to remove from this list all the nodes that verify some specific condition specified by coincidence_detect, which I am currently implementing as a predicate for a remove_if operation.

Since my list can be very large (order of 100k -- 10M elements), and for the way I am building my list this coincidence_detect condition is only verified by few (thousands) elements closer to the "lower" end of the list -- that is the one that contains elements whose time value is less than some t_xv, I thought that to improve speed of my code I don't need to run remove_if through the whole list, but just restrict it to all those elements in the list whose time < t_xv.

remove_if() though does not seem however to allow the user to control up to which point I can iterate through the list.

My current code. The list elements:

struct node_info {
char   *type = "x";
int    ID    = -1;
double time  = 0.0;
bool   spk   = true;
};

The predicate/condition for remove_if:

// Remove all events occurring at t_event
class coincident_events {
double t_event; // Event time
bool   spk;     // Spike condition
public:
    coincident_events(double time,bool spk_) : t_event(time), spk(spk_){}
    bool operator()(node_info node_event){
        return ((node_event.time==t_event)&&(node_event.spk==spk)&&(strcmp(node_event.type,"x")!=0));
    }
};

The actual removing from the list:

void remove_from_list(double t_event, bool spk_){
// Remove all events occurring at t_event
coincident_events coincidence(t_event,spk_);
event_heap.remove_if(coincidence);
}

Pseudo main:

int main(){
    // My list
    std::list<node_info> event_heap;

    ...
    // Populate list with elements with random time values, yet ordered in ascending order
    ...

    remove_from_list(0.5, true);

    return 1;
}

It seems that remove_if may not be ideal in this context. Should I consider instead instantiating an iterator and run an explicit for cycle as suggested for example in this post?

score 2 · Answer 1 · answered Aug 12 '18 at 12:32

It seems that remove_if may not be ideal in this context. Should I consider instead instantiating an iterator and run an explicit for loop?

Yes and yes. Don't fight to use code that is preventing you from reaching your goals. Keep it simple. Loops are nothing to be ashamed of in C++.

score 1 · Answer 2 · answered Aug 12 '18 at 12:33

First thing, comparing double exactly is not a good idea as you are subject to floating point errors.

You could always search the point up to where you want to do a search using lower_bound (I assume you list is properly sorted).

The you could use free function algorithm std::remove_if followed by std::erase to remove items between the iterator returned by remove_if and the one returned by lower_bound.

However, doing that you would do multiple passes in the data and you would move nodes so it would affect performance.

See also: https://en.cppreference.com/w/cpp/algorithm/remove

So in the end, it is probably preferable to do you own loop on the whole container and for each each check if it need to be removed. If not, then check if you should break out of the loop.

for (auto it = event_heap.begin(); it != event_heap.end(); )
{
    if (coincidence(*it))
    {
        auto itErase = it;
        ++it;
        event_heap.erase(itErase)
    }
    else if (it->time < t_xv)
    {
        ++it;
    }
    else
    {
        break;
    }
}

As you can see, code can easily become quite long for something that should be simple. Thus, if you need to do that kind of algorithm often, consider writing you own generic algorithm.

Also, in practice you might not need to do a complete search for the end using the first solution if you process you data in increasing time order.

Finally, you might consider using an std::set instead. It could lead to simpler and more optimized code.

score 0 · Accepted Answer · answered Aug 12 '18 at 18:05

Thanks. I used your comments and came up with this solution, which seemingly increases speed by a factor of 5-to-10.

void remove_from_list(double t_event,bool spk_){
    coincident_events coincidence(t_event,spk_);
    for(auto it=event_heap.begin();it!=event_heap.end();){
        if(t_event>=it->time){
            if(coincidence(*it)) {
                it = event_heap.erase(it);
            }
            else
                ++it;
        }
        else
            break;
        }
}

The idea to make erase return it (as already ++it) was suggested by this other post. Note that in this implementation I am actually erasing all list elements up to t_event value (meaning, I pass whatever I want for t_xv).

Constraining remove_if on only part of a C++ list

3 Answers3