10

Why do we have 2 ways like above to search for an element in the set?

Also find algorithm can be used to find an element in a list or a vector but what would be the harm in these providing a member function as well as member functions are expected to be faster than a generic algorithm?

Why do we need remove algorithm and create all the drama about erase remove where remove will just shift the elements and then use erase to delete the actual element..Just like STL list provides a member function remove why cant the other containers just offer a remove function and be done with it?

Ali
  • 56,466
  • 29
  • 168
  • 265
Abhishek
  • 151
  • 1
  • 1
  • 9
  • +1 great question. I think the answer may be just "they messed up"... – user541686 Jan 04 '14 at 02:10
  • @Mehrdad I don't think "they messed up." I believe I could answer these questions, please check my answer. – Ali Jan 05 '14 at 13:33
  • @Ali: I just read your answer, and I don't think it actually "answers" anything. You're just summarizing the status quo, not providing the reasons behind it. The only thing I got out of your answer was "I understand your problem". I really *do* think they messed up, because `set.lower_bound` shouldn't be there -- `std::lower_bound` should be specialized to do the same job. – user541686 Jan 05 '14 at 20:15
  • @Mehrdad *"std::lower_bound should be specialized to do the same job."* That occured to me too. And apparently it cannot be done, see [Is there any technical reason why std::lower_bound is not specialized for red-black tree iterators?](http://stackoverflow.com/q/20934717/341970) As for *"I just read your answer, and I don't think it actually "answers" anything"*, I am sorry to hear that. In my opinion, I gave solid reasons why we need those functions. – Ali Jan 05 '14 at 21:14
  • @Ali: I don't buy the answers on that page. It says *"Although it is likely that there are parent pointers, requiring so for the tree seems inappropriate."*. I don't think that's true. How can you possibly avoid requiring parent pointers? The time complexity of `set::insert(iterator, value)` is amortized constant time if the iterator points to the correct location. But without parent pointers, I think in order to ensure the tree is balanced after the insert, the tree *must* be traversed starting from the root every single time, which is *not* amortized constant time... – user541686 Jan 05 '14 at 21:38
  • @Mehrdad Well, instead of writing this in a comment, please consider posting an answer to that question. From his comments, it seems to me Yakk thinks that too. – Ali Jan 05 '14 at 21:57
  • @Mehrdad Thanks. If this issue is not resolved in 2 days, I will set a bounty on that question; I would like to get a definitive answer. – Ali Jan 05 '14 at 22:02

2 Answers2

12

Binary_search in STL set over set's member function find?
Why do we have 2 ways like above to search for an element in the set?

Binary search returns a bool and set::find() and iterator. In order to compare apples to apples, the algorithm to compare set::find() with is std::lower_bound() which also returns an iterator.

You can apply std::lower_bound() on an arbitrary sorted range specified by a pair of (forward / bidirectional / random access) iterators and not only on a std::set. So having std::lower_bound() is justified. As std::set happens to be a sorted range, you can call

std::lower_bound(mySet.begin(), mySet.end(), value);

but the

mySet.find(value);

call is not only more concise, it is also more efficient. If you look into the implementation of std::lower_bound() you will find something like std::advance(__middle, __half); which has different complexity depending on the iterator (whether forward / bidirectional / random access iterator). In case of std::set, the iterators are bidirectional and advancing them has linear complexity, ouch! In contrast, std::set::find() is guaranteed to perform the search in logarithmic time complexity. The underlying implementation (which is a red and black tree in case of libstdc++) makes it possible. Offering a set::find() is also justified as it is more efficient than calling std::lower_bound() on std::set.

Also find algorithm can be used to find an element in a list or a vector but what would be the harm in these providing a member function as well as member functions are expected to be faster than a generic algorithm?

I don't see how you could provide a faster member function for list or vector, unless the container is sorted (or possesses some special property).

Why do we need remove algorithm and create all the drama about erase remove where remove will just shift the elements and then use erase to delete the actual element..Just like STL list provides a member function remove why cant the other containers just offer a remove function and be done with it?

I can think of two reasons.

Yes, the STL is seriously lacking many convenience functions. I often feel like I live in a begin-end hell when using algorithms on an entire container; I often proved my own wrappers that accept a container, something like:

template <typename T>
bool contains(const std::vector<T>& v, const T& elem) {

    return std::find(v.begin(), v.end(), elem) != v.end();
}

so that I can write

if (contains(myVector, 42)) { 

instead of

if (std::find(myVector.begin(), myVector.end(), 42) != myVector.end()) { 

Unfortunately, you quite often have to roll your own or use boost. Why? Because standardization is painful and slow so the standardization committee focuses on more important things. The people on the committee often donate their free time and are not paid for their work.

Now deleting elements from a vector can be tricky: Do you care about the order of your elements? Are your elements PODs? What are your exception safety requirements?

Let's assume you don't care about the order of your elements and you want to delete the i-th element:

std::swap(myVector[i], myVector.back());
myVector.pop_back();

or even simpler:

myVector[i] = myVector.back(); // but if operator= throws during copying you might be in trouble
myVector.pop_back();

In C++11 with move semantics:

myVector[i] = std::move(myVector.back());
myVector.pop_back();

Note that these are O(1) operations instead of O(N). These are examples of the efficiency and exception safety considerations that the standard committee leaves up to you. Providing a member function and "one size fits all" is not the C++ way.

Having said all these, I repeat I wish we had more convenience functions; I understand your problem.

Ali
  • 56,466
  • 29
  • 168
  • 265
  • Offering a set::find() is also justified as it yields more concise code The thing is yes it is more convenient but then still O(logn) complexity will be there for both the set find as well as binary search which will use lower bound..so i guess performance wise is same but yes set find is more convenient..Alos just curious can you give me an example of how set having a member function find can optimize the search as compared to the recursive call of lower bound by binary_search ..because set stores in a red black tree..would you use set.find over binary_search given an option? – Abhishek Jan 05 '14 at 01:57
  • @user3157360 As for your first comment: Welcome! By the way, here at Stackoverflow, we don't say thanks but upvote and accept answers. :) As for your second comment: I have fixed and updated the answer. Please read. – Ali Jan 05 '14 at 13:32
  • 1
    I think reduction of the interface (e.g. to increase maintainability) has also been a goal of the Standard Library - with the [infamous exception of `std::string`](http://www.gotw.ca/gotw/084.htm). For many types, the only member functions are functions that *need* to be member functions and cannot be written as non-member non-friend functions. See http://isocpp.org/std/library-design-guidelines ("class signatures want to be near minimal") – dyp Jan 05 '14 at 13:56
  • @DyP Thanks. However this question triggered a question that I am about to post. Stay tuned :) – Ali Jan 05 '14 at 13:59
  • Looking forward to it ;) -- Maybe it's a good thing the erase-remove idiom is hideous, as it's also quite inefficient O(n) especially for single-element erases, and should be replaced by several-removes-one-erase. – dyp Jan 05 '14 at 14:03
  • @DyP OK, please read: [Is there any technical reason why std::lower_bound is not specialized for red-black tree iterators?](http://stackoverflow.com/q/20934717/341970) I am interested in your input! – Ali Jan 05 '14 at 14:28
  • @Ali i agree with you that yes since the advancement of bidirectional iterators is linear in binary_search so that might be slower than set::find which is guaranteed log n time but since set itself has bidirectional iterators so why set::find to move those iterators will be faster :( i may be wrong but how is it faster? – Abhishek Jan 07 '14 at 09:52
  • @user3157360 It cannot be done with bidirectional iterators only. `std::set::find()` doesn't work on those iterators; it starts at the root node and performs the search differently, see [search in a binary search tree](http://commons.wikimedia.org/wiki/File:Binary_search_tree_search_4.svg). Now, the thing is that I think it could be done with the set / map iterators as well. Your question triggered a very interesting question, see [Is there any technical reason why std::lower_bound is not specialized for red-black tree iterators?](http://stackoverflow.com/q/20934717/341970) – Ali Jan 07 '14 at 10:53
  • @Ali i agree with you but then even for std::set to search and traverse the tree do you mean it does not use the iterators to move...or does it use internal pointers and hence that is faster than the iterators? am i right? and yes i agree about the lower bound not specialized for red black :( – Abhishek Jan 08 '14 at 10:37
  • @user3157360 It uses iterators, it just starts at the root node (not at begin() or end()). Please look at the picture I have sent you and you can also look at the implementation of set::find(). Then, it will be clear why it is different. It seems like it is possible to specialize `std::lower_bound()` to do the same but nobody has done yet. – Ali Jan 08 '14 at 10:54
  • @Ali I dont really understand what u are saying :( if you have time would be great if you could explain a bit more elaborately..i did see your picture and yes i am familiar with searching in a tree and yes binary search needs to use advance operation which is O(n) but then even in a set to traverse the tree u need to move from parent to child and so on..even that is O(n) so why should we compare that and say std:find is faster? – Abhishek Jan 11 '14 at 01:12
  • @user3157360 *"but then even in a set to traverse the tree u need to move from parent to child and so on..even that is O(n)"* Nope. Please look at the implementation of `std::set::find()`. Start up a debugger and step through the code. You will see why its only O(log n). I cannot explain it any better in comments. – Ali Jan 11 '14 at 01:16
  • @Ali .Below is the gcc implementation of red black tree which is used by set find algorithm and it uses lower bound like binary search! template typename _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::iterator _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>:: find(const _Key& __k) { iterator __j = _M_lower_bound(_M_begin(), _M_end(), __k); return (__j == end() || _M_impl._M_key_compare(__k, _S_key(__j._M_node))) ? end() : __j; } – Abhishek Jan 12 '14 at 03:13
  • @Ali so i am not sure how std find algorithm wins over binary_search algorithm as both use lower bound..There should be some other criteria isnt it? – Abhishek Jan 12 '14 at 03:14
  • @user3157360 You have stopped a step too early: Examine the implementation of `_Rb_tree::_M_lower_bound()` which is in stl_tree.h. If you are familiar with searching in trees, then it should be obvious why that implementation runs in `O(log n)` time. Hence my quesion: [Is there any technical reason why std::lower_bound is not specialized for red-black tree iterators?](http://stackoverflow.com/q/20934717/341970) – Ali Jan 12 '14 at 11:54
  • You are right, i saw the implementation.Let me list my understanding Binary search is an algorithm which runs in log time for random access iterators and linear time for bidirectional iterators coz of the advance function inside. So you mean set::find runs in O(log n) time but applying binary search algo on set runs in O(n) + O(log n)?Please confirm – Abhishek Jan 14 '14 at 08:22
  • @user3157360 Yes. Don't forget that `O(n)+O(log n)` is just `O(n)`. – Ali Jan 14 '14 at 10:44
  • @Ali Thanks got you :) btw do you know why java set offers no find function and there is only Collections binary search to do so? – Abhishek Jan 16 '14 at 09:11
  • @Abhishek It does offer a find function, although in a twisted way. Please see [my comment here](http://stackoverflow.com/questions/20915824/binary-search-in-stl-set-over-sets-member-function-find/20921137?noredirect=1#comment31414516_20916242). – Ali Jan 16 '14 at 10:58
1

I'll answer part of your question. The Erase-Remove idiom is from the book “Effective STL” written by Scott Meye. As to why remove() doesn't actually delete elements from the container, there is a good answer here, I just copy part of the answer:

The key is to realize that remove() is designed to work on not just a container but on any arbitrary forward iterator pair: that means it can't actually delete the elements, because an arbitrary iterator pair doesn't necessarily have the ability to delete elements.

Why STL list provides a member function remove and why can't the other containers just offer a remove function and be done with it? I think it's because the idiom is more efficient than other methods to remove specific values from the contiguous-memory containers.

Community
  • 1
  • 1
jfly
  • 7,715
  • 3
  • 35
  • 65
  • True i agree with the whole Scott meyers thing about remove but dont you think they should not have designed it like this in the first place.Its just unnecessary confusion to have one sequence container offer remove as a member function (list) and others dont offer it :) atleast they should have renamed remove to erase or delete in list i guess but yea we cant change the language...and ya even in the set i think its just messed up but hopefully there is some concrete reason...bcoz Java set does not offer a find function though; – Abhishek Jan 04 '14 at 07:49
  • @user3157360 *"Java set does not offer a find function"* The `SortedSet` interface does offer a find function, it is just a little twisted and called [`SortedSet.tailSet()`](http://docs.oracle.com/javase/6/docs/api/java/util/SortedSet.html#tailSet%28E%29). That's the closest to `std::set::find()`. Note that the `Set` interface provides [`Set.contains()`](http://docs.oracle.com/javase/6/docs/api/java/util/Set.html#contains%28java.lang.Object%29) if you are only interested in containment but you don't want to access the element itself. – Ali Jan 04 '14 at 16:13
  • Thanks ..so just check if the set contains the element but dont go and change the element there ..i see. – Abhishek Jan 05 '14 at 02:06