9

I am using a std::map. Sometimes I will do an operation like: finding the median value of all items. e.g if I add

1 "s"
2 "sdf"
3 "sdfb"
4 "njw"
5 "loo"

then the median is 3.

Is there some solution without iterating over half the items in the map?

A. Levy
  • 29,056
  • 6
  • 39
  • 56
Raymond
  • 561
  • 1
  • 4
  • 14
  • How many elements are in the map? How often do you need to retrieve the median? If the first or second answers is a *small number* then just use `std::advance( m.begin(), m.size()/2 )`. The cost of iterating over the map in order is linear, so that will take `O(N)` operations. With each operation being relatively cheap. – David Rodríguez - dribeas Aug 10 '10 at 07:39

10 Answers10

9

I think the answer is no. You cannot just jump to the N / 2 item past the beginning because a std::map uses bidirectional iterators. You must iterate through half of the items in the map. If you had access to the underlying Red/Black tree implementation that is typically used for the std::map, you might be able to get close like in Dani's answer. However, you don't have access to that as it is encapsulated as an implementation detail.

Community
  • 1
  • 1
A. Levy
  • 29,056
  • 6
  • 39
  • 56
9

I think you can solve the problem by using two std::map. One for smaller half of items (mapL) and second for the other half (mapU). When you have insert operation. It will be either case:

  • add item to mapU and move smallest element to mapL
  • add item to mapL and move greatest element to mapU

In case the maps have different size and you insert element to the one with smaller number of elements you skip the move section. The basic idea is that you keep your maps balanced so the maximum size difference is 1 element. As far as I know STL all operations should work in O(ln(n)) time. Accessing smallest and greatest element in map can be done by using iterator. When you have n_th position query just check map sizes and return greatest element in mapL or smallest element in mapR.

The above usage scenario is for inserting only but you can extend it to deleting items as well but you have to keep track of which map holds item or try to delete from both.

Here is my code with sample usage:

#include <iostream>
#include <string>
#include <map>
using namespace std;

typedef pair<int,string> pis;
typedef map<int,string>::iterator itis;

map<int,string>Left;
map<int,string>Right;

itis get_last(map<int,string> &m){
    return (--m.end());
}

int add_element(int key, string val){
    if (Left.empty()){
        Left.insert(make_pair(key,val));
        return 1;
    }

    pis maxl = *get_last(Left);
    if (key <= maxl.first){
        Left.insert(make_pair(key,val));
        if (Left.size() > Right.size() + 1){
            itis to_rem = get_last(Left);
            pis cpy = *to_rem;
            Left.erase(to_rem);
            Right.insert(cpy);
        }
        return 1;
    } else {
        Right.insert(make_pair(key,val));
        if (Right.size() > Left.size()){
            itis to_rem = Right.begin();
            pis cpy = *to_rem;
            Right.erase(to_rem);
            Left.insert(*to_rem);
        }
        return 2;
    }   
}

pis get_mid(){
    int size = Left.size() + Right.size();
    if (Left.size() >= size / 2){
        return *(get_last(Left));
    }
    return *(Right.begin());
}

int main(){
    Left.clear();
    Right.clear();

    int key;
    string val;
    while (!cin.eof()){
        cin >> key >> val;
        add_element(key,val);
        pis mid = get_mid();
        cout << "mid " << mid.first << " " << mid.second << endl;
    }
}
jethro
  • 18,177
  • 7
  • 46
  • 42
  • +1 this is a pretty clever solution! You could easily build on this and encapsulate it in a class that could be used as a replacement for `std::map` but with an additional `median()` method. Good thinking! – A. Levy Aug 10 '10 at 19:46
4

Try:

typedef std::map<int,std::string>  Data;
Data           data;
Data::iterator median = std::advance(data.begin(), data.size() / 2); 

Works if the size() is odd. I'll let you work out how to do it when size() is even.

Martin York
  • 257,169
  • 86
  • 333
  • 562
  • 2
    This is the cleanest way to get the median. However, you are still iterating through half of the elements because std::map::iterator is a bidirectional iterator and incapable of random access. – A. Levy Aug 10 '10 at 06:21
  • Yep. But not much choice when your data structure is map. But if the data structure is ever changed to any other container type the std::advance() will still work and automatically provide the most efficient method of advancing. – Martin York Aug 10 '10 at 06:24
  • This equals iter = map.begin(); while ( i < N/2) ++iter; the performance is poor. – Raymond Aug 10 '10 at 06:28
  • Martin, I agree whole-heartedly! You solution is the first thing I would try if I had to use a map. I just had to be pedantic and point out that the solution doesn't quite satisfy the OP's criteria. – A. Levy Aug 10 '10 at 06:31
  • 1
    @Raymond: Poor compared to what. The complexity is O(n) and unless you use vector you are not going to get better. – Martin York Aug 10 '10 at 06:34
  • @Raymond: Poor is a comparative term. There's no other way. Have you actually *profiled* anything and determined this method will be "too slow"? If not, just get your code working in a straightforward fashion first. If your profiling results you spend most of your time here, cache the value for an approximation, or try different structures (like `vector`). – GManNickG Aug 10 '10 at 08:47
  • I am sorry, I just mean "iterate the former half items" is my first solution, and through profiling, i find i should make a optimization. – Raymond Aug 12 '10 at 14:02
2

In self balancing binary tree(std::map is one I think) a good approximation would be the root.
For exact value just cache it with a balance indicator, and each time an item added below the median decrease the indicator and increase when item is added above. When indicator is equal to 2/-2 move the median upwards/downwards one step and reset the indicator.

Daniel
  • 30,896
  • 18
  • 85
  • 139
  • Good idea. Most of time, approximation is ok. This means i should rewrite map, and keep indicator? if adjust indicator in each insertion operation, that is inefficient, because there will be map.size() comparison. – Raymond Aug 10 '10 at 06:26
  • You dont have to rewrite map, you can subclass it. and why inefficient? this is O(1) method on instertion. – Daniel Aug 10 '10 at 15:23
  • you are right, i can subclass it, and set a member variable to save the indicator. thank you, this is also an optional solution. – Raymond Aug 12 '10 at 14:04
  • STL classes aren't designed to be subclassed. See http://stackoverflow.com/questions/1647298 – Tom Sirgedas Aug 16 '10 at 15:04
2

If you can switch data structures, store the items in a std::vector and sort it. That will enable accessing the middle item positionally without iterating. (It can be surprising but a sorted vector often out-performs a map, due to locality. For lookups by the sort key you can use binary search and it will have much the same performance as a map anyway. See Scott Meyer's Effective STL.)

Daniel Earwicker
  • 114,894
  • 38
  • 205
  • 284
  • sorry, i cannot switch to vector, because items in map is inserted dynamicly. – Raymond Aug 10 '10 at 06:21
  • 1
    You can `push_back` on the vector, and then sort later. Try timing it - the total cost of all the operations may be less. – Daniel Earwicker Aug 10 '10 at 06:22
  • 2
    BTW. there's a nice function `nth_element` with sorts RandomAccessIterator's until the first n element are sorted. You'd find the median median with `nth_element(v.begin(),v.begin+v.size()/2,v.end());` – Nordic Mainframe Aug 10 '10 at 06:34
  • @Raymond You cant compute the median of a non sorted set of values. The term is only defined for sorded values. So you have to sort your map anyway... – InsertNickHere Aug 10 '10 at 06:43
  • @InsertNickHere A map _is_ sorted. – Alice Purcell Aug 10 '10 at 09:06
  • So why do you all talk about sorting? :) – InsertNickHere Aug 10 '10 at 10:31
  • 1
    The reason sorting is relevant is because this answer suggests using a `vector` instead of a `map`. A `vector` can be put into a sorted state (it has a `sort` method) but then will likely become unsorted as soon as you modify its contents. So if you use a `vector`, you lose automatic sorting, but you gain (a) control over when and how sorting is done, (b) constant-time access to the median position, (b) greater locality: your data in a contiguous chunk instead of scattered around the address space, which often improves performance. – Daniel Earwicker Aug 10 '10 at 14:15
  • Getting median will be called only several times, but inserting is common case, so i vector is not a suitable container. – Raymond Aug 12 '10 at 14:07
  • @Raymond - I may be misunderstanding you, but if you insert many times *without* yet requiring the vector to be sorted, then you don't have to sort after every insert. You can do a single sort when you need the median. As as Luther Blissett pointed out, you don't even need to fully sort it. – Daniel Earwicker Aug 12 '10 at 15:37
  • my scenario is: insert item and query item frequently, so i must ensure the items is sorted at any time. and sometimes, i need retrive median value of the whole items. thank you. – Raymond Aug 15 '10 at 06:44
1

If you know the map will be sorted, then get the element at floor(length / 2). If you're in a bit twiddly mood, try (length >> 1).

Zach Rattner
  • 20,745
  • 9
  • 59
  • 82
1

I know no way to get the median from a pure STL map quickly for big maps. If your map is small or you need the median rarely you should use the linear advance to n/2 anyway I think - for the sake of simplicity and being standard.

You can use the map to build a new container that offers median: Jethro suggested using two maps, based on this perhaps better would be a single map and a continuously updated median iterator. These methods suffer from the drawback that you have to reimplement every modifiying operation and in jethro's case even the reading operations.

A custom written container will also do what you what, probably most efficiently but for the price of custom code. You could try, as was suggested to modify an existing stl map implementation. You can also look for existing implementations.

There is a super efficient C implementation that offers most map functionality and also random access called Judy Arrays. These work for integer, string and byte array keys.

Peter G.
  • 14,786
  • 7
  • 57
  • 75
1

Since it sounds like insert and find are your two common operations while median is rare, the simplest approach is to use the map and std::advance( m.begin(), m.size()/2 ); as originally suggested by David Rodríguez. This is linear time, but easy to understand so I'd only consider another approach if profiling shows that the median calls are too expensive relative to the work your app is doing.

Mark B
  • 95,107
  • 10
  • 109
  • 188
0

The nth_element() method is there for you for this :) It implements the partition part of the quick sort and you don't need your vector (or array) to be sorted. And also the time complexity is O(n) (while for sorting you need to pay O(nlogn)).

Master Yoda
  • 587
  • 3
  • 7
-1

For a sortet list, here it is in java code, but i assume, its very easy to port to c++:

    if (input.length % 2 != 0) {
        return input[((input.length + 1) / 2 - 1)];
    } else {
        return 0.5d * (input[(input.length / 2 - 1)] + input[(input.length / 2 + 1) - 1]);
    }
InsertNickHere
  • 3,616
  • 3
  • 26
  • 23