1

Consider I have a set of intervals described by [min, max] pairs. I want to find all sets of intervals which contain a given value.

I made this solution which works in O(n) time and serves as an example of what I'm after:

#include <iostream>
#include <vector>

using namespace std;

struct data
{
    int minValue;
    int maxValue;
};

void searchValue(const vector<data> &v, int target)
{
    for(size_t i = 0; i < v.size(); i++)
    {
        if(target >= v[i].minValue && target <= v[i].maxValue)
            cout << "Found target at set " << i << " (" << v[i].minValue << "," << v[i].maxValue << ")" << endl;
    }
}

int main()
{
    data d1 = {-1, 0};
    data d2 = {0, 3};
    data d3 = {-1, 5};
    data d4 = {10, 11};
    data d5 = {9, 12};

    // Vector to hold the interval sets
    vector<data> v{d1, d2, d3, d4, d5};

    // Search for the interval sets which contain the number 2
    searchValue(v, 2);

    return 0;
}

It's very important I find all possible sets, not just one.

I'm now looking for a solution which is more computationally efficient, possibly in logarithmic time. I think there may be a solution with multiset/multimap, lower_bound/upper_bound, etc., but I haven't had any success so far.

This can be achieved using interval trees but I believe there might a solution using just the STL.

nihilscientia
  • 21
  • 1
  • 4
  • you need the intervals to be in some order to achieve log time – UmNyobe Oct 06 '16 at 06:14
  • Looks like it really depends if you do more inserts or lookups as to whether you can do better than O(n) – xaxxon Oct 06 '16 at 06:19
  • @xaxxon I'm only worried about lookup, data is immutable after being acquired. – nihilscientia Oct 06 '16 at 06:23
  • Well, then your answer awaits you below. – xaxxon Oct 06 '16 at 06:25
  • Depending on your data you could store a reference/pointer/copy of the data-object in a map for each position given through the ranges. This might lead to a memory-problem yet the search will be an easy lookup. [See here](http://coliru.stacked-crooked.com/a/1919052da03e2a73) – Simon Kraemer Oct 06 '16 at 11:11
  • @SimonKraemer very nice solution in constant time! Simon, consider submitting your solution as an answer so I can upvote it. – nihilscientia Oct 06 '16 at 16:56

3 Answers3

3

With STL it seems difficult.

You can use an interval tree:

http://en.wikipedia.org/wiki/Interval_tree

Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point.

xaxxon
  • 19,189
  • 5
  • 50
  • 80
v78
  • 2,803
  • 21
  • 44
1

I was going to simply reply with "solved!" but then I remembered about xkcd's Wisdom of the Ancients.

STL's set and multiset are tipically implemented as red-black trees. Interval trees can be implemented on top of red-black trees. This got me thinking STL's multiset could be used as a part of the solution. Multiset is necessary, instead of set, to handle key multiplicity.

So the first step is using multiset as a container for intervals. Internally, elements in multiset are sorted so we need to compare intervals. We can say an intervalBig is greater than another intervalSmall if intervalBig fully contains intervalSmall.

By this point our solution is a multiset containing sorted intervals. Now we need a way to search this container. This is where STL's find_if comes into play. To get it working we just need our intervals to be able to compare themselves against each other.

To handle multiple solutions all that should be required is to iterate the previous solution given by find_if, since multiset is ordered.

Here is a solution in C++11:

#include <iostream>
#include <set>
#include <algorithm>

using namespace std;

struct data
{
    int minValue;
    int maxValue;

    // for find_if comparisons
    bool operator()(const data& rhs){
        return rhs.minValue <= this->minValue && rhs.maxValue >= this->maxValue;
    }
};

// for set's internal sorting
struct compareData
{
    // Checks if lhs <= rhs. For an interval this can be defined as rhs being a bounding box that contains lhs.
    bool operator()(const data& lhs, const data& rhs){
        return rhs.minValue <= lhs.minValue && rhs.maxValue >= lhs.maxValue;
    }
};

int main()
{
    data d1 = {-1, 0};
    data d2 = {0, 3};
    data d2dup = {0, 3};
    data d3 = {-1, 5};
    data d4 = {10, 11};
    data d5 = {9, 12};

    std::multiset<data, compareData> intervals = { d1, d2, d3, d4, d5, d2dup };

    double target = 0;
    data tmp = { target, target };

    // find all sets which contain target
    for (auto it = find_if(intervals.begin(), intervals.end(), tmp); it != intervals.end(); it = find_if(++it, intervals.end(), tmp))
        cout << "Found target at set (" << it->minValue << "," << it->maxValue << ")" << endl;
}

This is effectively an interval tree in C++ with overlapping search. Search time should be O(logn) since multiset is ordered.

nihilscientia
  • 21
  • 1
  • 4
  • 1
    @SimonKraemer thanks for the help! here is my solution. – nihilscientia Oct 06 '16 at 19:10
  • @xaxxon could you edit my answer and include xkcd's picture? It wouldn't allow me to. – nihilscientia Oct 06 '16 at 19:10
  • I believe that ```find_if``` doesn't know about underlying data structure meaning that it would not utilize the fact that multiset (or similar) is a tree - making the search time linear rather than O(logN). Details can be found here http://www.cplusplus.com/reference/algorithm/find_if/ – Alex Bailo Mar 26 '18 at 05:20
0

I'm only worried about lookup, data is immutable after being acquired

I have 2 possible solutions on how to initially prepare the data so you can have a pretty fast lookup speed.

The first is using a the given position as key in a prepared std::map.

The second one should be slower in creation, yet provides direct access to the wanted data in an std::vector by calculating the index.

I haven't done any measurements in speed and both examples are not optimized in terms of memory management.


Ex1:

#include <vector>
#include <map>
#include <algorithm>
#include <iostream>

struct data
{
    int minValue;
    int maxValue;
};

class intervall_container
{
    using vec_t = std::vector<data>;
    using map_t = std::map<int, vec_t>;

    map_t m_map;

    void insert(const data& d)
    {
        for (int i = d.minValue; i <= d.maxValue; ++i) {
            m_map[i].push_back(d);
        }
    }

public:
    intervall_container(std::initializer_list<data> init)
    {
        std::for_each(init.begin(), init.end(), [this](const data& d) {insert(d); });
    }
    intervall_container(const intervall_container&) = delete;
    ~intervall_container() {}

    vec_t searchValue(int pos) const
    {
        auto result = m_map.find(pos);
        if (result == m_map.end()) return vec_t();
        return result->second;
    }
};

int main()
{
    data d1 = { -1, 0 };
    data d2 = { 0, 3 };
    data d3 = { -1, 5 };
    data d4 = { 10, 11 };
    data d5 = { 9, 12 };

    // Vector to hold the interval sets
    intervall_container v{ d1, d2, d3, d4, d5 };

    // Search for the interval sets which contain the number 2
    int value = 2;
    auto result = v.searchValue(value);
    for (auto iter = result.cbegin(); iter != result.cend(); ++iter)
    {
        std::cout << "Value '" << value << "' lies within the bounds of DataRange(" << iter->minValue << ", " << iter->maxValue << ").\n";
    }
    return 0;
}

Ex2:

#include <vector>
#include <algorithm>
#include <iostream>
#include <limits>

struct data
{
    int minValue;
    int maxValue;
};


struct intervall_container
{
    using vec_t = std::vector<data>;

    int m_offset;
    std::vector<vec_t> m_vecs;  

public:
    intervall_container(std::initializer_list<data> init)
    {
        int minmin = std::numeric_limits<int>::max(); //min minValue
        int maxmax = std::numeric_limits<int>::min(); //max maxValue
        for (auto iter = init.begin(); iter != init.end(); ++iter)
        {
            if (iter->minValue < minmin) minmin = iter->minValue;
            if (iter->maxValue > maxmax) maxmax = iter->maxValue;
        }
        m_offset = minmin;
        for (int value = minmin; value < maxmax; ++value)
        {
            vec_t vec;
            for (auto iter = init.begin(); iter != init.end(); ++iter)
            {
                if (iter->minValue <= value && value <= iter->maxValue)
                {
                    vec.push_back(*iter);
                }
            }
            m_vecs.push_back(vec);
        }
    }
    intervall_container(const intervall_container&) = delete;
    ~intervall_container() {}

    vec_t searchValue(int pos) const
    {
        pos -= m_offset;
        if(pos < 0 || pos >= m_vecs.size()) return vec_t();
        return m_vecs[pos];
    }
};


int main()
{
    data d1 = { -1, 0 };
    data d2 = { 0, 3 };
    data d3 = { -1, 5 };
    data d4 = { 10, 11 };
    data d5 = { 9, 12 };

    // Vector to hold the interval sets
    intervall_container v{ d1, d2, d3, d4, d5 };

    // Search for the interval sets which contain the number 2
    int value = 2;
    auto result = v.searchValue(value);
    for (auto iter = result.cbegin(); iter != result.cend(); ++iter)
    {
        std::cout << "Value '" << value << "' lies within the bounds of DataRange(" << iter->minValue << ", " << iter->maxValue << ").\n";
    }
    return 0;
}
Simon Kraemer
  • 5,700
  • 1
  • 19
  • 49
  • 1
    +1 great answer, Simon! Thanks for the help. Your solution should provide O(1) access time at a trade-off for memory, restricted to integer intervals. I believe you could change Ex1 map+vector for a multimap? I posted a different solution which creates an interval tree on top of std::multiset, access time is O(logn) but requires less memory. – nihilscientia Oct 07 '16 at 16:30