0

I have a set containing a sortable data type. I need to be able to search to find all elements that lie between two unknown points within this set. For the sake of simplicity I'll use some pseudocode to describe this.

set = {
  (a, 0),
  (a, 3),
  (a, 8),
  (b, 2),
  (b, 5),
  (c, 0)
}

In reality, my code is implemented as follows

// This is highly simplified
template<typename A, typename B>
class overall_type {

  public:
    auto find(A min_a, A max_a) {
      // Find all elements between (min_a, -oo) and (max_a, oo)
      // How can I do this?
    }

    // A large number of other implementation details

  private:
  
    struct contained_type {
      A data_a;
      B data_b;

      auto operator<(contained_type const& other) -> bool {
          if (data_a < other.data_a) {
            return true;
          }
          if (data_a > other.data_a) {
            return false;
          }
          return data_b < other.data_b;
        }
    }

    std::set<contained_type> internal_data;
}

I want to find all elements within the range (for example) (b, -oo) <= e <= (b, oo). As I have implemented comparators for my data type, the data will be placed in-order within the set, meaning that I just need to find a starting point and an ending point.

The problem I have is that I cannot be sure what the precise starting point is. In the example above, the lowest element is (b, 1), and therefore a simple find() operation won't suffice. Another constraint is that I don't know the minimum values for any of the types of elements that are contained within my data, because all the types are templates. I instead need to be able to search using incomplete data.

I have also considered implementing my own binary search, but as sets don't seem to be random accessible, I cannot easily do so whilst using one.

The only alternative I have considered is using a std::vector and implementing my own binary search over it, and adding code to insert to the correct position to keep it sorted, however the under-the-hood implmenetation of vectors means that I wouldn't meet the worst-case time complexity requirements of my project.

Apologies if I have overlooked a clear solution here - my university has taught the STL exceptionally poorly, so there may be a data structure designed for this that I simply haven't heard of and which my numerous searches didn't uncover.

How can I create a solution that matches the above criteria?

Miguel Guthridge
  • 1,444
  • 10
  • 27
  • 1
    `std::lower_bound()`, `std::upper_bound()` maybe? – lorro Jul 31 '22 at 15:19
  • Is it possible to perform a search using incomplete data though? There's no such thing as `-oo` and `oo` for strings, for example. – Miguel Guthridge Jul 31 '22 at 15:20
  • 1
    Empty string compares less than any other string. For a string that compares greater than any "reasonable" string, `"\U0010FFFF"` would probably do. – Igor Tandetnik Jul 31 '22 at 15:22
  • ... or you could just have a `variant` – lorro Jul 31 '22 at 15:23
  • Can you show some real C++ code, that shows exactly how this set is declared and used. Real `std::set`s cannot contain duplicate keys. This "pseudocode" naturally gets interpreted as a set that contains multiple "b" keys, with different values. As Mr. Spock would say: this is not logical. – Sam Varshavchik Jul 31 '22 at 15:23
  • @IgorTandetnik the problem is I have no way to tell if it's a string or not - I'm using templated types for the entire structure. – Miguel Guthridge Jul 31 '22 at 15:24
  • @SamVarshavchik apologies for the misunderstanding - it's not a key-value mapping - the entire object is the letter and one object, so `(b, 1)` and `(b, 3)` are distinct entries in the set. – Miguel Guthridge Jul 31 '22 at 15:26
  • 2
    Since C++14, `set::lower_bound` et al can do a heterogeneous search: the parameter passed to it doesn't have to be of `key_type` type. Of course the comparator must have overloads for all combinations of `key_type` and this type. This way, you can set up a new custom type and teach your comparator to put it before or after all `(b, *)` values, say. – Igor Tandetnik Jul 31 '22 at 15:27
  • Ok, so there should be a concept of a minimum/maximum value for a given data type, i.e. `INT_MIN`, `INT_MAX`. So, you should be able to use that together with lower_bound/upper_bound. – Sam Varshavchik Jul 31 '22 at 15:27
  • @IgorTandetnik that's actually quite helpful, thanks! I'll refine my question a bit (adding some code to clarify it), then I'll see if that helps – Miguel Guthridge Jul 31 '22 at 15:31
  • @SamVarshavchik I've edited in some C++ code. It's highly simplified from the actual implementation, but it should give the gist – Miguel Guthridge Jul 31 '22 at 15:34
  • 1
    Your comparator doesn't meet the requirements of strict weak ordering. It declares that `{1, 2} < {2, 1}` and also that `{2, 1) < (1, 2)` – Igor Tandetnik Jul 31 '22 at 15:43
  • Yep that one's a bug in my code, will fix – Miguel Guthridge Jul 31 '22 at 15:43
  • 1
    Ok, so this is a template. Then, yes, a transparent comparator, and C++14, is the way to go. After fixing the existing, broken, comparator, of course. – Sam Varshavchik Jul 31 '22 at 15:44
  • [Something along these lines](https://godbolt.org/z/Ynvqx5G4x) – Igor Tandetnik Jul 31 '22 at 15:53

2 Answers2

2

Something along these lines, perhaps. This relies on the capability, added in C++14, of std::set::equal_range et al to perform heterogeneous searches, given a comparator that has overloads for combinations of different types.

template<typename A, typename B>
class overall_type {

  public:
    auto find(A min_a, A max_a) {
        return internal_data.equal_range(RangeCover{min_a, max_a});
    }    

  private:
    using contained_type = std::pair<A, B>;
  
    struct RangeCover {
        A& min_a;
        A& max_a;
    };

    struct Compare {
        using is_transparent = int;

        bool operator()(contained_type const& l, contained_type const& r) const {
            return l < r;
        }

        bool operator()(contained_type const& l, RangeCover const& r) const {
            return l.first < r.min_a;
        }

        bool operator()(RangeCover const& l, contained_type const& r) const {
            return l.max_a < r.first;
        }
    };

    std::set<contained_type, Compare> internal_data;
};

Demo

Compare makes an instance of RangeCover compare equivalent to all instances of contained_type that have the first component in the range, ignoring the second one.

Igor Tandetnik
  • 50,461
  • 4
  • 56
  • 85
  • Just for clarification, what does the `using is_transparent = int;` statement do? – Miguel Guthridge Jul 31 '22 at 16:18
  • I'm not entirely sure, I must admit. The heterogeneous comparison feature requires that the comparator be *transparent*, which it signals by providing `is_transparent` typedef (the actual type doesn't matter, just the existence). What this property means and why it's needed, I don't fully understand. – Igor Tandetnik Jul 31 '22 at 16:19
  • Perhaps [a related article that I am not smart enough to comprehend](http://www.fluentcpp.com/2017/06/09/search-set-another-type-key/) – Miguel Guthridge Jul 31 '22 at 16:21
  • That article just repeats my answer (or rather, my answer repeats the article, but I swear I wasn't aware of it). It explains that `is_transparent` typedef is needed, but still doesn't explain why. Or rather, it's clear why - the code doesn't compile without it, because templated versions of `equal_range` et al are not defined. What I don't fully understand is why the standard authors required this extra step, why they couldn't just have templated versions defined always. – Igor Tandetnik Jul 31 '22 at 16:29
  • 2
    See [this question](https://stackoverflow.com/questions/20317413/what-are-transparent-comparators). Apparently, just opening the floodgates could have broken existing code(I guess, code that called `find(x)` and relied on `x` getting implicitly converted to the key type), so the heterogeneous comparison feature was made opt-in. – Igor Tandetnik Jul 31 '22 at 16:39
1

The following solution is not optimal in performance if you have large numbers of equal data_a values, but it is simple and does not require anything special for type B other than that it is default constructible. If you add transparent comparison to allow lower_bound() to work directly with A alone, it won't even require B to be default constructible.

auto find(A min_a, A max_a) {
  // Find all elements between (min_a, -oo) and (max_a, oo)
  contained_type min_pair{min_a}; // data_b is default constructed
  auto iter = internal_data.lower_bound(min_pair);

  // linear search backward for true begin point
  auto begin = iter;
  for (; begin != internal_data.begin(); --begin) {
    if (begin->data_a < min_a) {
      ++begin;
      break;
    }
  }

  // linear search forward for true end point
  // [left as an exercise for the reader]
}
John Zwinck
  • 239,568
  • 38
  • 324
  • 436
  • This looks good for simpler cases, but it doesn't appear to scale very well to cover additional complexity that is a part of my overall problem. Thanks anyway! – Miguel Guthridge Jul 31 '22 at 16:07