0

How to check if a value is in a on-the-fly created set. I'm looking for some syntactic sugar, like we have in python

if s in set(['first','second','third','fourth']):
    print "It's one of first,second,third,fourth";

How can it be done efficiently in C++?

pawel_j
  • 419
  • 4
  • 14
  • 6
    Can you explain what's wrong with [`find`](https://en.cppreference.com/w/cpp/container/set/find)? – EdChum Nov 16 '18 at 16:14
  • sorry, I've just corrected my code to show that I want the most optimal way. I'm looking for something that can be done in one line - in the if. – pawel_j Nov 16 '18 at 16:15
  • 14
    @pawel_j Almost anything can be done in one line in c++ if your line is long enough. – François Andrieux Nov 16 '18 at 16:16
  • 4
    "I want the most optimal way. I'm looking for something that can be done in one line"... one thing has nothing to do with the other. –  Nov 16 '18 at 16:18
  • 1
    std::set::count method is a way to go, if you have rvalue of type set. – sandyre Nov 16 '18 at 16:21
  • If you want to write as little as possible, why is your Python not `if s in ['first','second','third','fourth']:`? – molbdnilo Nov 16 '18 at 16:21

7 Answers7

5

How about this:

std::string s = "first";
if(std::set<std::string>{"first","second","third","fourth"}.count(s)>=1){
    std::cout << s << " is found" << std::endl;
}

BTW, in C++20 and over I think std::set::contains is more preferable.

Hiroki
  • 2,780
  • 3
  • 12
  • 26
  • Why not `unordered_set`? – scohe001 Nov 16 '18 at 16:20
  • 5
    @scohe001 For a set of 4 elements, you would have to measure the overheads to see which is better. Even an `std::vector` or `std::array` might be a contender at that number of elements. – François Andrieux Nov 16 '18 at 16:22
  • 2
    Alternatively, in C++20 `.contains(s)` can be used. – HolyBlackCat Nov 16 '18 at 16:22
  • 1
    @FrançoisAndrieux meh fair enough. – scohe001 Nov 16 '18 at 16:23
  • count() will iterate over all elements, and is not an optimization. More like @HolyBlackCat 's solution - contains() is an optimization - but not present before C++20 (that's why I asked question, because using .find() is not possible in one liner, as you need to use the reference to the set twice). – pawel_j Nov 19 '18 at 09:11
  • 2
    @pawel_j `count()` will not iterate over all elements. For plain `std::set` (as opposed to `std::multiset`) it's no different from `.contains()` (except one looks prettier than the other). – HolyBlackCat Nov 19 '18 at 09:30
  • @pawel_j Thx for your performance testing. Yes, initial rebalance of `std::set` is usually expensive. And now I understand you want *really* high performance syntax sugar. – Hiroki Nov 19 '18 at 17:34
5

If you want this to be efficient then you are not going to want to construct a container just to check if the value exists. What we can do is leverage C++17's fold expressions and write a function like

template<typename... Args, std::enable_if_t<std::conjunction_v<std::is_same<const char*, Args>...>, bool> = true>
bool string_exists(std::string to_find, Args... args)
{
    return ((to_find == args) || ...);
}

which then lets you write the if statement like

int main()
{
    std::string element;
    // get value of element somehow
    if (string_exists(element, "1", "2", "3", "4"))
        std::cout << "found";
}

And now no container and no std::string objects are created. If you want to accept string objects you could change the functions to

template<typename... Args, std::enable_if_t<std::conjunction_v<std::is_same<const char*, Args>...> ||
                                            std::conjunction_v<std::is_same<std::string, Args>...>, bool> = true>
bool string_exists(std::string to_find, Args... args)
{
    return ((to_find == args) || ...);
}

int main()
{
    std::string element;
    // get value of element somehow
    if (string_exists(element, some, other, string, variables))
        std::cout << "found";
}

Do note that the last example doesn't allow you to mix string literals with std::string's.

NathanOliver
  • 171,901
  • 28
  • 288
  • 402
  • That's quite a good solution with the template folding, but not really an optimization when the number of the possible values is larger, providing that set was created once and later reused. Please see my answer below – pawel_j Nov 19 '18 at 09:22
3

I do not see anything wrong in using find method

std::set<std::string>aset {"first", "second", "third", "fourth"}; 
std::string s = "third";

if(aset.find(s) != aset.end()){
    std::cout << "It's one of first,second,third,fourth";
}
haccks
  • 104,019
  • 25
  • 176
  • 264
1

Something like this should work:

#include <iostream>
#include <string>
#include <unordered_set>

void foo(const std::string& s) {
    if (std::unordered_set<std::string>{"1", "2", "3", "4"}.count(s) != 0) {
        std::cout << "It's one of 1,2,3,4\n";
    }
}

In C++17 you can let deduction guides come into play to clean up the template argument (note that I need to use std::string literals to avoid deducing as const char*):

using namespace std::string_literals;
if (std::unordered_set{"1"s, "2"s, "3"s, "4"s}.count(s) != 0) { ... }
0x5453
  • 12,753
  • 1
  • 32
  • 61
1

How can it be done efficiently in C++? [...] I'm looking for something that can be done in one line - in the if

In "one line - in the if" and "efficiently" are different things.

If you really want all in a line, I suggest the 0x5453's solution (also the Hiroki's one) that uses count.

I propose an alternative based on emplace() (but works also with insert())

if ( ! std::set<std::string>{"first", "second", "third", "fourth"}.emplace(s).second )
   std::cout << "It's one of first,second,third,fourth" << std::endl; 

but I don't think it's very efficiently: think if you use it in a loop: every time you have to create a set and insert a new element; so you have to recreate the same set and insert another element.

It's better the solution based on count() but just because the compiler can optimize the code avoiding to recreate the set every time.

If you want efficiency, I suggest the haccks's solution (making also const the set, but I suspect that the compiler optimize anyway)

 std::set<std::string> const numSet {"first", "second", "third", "fourth"}; 

 if ( numSet.find(s) != numSet.cend() )
    std::cout << "It's one of first,second,third,fourth" << std::endl;

or, better, avoid the std::set at all ad check using template folding, as suggested by NathanOliver (you need C++17 but, maybe loosing short circuiting, isn't really difficult adapt it in C++14).

max66
  • 65,235
  • 10
  • 71
  • 111
0

I've been trying to check your solutions, using a benchmarking tool to see the outcomes. It seems that creating a set in-situ is a performance killer, that's why one should avoid such solution, which is otherwise quite fast in python (need to do performance tests in python too).

So, when using a pre-created set it's almost twice as fast as using multiple ifs. Test done for number of items = 10.

enter image description here

For items = 4, set is only 18% faster than multiple ifs. enter image description here

When creating the set in every if, the set code is almost 900% slower.

enter image description here

Link to the benchmark http://quick-bench.com/iqQJWxaRgQGh4Sry2YtfxdSqTio (but it might disappear so I also add the code tested

static void withSet(benchmark::State& state) {
  std::vector<std::string> l = {"1","2","9","10"};
  int i = 0;
  std::set<std::string> s = {"1","2","3","4","5","6","7","8","9","10"};

  for (auto _ : state) {
    const std::string t = l[i++%l.size()];
    if (s.find(t) != s.end() ) {
      benchmark::DoNotOptimize(t);
    }
  }
}
// Register the function as a benchmark
BENCHMARK(withSet);

static void withIfs(benchmark::State& state) {
  std::vector<std::string> l = {"1","2","9","10"};
  int i = 0;

  for (auto _ : state) {
    const std::string t = l[i++%l.size()];
    if (t == "1" || t == "2" || t == "3" || t == "4" ||
    t == "5" || t == "6" || t == "7" || t == "8" ||
    t == "9" || t == "10"
    ) {
      benchmark::DoNotOptimize(t);
    }
  }
}
BENCHMARK(withIfs);

--- edit ---

Additionaly I understood that count() in set is not equal iterating every element in set. This is the outcome of benchmark between find() != end() and count()

enter image description here

pawel_j
  • 419
  • 4
  • 14
0

I post an alternative approach which is not a true one liner but shows good performance as tested bellow. The basic idea is same with @NathanOliver. The idea is initializing a data set {"first", ..., "fourth"} at a compile-time with static constexpr qualifier to get a run-time performance gain:

  • In C++17 and over, std::string_view is available and this has constexpr ctors. Thus it is natural to apply static constexpr std::array of std::string_view to improve the performance.

  • In addition, for such a small size data set ~10, naive linear search with cache locality of contiguous arrays sometimes beat other algorithms. So I deliberately use std::find here.

This is easily done using C++17's if statement with initializer as follows:

if(static constexpr 
   std::array<std::string_view, 4>
   v = {"first", "second", "third", "fourth"};
   std::find(v.cbegin(), v.cend(), s) != v.cend())

This can be summarized as the following compact and portable syntax sugar macro:

#include <tuple>
#include <array>
#include <string_view>
#include <algorithm>

#define IF_CONTAINS(str, ...)                                          \
if(static constexpr                                                    \
   std::array<                                                         \
       std::string_view,                                               \
       std::tuple_size<decltype(std::make_tuple(__VA_ARGS__))>::value> \
   v = {__VA_ARGS__};                                                  \
   std::find(v.cbegin(), v.cend(), str) != v.cend())

This macro IF_CONTAINS works fine as if-statement as follows:

Live DEMO

std::string s = "first";

IF_CONTAINS(s, "first", "second", "third", "fourth") {
    std::cout << s << " is found" << std::endl;
}

Performance Test 1

First, we test the performance of this approach with the above data set {"first", "second", "third", "fourth"}. Tests are done with Quick C++ Benchmark, gcc-8.2, C++17 and O3 optimization. The results are as follows:

  • const std::string s = "first"; (Live DEMO)

    1.1 times slower than naive implementation.

enter image description here

  • const std::string s = "second"; (Live DEMO)

    3.7 times faster than naive implementation.

enter image description here

  • const std::string s = "third"; (Live DEMO)

    1.9 times faster than naive implementation.

enter image description here

  • const std::string s = "fourth"; (Live DEMO)

    3.1 times faster than naive implementation.

enter image description here


Performance Test 2

Finally, we test the performance with a data set {"1", ...,"10"}. The test code is same with @pawel_j 's one. This test shows 2.7 times faster result than naive implementation:

Live DEMO

enter image description here

Hiroki
  • 2,780
  • 3
  • 12
  • 26