3

I would like to pass small arrays of variable size as parameters in modern C++, ie:

func({1,2,3,4});

And it should be as fast as possible, ideally without heap allocation.

Things I've tried:

C style array

void func(int * arr, int arrayCount) {
   for (int i = 0; i < arrayCount; i++) {
      //use arr[i];
   }
}

int arr[] = {1,2,3,4};
func(arr, 4);

This is fast and efficient, but adds an extra arrayCount variable which is prone to user error, and the function call is now broken into two lines.

std vector

void func(vector<int> arr) {
    for (int &a : arr) {
      //use a
    }
}

func({1,2,3});

This is very clean but it's remarkably slow due to copying data over to the heap. Changing the signature to use a universal ref:

void func(vector<int> && arr) {

Doesn't seem to make any difference.

std initializer_list

void func(initializer_list<int> arr) {
    for (int &a : arr) {
      //use a
    }
}

func({1,2,3});

This is a 50x speed improvement over the vector (approximately)! Using a && again makes no difference, but it is still (approximately) 5-10x slower than the C style syntax due to the overhead of initializer_list creation and iteration. Still acceptable for many use cases, but the overhead seems a little unnecessary.

Thinking maybe compilers had gotten smarter, I tried the following:

std array

template <int arrayCount>
void func(array<int, arrayCount> arr) {

  for (int &a : arr) {
    //use a
  }

}

The compiler can not infer the size from the call, so it is necessary to write

func<3>({1,2,3});

And in any case, this is no faster than the initializer list.

Variadic templates are out of the question for a number of reasons, because they don't support multiple array parameters and the syntax is difficult to navigate.

Is there a way to get both clean syntax and fast performance? It seems like it would be easy to add some syntactic sugar to wrap the C style array notation.


Clarification:

These tests were performed in MSVC 2017 with default optimization (O2):

timer.start();
for(int i = 0; i < 100000; i++) {
   func({i, i+2, i+3, i+4});
}
timer.stop();

or for the c style:

for (int i = 0; i < 100000; i++) {
   int arr[] = {i, i+2, i+3, i+4};
   func(arr, 4);
}

With the passed in values being added to a static variable.


Review:

See Robert's answer below for the syntax for passing arrays using generics that avoids STL and has similar performance to the C style option, but is limited in its ability to handle empty arrays.

My take away is that the initalizer_list notation is probably fast enough in most use cases, the vector notation should be avoided, and when performance is absolutely critical, consider the generic array approach or C style arrays depending on the use case. MSVC doesn't optimize these as well as GCC/Clang.

Revillo
  • 41
  • 1
  • 4
  • 9
    where are your numbers coming from? tbh differences of the order of x50 smells like no compiler optimizations – 463035818_is_not_an_ai Apr 25 '18 at 15:50
  • 6
    Crazy multiples such as 50x and even 10x are an indication that you are likely comparing functions that do nothing with the data. In essence, you are comparing the timing of passing a pointer to the timing of passing an array by value. – Sergey Kalinichenko Apr 25 '18 at 15:54
  • 3
    Please post a [mcve], so others can test and include your *exact* compiler commandline. – Jesper Juhl Apr 25 '18 at 15:55
  • Related: https://stackoverflow.com/questions/40241370/deduce-template-argument-for-size-of-initializer-list – Max Langhof Apr 25 '18 at 16:24
  • These tests were with compiler optimizations in visual studio. I am interested only in the costs associated with passing the arrays in differently, and not the costs associated with processing them. The vector implementation is so bad it actually slows down my game loop. – Revillo Apr 25 '18 at 17:27
  • 1
    @Revillo: Your test is invalid, and the C version appears to be taking advantage of that to remove the code. – Mooing Duck Apr 25 '18 at 21:35
  • Yes for the C test, use the C syntax as described in the post. – Revillo Apr 26 '18 at 14:06
  • 1
    `std::initializer_list` is not slower than c-style array under GCC or Clang, see [this benchmark](http://quick-bench.com/DT5ArJ4AI_fbvIuCcvwSIi0l1BA). I guess either you did not use the release mode, or MSVC does not fully optimize the code. – xskxzr Apr 26 '18 at 15:19
  • @xskxzr the benchmark is also timing the external loop. Inside the benchmark loop leave only the function call. – Robert Andrzejuk Apr 26 '18 at 18:09

3 Answers3

3

Try this :

template< typename T, size_t N>
void func( T const (&arr)[ N ]  )
{...}

void func() // overloaded function for empty parameter
{...}

The compiler will automatically deduce the type and array size from the array passed into the function. The array has to be a complete type - not a pointer and dimension has to be known at call location.


Edit: If a function with no parameters cannot be used, then tag-dispatching should be considered:

class Empty_List {};

static constexpr Empty_List empty;

void func( Empty_List ) 
{ ... }

which should be called:

 func( empty ); // option 1
 func( {} );    // option 2
Robert Andrzejuk
  • 5,076
  • 2
  • 22
  • 31
  • Won't work with the braced initializer syntax since it's a non-const lvalue reference. Do `T const (&arr)[ N ]` – David G Apr 25 '18 at 16:18
  • 1
    `{1,2,3}` isn't an array, it doesn't even have a type so there is nothing to deduce – NathanOliver Apr 25 '18 at 16:18
  • 1
    Thanks for this solution, the added const @0x499602D2 is necessary to work with the r-value. The only issue I see now is that it is not possible to send an empty area or have a nullptr default argument. – Revillo Apr 25 '18 at 17:20
  • Create a second overloaded function but diffrent parameters. – Robert Andrzejuk Apr 25 '18 at 18:26
  • Added additonal function with no parameter. – Robert Andrzejuk Apr 25 '18 at 19:01
  • For efficiency you may want to have the top templated function call `func_real(arr, N)` because otherwise I suspect the compiler will build a new function for each different `N` used. – Zan Lynx Apr 25 '18 at 19:03
  • 1
    @ZanLynx For speed, that is exactly what is wanted. For different values of N, different code can be generated. – Robert Andrzejuk Apr 25 '18 at 19:27
  • 1
    I'm not an expert, but my understanding is that generating a lot of functions is not necessarily good for performance, as it can increase the size of your compiled code and make it harder to cache. Also, on the topic of overloading to avoid the empty list argument, this can be problematic when a function signature has multiple array parameters, one would need to 2^n overloads, and these may very well still be ambiguous to the user ie, if the lists are of the same type. – Revillo Apr 25 '18 at 20:40
  • 1
    @Revillo Performance will depend on the caching of Your code and data at the time of running. Not because of the amount of generated functions. – Robert Andrzejuk Apr 25 '18 at 20:48
  • @Revillo Added example of tag-dispatching for empty parameter. – Robert Andrzejuk Apr 25 '18 at 20:53
  • @Revillo Added function dispatcher. – Robert Andrzejuk Apr 25 '18 at 21:05
  • @RobertAndrzejuk: But the amount of generated functions affects caching. It's trivial to have all of the `void func( T const (&arr)[ N ] )` calls redirect to `func(&arr[0], N)`, which gets you speed, syntax, and caching. – Mooing Duck Apr 25 '18 at 21:37
  • @MooingDuck Caching is not affected by the amount of generated functions, but the amount of times the **functions are called**. That is why your improvement can help this situation. – Robert Andrzejuk Apr 26 '18 at 08:06
  • 1
    @RobertAndrzejuk There is also caching in the CPU for decoded instructions. Lots of functions doing mostly the same thing can bog down that cache. – Max Langhof Apr 26 '18 at 10:53
1

With C++17 it is easy enough to replace small arrays like this by variadic non-type templates. It might be worth considering something like:

template <std::size_t... Ns>
using A = std::index_sequence<Ns...>; // syntactic sugar

template <std::size_t... Ns>
constexpr void func(A<Ns...>) {
    (do_something(Ns), ...); // fold expression <=> for (int a: arr)
}
papagaga
  • 1,108
  • 8
  • 14
0

A common idiom in C++ is a span, sometimes called a slice, a ref, or a view, as in array_view, string_view, etc. Effectively, it's a bundling of the pointer and size arguments you used in the C-style array case, but in a C++ class that gives a std::array or std::vector interface. It's a non-owning reference to the data held in another contiguous container. Some implementations use spans only for const references but others do not.

The idiom gives you identical code to your std::vector case, but it's as efficient as the C-style array case because there's no dynamic memory allocation and copying.

The idiom has become so common, that it's part of the C++ Core Guidelines support library and a version may even make it into the C++ standard library.

If you don't have a library that already provides this idiom, it's not too hard to roll your own. You create a class template that has, as member, the pointer and the count of elements. You add constructors that make it easy to convert containers to this representation, and you overload operator[] to provide access.

Adrian McCarthy
  • 45,555
  • 16
  • 123
  • 175