
I have a class A that has a std::vector<int> as an attribute. A needs to fill this vector when an instance of A is created. The calculation can take some time and I would like to know if:

  1. it can be done at compile time.
  2. the vector can be sorted at compile time as well.

I am not familiar with metaprogramming and I haven't found a way to do it so far. It's not an OS-specific question.

Here is the A.cpp file:

#include "A.h"
#define SIZEV 100

A::A()
{
    fillVector();
}

void A::fillVector()
{
    // m_vector is an attribute of class "A"
    // EXPECTATION 1 : fill the vector with the following calculation at compile time

    const int a=5;
    const int b=7;
    const int c=9;

    for(int i=0;i<SIZEV;i++){
        for(int j=0;j<SIZEV;j++){
            for(int k=0;k<SIZEV;k++){
                this->m_vector.push_back(a*i+b*j+c*k);
            }
        }
    }

    // EXPECTATION 2 : sort the vector at compile time
}

std::vector<int> A::getVector() const
{
    return m_vector;
}

void A::setVector(const std::vector<int> &vector)
{
    m_vector = vector;
}

and the main.cpp (a Qt app, but that doesn't matter):

#include <QCoreApplication>
#include "A.h"

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    A a;
    auto vec = a.getVector();

    // use vec, etc ...

    return app.exec();
}
Jinxed
BlackPopa
  • Do you mean at runtime instead of at "compile" time? – ventsyv Sep 18 '15 at 20:33
  • I mean at compile time. – BlackPopa Sep 18 '15 at 20:34
  • Have you considered code generation? – Karoly Horvath Sep 18 '15 at 20:34
  • If you run it once and print out this vector you can just have the vector created with `myvec = {1, 2, ...` – Fantastic Mr Fox Sep 18 '15 at 20:34
  • @Ben : it can be a vector with millions of values, I would like to avoid this method if possible – BlackPopa Sep 18 '15 at 20:35
  • @Karoly : I am looking for something based on code generation yes – BlackPopa Sep 18 '15 at 20:36
  • Well, that's what @Ben has just told you to do. – Karoly Horvath Sep 18 '15 at 20:37
  • @Karoly, I am looking for a "clean" way to do that. If you propose to copy paste a vector of millions of values in a file, that's not what I call clean :) – BlackPopa Sep 18 '15 at 20:40
  • @BlackPopa If you don't want to calculate your values at runtime then you need to either embed the pre-calculated data in your executable or load it from an external file or other resource. Having your build system generate the data as a .cpp file is a more automated method than copy and paste but is the same basic principle. Given the size of the data you're talking about however, it may be cheaper to calculate it on startup than to load it from disk (disks, even SSDs, are much slower than memory and you can do a lot of calculations in the time it takes to load from disk). – mattnewport Sep 18 '15 at 20:45
  • Are you sure that the calculation takes the biggest amount of time? For me it is rather calling push_back. Have you tried defining m_vector with initial size and then setting values in your loops? Or even using an array? – Estiny Sep 18 '15 at 20:53
  • @Estiny : thats a fair point. I will have to do some testing and benchmark some ideas you guys gave me ! – BlackPopa Sep 18 '15 at 20:58
  • @BlackPopa: who said anything about "copy-pasting"? – Karoly Horvath Sep 19 '15 at 09:49
  • @Karoly Horvath : yes, misunderstood ;) – BlackPopa Sep 19 '15 at 11:48
  • Your data is highly redundant. You can represent the same information in a small but quick-to-access form with counts of how many elements have a given value, and/or prefix sums of that. Regardless of using that to eliminate m_vector entirely, using that idea for CountingSort at startup will be *much* faster. See my answer. (Sorry for leaving comments all over the place, but people (like @TemplateRex) don't get notified when I edit my answer, and I think this is a valuable new idea nobody's suggested yet.) – Peter Cordes Sep 20 '15 at 06:34

7 Answers


A std::vector<int> does not have any constexpr constructors (because dynamic memory allocation is disallowed for constexpr), so you can't sort a std::vector<int> at compile time.

You can create a std::array<int, N> at compile time for a constant N, but you'd have to write your own sorting routine, because std::sort is not constexpr either.
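As an illustration of that, a hand-written constexpr sort over std::array might look like the sketch below (not part of the answer's original text; it needs C++17 to mutate array elements in a constexpr function, since std::sort itself only became constexpr in C++20):

```cpp
#include <array>
#include <cstddef>

// Hand-rolled constexpr insertion sort: std::sort is not usable in a
// constant expression before C++20, so we supply our own routine.
template <std::size_t N>
constexpr std::array<int, N> sorted(std::array<int, N> a)
{
    for (std::size_t i = 1; i < N; ++i) {
        int key = a[i];
        std::size_t j = i;
        for (; j > 0 && a[j - 1] > key; --j)
            a[j] = a[j - 1];   // shift larger elements up
        a[j] = key;
    }
    return a;
}

// Evaluated entirely at compile time:
static_assert(sorted(std::array<int, 5>{4, 7, 2, 5, 1})[0] == 1);
static_assert(sorted(std::array<int, 5>{4, 7, 2, 5, 1})[4] == 7);
```

The array size N must of course be a compile-time constant, which fits the question's fixed SIZEV but not a general std::vector.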

You can also write a Boost.MPL compile-time vector or list and use the sort routine of that. But this won't scale as well as std::array.

Another angle of attack might be to store the vector into a static variable and do the sorting at program initialization. Your program just takes a bit longer to start, but it won't affect the rest of its main functionality.

Since sorting is O(N log N), you might even have a two-step build and write the sorted vector to a file, and either compile/link it to your main program, or load it in O(N) at program startup into a static variable.

TemplateRex

This is a simple compile-time sort for ints. It works by computing, for each element, its position within the sorted list; from that it works out what should be at each position, then builds a new list with every element inserted at the appropriate position. It's probably not as efficient in terms of complexity (it's O(n^2)) as the previous solution, but it's much easier to understand and it doesn't use recursion.

#include <initializer_list>
#include <array>
#include <tuple>

template<int... members>
struct IntList
{
    constexpr bool operator==(IntList) const { return true; }

    template<int... others>
    constexpr bool operator==(IntList<others...>) const { return false; }

    template<int idx>
    static constexpr auto at() 
    {
        return std::get<idx>(std::make_tuple(members...));
    }

    template<int x>
    static constexpr auto indexOf()
    {
        int sum {};
        auto _ = { 0, (x > members ? ++sum : 0)... };
        return sum;
    }

    template<int x>
    static constexpr auto count()
    {
        int sum {};
        auto _ = { 0, (x == members ? ++sum : 0)... };
        return sum;
    }

    template<int i>
    static constexpr auto ith()
    {
        int result{};
        auto _ = {
            ( i >= indexOf<members>() && i < indexOf<members>() + count<members>() ? 
              result = members : 0 )...
        };
        return result;
    }

    template<std::size_t... i>
    static constexpr auto sortImpl(std::index_sequence<i...>)
    {
        return IntList< ith<i>()... >();
    }

    static constexpr auto sort() 
    {
        return sortImpl(std::make_index_sequence<sizeof...(members)>());
    }
};

static_assert(IntList<1, 2, 3>().at<1>() == 2, "");

static_assert(IntList<>().indexOf<1>()           == 0, "");
static_assert(IntList<1>().indexOf<1>()          == 0, "");
static_assert(IntList<1, 2, 3, 4>().indexOf<3>() == 2, "");

static_assert(IntList<>().count<1>()        == 0, "");
static_assert(IntList<1>().count<1>()       == 1, "");
static_assert(IntList<1, 1>().count<1>()    == 2, "");
static_assert(IntList<2, 2, 1>().count<1>() == 1, "");
static_assert(IntList<1, 2, 1>().count<1>() == 2, "");

static_assert(IntList<>().sort()        == IntList<>(),        "");
static_assert(IntList<1>().sort()       == IntList<1>(),       "");
static_assert(IntList<1, 2>().sort()    == IntList<1, 2>(),    "");
static_assert(IntList<2, 1>().sort()    == IntList<1, 2>(),    "");
static_assert(IntList<3, 2, 1>().sort() == IntList<1, 2, 3>(), "");
static_assert(IntList<2, 2, 1>().sort() == IntList<1, 2, 2>(), "");
static_assert(IntList<4, 7, 2, 5, 1>().sort() == IntList<1, 2, 4, 5, 7>(), "");
static_assert(IntList<4, 7, 7, 5, 1, 1>().sort() == IntList<1, 1, 4, 5, 7, 7>(), "");
Ab Wilson

The classical approach for lengthy calculations that can be precomputed is to calculate the result as part of the build process, generating a .cpp that hardcodes the result (on platforms that have embedded resources, those can be used as well).
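A rough sketch of such a build-step generator (names like computeSorted, generateSource, and g_sorted are illustrative, not from the question):

```cpp
// Build-step generator sketch: compute and sort the data once, then emit
// a C++ source file that hardcodes it, to be compiled into the main program.
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

std::vector<int> computeSorted(int sizev, int a, int b, int c)
{
    std::vector<int> v;
    v.reserve(std::size_t(sizev) * sizev * sizev);
    for (int i = 0; i < sizev; ++i)
        for (int j = 0; j < sizev; ++j)
            for (int k = 0; k < sizev; ++k)
                v.push_back(a*i + b*j + c*k);
    std::sort(v.begin(), v.end());
    return v;
}

// Run this as a build step and write the result to e.g. a "sorted_data.cpp"
// that the main project links against.
std::string generateSource(int sizev, int a, int b, int c)
{
    std::string out = "const int g_sorted[] = {";
    for (int x : computeSorted(sizev, a, b, c))
        out += std::to_string(x) + ",";
    out += "};\n";
    return out;
}
```
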

However, here the calculation is extremely simple; the slow part is probably just the allocation, which, if you want to keep the data in an std::vector, has to happen at runtime. If you can live with a C-style array you could put it all in the executable as described above, but that would produce an executable 4 MB larger, and the slowdown caused by loading it from disk would offset any speed benefit of the precalculation.

IOW: precalculating at build time makes sense when the computation is expensive and the output is small. Your case is at the exact opposite end of the spectrum, so I would avoid it.

Matteo Italia
  • That's interesting. However, I think the solution would be a lot faster even with the slowdown from disk. For example with #define SIZEV 750 it already takes 3 seconds on my computer to fill the vector. – BlackPopa Sep 18 '15 at 20:48
  • @BlackPopa: even worse, that would make a 1.6 GB executable! – Matteo Italia Sep 18 '15 at 20:53
  • @BlackPopa is the calculation you have in your question the actual calculation you are doing? Probably you will find it goes a lot faster if you `reserve()` the amount of space you need (`SIZEV*SIZEV*SIZEV`) up front. – mattnewport Sep 18 '15 at 20:53
  • @BlackPopa: in general, keep in mind that if your code has good asymptotic complexity (O(n), O(n log n)) and is memory-bound (as opposed to CPU-bound) usually reading from disk cannot beat computation from scratch, as it's O(n) with a way bigger constant (the transfer rate of RAM is about two orders of magnitude better than disk). – Matteo Italia Sep 18 '15 at 21:03
  • @BlackPopa I don't know what you did but `reserve()` isn't an alternative to `push_back()` - you want to call `m_vector.reserve(SIZEV*SIZEV*SIZEV)` before your loop where you `push_back()`. Nothing else changes, you still `push_back()`. That will be faster, if it is not you are measuring wrong. – mattnewport Sep 18 '15 at 21:24
  • @mattnewport : sorry, you are right, the push_back is needed and indeed it's a lot faster ! – BlackPopa Sep 18 '15 at 21:31
  • @BlackPopa don't do that, that's undefined behaviour. `vector::reserve()` just pre-allocates enough space for the number of elements requested, it doesn't change the size of the vector. If you use `operator[]` to access those elements your vector's `size()` will still be 0. – mattnewport Sep 18 '15 at 21:31
  • @mattnewport : yes :) that's a very useful tool, didn't know it would speed up that much – BlackPopa Sep 18 '15 at 21:33
  • In all my tests with `std::vector` of scalar types it's faster to do `resize` instead of `reserve`, and then use `operator[]` to store the elements, as the zero-initialization for the elements is very cheap and you avoid all the size/capacity checks done by the `push_back`. – Matteo Italia Sep 18 '15 at 22:47
  • @MatteoItalia: Did you test vectors too large to fit in the CPU cache? Zeroing some memory and then writing it again is fast if it's still in cache. Also, a single-threaded microbenchmark won't show much penalty for using main-memory bandwidth, since you have to read or write about 8B / clock to saturate main memory with a single core (e.g. 3.3GHz CPU, 26GB/s bandwidth). If one thread of your code isn't the only thing running, memory bandwidth is a more precious resource. – Peter Cordes Sep 19 '15 at 03:19
  • What might be good is doing `resize` in chunks of 8kiB or 16kiB (under half of L1 D cache size), if you can't get `push_back` in a loop to compile to code with the checks and increments hoisted out. – Peter Cordes Sep 19 '15 at 03:21
  • @mattnewport and MatteoItalia: http://en.cppreference.com/w/cpp/container/vector/resize says you can avoid the zeroing by using a custom `Allocator::construct`, with a link to http://stackoverflow.com/a/21028912/273767. Also, I updated my answer with a CountingSort suggestion that should speed up the sort-at-runtime startup a lot for this case. – Peter Cordes Sep 20 '15 at 06:27
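For reference, the two allocation strategies debated in this thread, sketched side by side (function names are mine; `sizev` stands in for the question's SIZEV macro):

```cpp
#include <cstddef>
#include <vector>

// Strategy 1 (mattnewport): one reserve() up front, then push_back.
// No reallocation inside the loops, but push_back still does size checks.
std::vector<int> fill_reserve(int sizev, int a, int b, int c)
{
    std::vector<int> v;
    v.reserve(std::size_t(sizev) * sizev * sizev);  // single allocation
    for (int i = 0; i < sizev; ++i)
        for (int j = 0; j < sizev; ++j)
            for (int k = 0; k < sizev; ++k)
                v.push_back(a*i + b*j + c*k);
    return v;
}

// Strategy 2 (Matteo Italia): resize() (value-initializes once), then write
// through operator[], skipping push_back's size/capacity bookkeeping.
std::vector<int> fill_resize(int sizev, int a, int b, int c)
{
    std::vector<int> v(std::size_t(sizev) * sizev * sizev);
    std::size_t n = 0;
    for (int i = 0; i < sizev; ++i)
        for (int j = 0; j < sizev; ++j)
            for (int k = 0; k < sizev; ++k)
                v[n++] = a*i + b*j + c*k;
    return v;
}
```

Note that plain `reserve()` followed by `operator[]` writes is the undefined behaviour mattnewport warns about; both sketches above avoid it.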

The data is integers from 0 to (SIZEV-1) * (a+b+c), but the number of integers is SIZEV^3. It's a dense group of integers with a small range, so CountingSort is perfect (and you never need to build the unsorted array, just increment counts while generating).

Regardless of keeping around the counts / prefix sums, CountingSort is absolutely going to be a big win in startup time to sort the vector, vs. other sorts, keeping everything else the same.

You can keep a compact form (O(cuberoot(n)) size) of your data as a vector of prefix sums, for lookups from m_vector in O(log (cuberoot(n))) time (binary search the prefix sums), where n is the length of m_vector. See below.

Depending on cache / memory latency, not actually expanding m_vector might or might not be a performance win. If a range of values is needed, you can very quickly generate sequential elements of m_vector on the fly, from the prefix sums.

class A {
    std::vector<uint32_t> m_counts; // could be uint16_t, but needs 32b for SIZEV>=794 (found experimentally)
    std::vector<uint32_t> m_pos;    // values are huge: indices into m_vector, up to SIZEV**3 - 1
    std::vector<uint16_t> m_vector; // can be 16b until SIZEV>3121: max val is only (a+b+c) * (SIZEV-1)
};
void A::fillVector()
{
    const int a=5;
    const int b=7;
    const int c=9;

    const auto max_val = (SIZEV-1) * (a+b+c);

    m_vector.reserve(SIZEV*SIZEV*SIZEV);
    m_vector.resize(0);
    // or clear it, but that writes tons of mem, unless you use a custom Allocator::construct to leave it uninit
    // http://en.cppreference.com/w/cpp/container/vector/resize

    m_pos.resize(max_val + 1);  // again, ideally avoid zeroing
                  // but if not, do it before m_counts

    m_counts.clear();  // do this one last, so it's hot in cache even if others wasted time writing zeros.
    m_counts.resize(max_val + 1); // vector is now zeroed
    // Optimization: don't have a separate m_counts.
    // zero and count into m_pos, then do prefix summing in-place


    // manually strength-reduce the multiplication to addition
    // in case the compiler decides it won't, or can't prove it won't overflow the same way
    // Not necessary with gcc or clang: they both do this already
    for(int kc=c*(SIZEV-1) ; kc >= 0 ; kc-=c) {
      for(int jb=b*(SIZEV-1) ; jb >= 0 ; jb-=b) {
        for(int ia=a*(SIZEV-1) ; ia >= 0 ; ia-=a) {
          m_counts[kc + jb + ia]++;
          // do the smallest stride in the inner-most loop, for better cache locality
        }
      }
    }
// write the early elements last, so they'll be hot in the cache when we're done


    int val = 0;
    uint32_t sum = 0;
    for ( auto &count : m_counts ) {
       m_pos[val] = sum;   // build our vector of prefix sums: start of the run of 'val'
       m_vector.insert(m_vector.end(), count, val);
       // count is allowed to be zero for vector::insert(pos, count, value)
       sum += count;
       ++val;

       //count = sum;  // in-place conversion to prefix sums
    }
    assert(m_vector.size() == SIZEV*SIZEV*SIZEV);
}

Or, instead of actually expanding a 1.6 GB array, make prefix sums of the counts, giving you a vector of the start position of the run of that index as an element in m_vector. i.e. idx = m_pos[val]; m_vector[idx] == val. (This breaks down for val <= 13, where there are values that can't be represented as a sum of a, b, and c, so there are zeros in m_counts, and repeats in m_pos.)

Anyway, you can replace a read of m_vector[i] with a binary-search for i in m_pos. You're looking for the highest index in m_pos that has value <= i. That index is what you'd find at m_vector[i]. (Or something like that; I may have an off-by-one error.)

A hash table won't work, because you need to map multiple values of i to each number from 0..(750*(a+b+c)). (All the i's where m_vector[i] has the same value.)

If you need a run of sequential elements, generate them on the fly into a tmp buffer. Look at m_pos[i+1] to see when the next element with a different value is coming. (Looking at m_counts might save some subtraction, but you're probably better off just taking differences in m_pos to invert the prefix sums, to avoid cache misses / cache pollution from touching a 2nd array.)

Actually, m_counts probably doesn't need to be kept around as a class member at all, just a temporary in FillVector. Or FillVector can count into m_pos, and convert it in-place into prefix sums.
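The binary-search lookup could be sketched like this (my naming; it assumes m_pos[val] holds the start index of val's run, as described above):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Recover the value at logical index i of the (never materialized) sorted
// vector, given pos[val] = start index of the run of 'val'.
// Zero-count values repeat the next run's start; std::upper_bound skips
// past those duplicates, so they are never returned.
int valueAt(const std::vector<uint32_t>& pos, uint32_t i)
{
    auto it = std::upper_bound(pos.begin(), pos.end(), i);
    return int(it - pos.begin()) - 1;  // last val whose run starts at or before i
}
```
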

Ideally there's something clever you can do with templates to choose types that are wide enough, but no wider than needed, for m_counts and m_vector. IDK number theory, so I don't know how to prove that there won't be one bucket of m_counts that overflows a uint16_t. The average count will be 750**3 / (750*(5+7+9)) = 26786, and they're certainly clustered towards the high end of m_counts. In practice, SIZEV=793 can use uint16_t counters, while SIZEV=794 produces several counts > 65536 (thanks to Chris for the working example where I could easily test this).

m_vector can be uint16_t until (SIZEV-1)*(a+b+c) > MAX_UINT16 (65535), i.e. until SIZEV >= 3122, at which point m_vector holds about 28.3 Gi elements (roughly 56.7 GiB of RAM at two bytes each).


At SIZEV = 750, m_pos is about 2x L1 cache size (Intel CPU) (750*(5+7+9) * 4B per entry = 63000B). If the compiler does a good job and makes a binary search with conditional-move instead of unpredictable branch instructions, this could be quite fast. It will certainly save you a lot of main-memory traffic, which is valuable if you have multiple threads.

Alternatively, never touching m_vector means you can handle problem sizes which would require more memory than you have to store the list.

If you want to get really creative with optimizing for cache when creating m_counts in the first place (with the triple-nested loop), have the innermost loop go forwards and then back, instead of the same direction both times. This will only matter for extremely large SIZEV, or if the other hyperthread is putting a lot of pressure on the cache.

  for(int kc=c*(SIZEV-1) ; kc >= 0 ; kc-=c) {
    for(int jb=b*(SIZEV-1) ; ; ) {

      for(int ia=0 ; ia<SIZEV*a ; ia+=a)
        counts[kc + jb + ia]++;
      if ((jb-=b) < 0) break;
      for(int ia=a*(SIZEV-1) ; ia >= 0 ; ia-=a)
        counts[kc + jb + ia]++;
      if ((jb-=b) < 0) break;

    }
  }

Counting down towards zero (with or without the bidirectional inner loops) is very likely a small win for the beginning of the next loop, before it becomes memory-bound doing big memsets when counts get high. Also a win for scanning through forwards to do prefix sums in place.


my previous answer, which is probably a dead end:

Is there any hope of finding a closed-form formula for the ith element in the sorted vector? Or even an O(log i) algorithm for generating it on the fly?

Unless you need a lot of sequential elements from this vector when you access it, it might be faster to compute it on the fly. Memory is slow, CPU is fast, so if you can compute a[i] in under ~150 clock cycles, you come out ahead. (Assuming every access is a cache miss, or that not touching all that vector memory reduces cache misses in the rest of your program).

If we can do this, we could in theory write the sorted array in order in the first place.

To do that: shuffle the constants so a <= b <= c.

0, a, [a*2 .. a*int(b/a)], b, [b + a .. b + a*int((c-b)/a) mixed with b*2 .. b*int(c/b)], c, [some number of b*x + a*y], c+a, [more b*x + a*y], ...

Ok, so this is turning into a combinatorial mess, and this idea probably isn't viable. At least, not for the general case of any a, b, and c.

With a=5, b=7, c=9:

0, 5=a, 7=b, 9=c, 10=2a, 12=b+a, 14=2b, 14=c+a, 15=3a, 16=c+b, 18=2c

I'm too sleepy to see a pattern, but here's a longer list

# bash
limit=5; for ((i=0 ; i<limit ; i++)); do
             for ((j=0 ; j<limit ; j++)); do 
               for ((k=0 ; k<limit ; k++)); do 
                 printf "%2d: %d %d %d\n" $((5*i + 7*j + 9*k)) $i $j $k; 
           done; done; done | sort -n | cat -n
     1   0: 0 0 0
     2   5: 1 0 0
     3   7: 0 1 0
     4   9: 0 0 1
     5  10: 2 0 0
     6  12: 1 1 0
     7  14: 0 2 0
     8  14: 1 0 1
     9  15: 3 0 0
    10  16: 0 1 1
    11  17: 2 1 0
    12  18: 0 0 2
    13  19: 1 2 0
    14  19: 2 0 1
    15  20: 4 0 0
    16  21: 0 3 0
    17  21: 1 1 1
    18  22: 3 1 0
    19  23: 0 2 1
    20  23: 1 0 2
    21  24: 2 2 0
    22  24: 3 0 1
    23  25: 0 1 2
    24  26: 1 3 0
    25  26: 2 1 1
    26  27: 0 0 3
    27  27: 4 1 0
    28  28: 0 4 0
    29  28: 1 2 1
    30  28: 2 0 2
    31  29: 3 2 0
    32  29: 4 0 1
    33  30: 0 3 1
    34  30: 1 1 2
    35  31: 2 3 0
    36  31: 3 1 1
    37  32: 0 2 2
    38  32: 1 0 3
    39  33: 1 4 0
    40  33: 2 2 1
    41  33: 3 0 2
    42  34: 0 1 3
    43  34: 4 2 0
    44  35: 1 3 1
    45  35: 2 1 2
    46  36: 0 0 4
    47  36: 3 3 0
    48  36: 4 1 1
    49  37: 0 4 1
    50  37: 1 2 2
    51  37: 2 0 3
    52  38: 2 4 0
    53  38: 3 2 1
    54  38: 4 0 2
    55  39: 0 3 2
    56  39: 1 1 3
    57  40: 2 3 1
    58  40: 3 1 2
    59  41: 0 2 3
    60  41: 1 0 4
    61  41: 4 3 0
    62  42: 1 4 1
    63  42: 2 2 2
    64  42: 3 0 3
    65  43: 0 1 4
    66  43: 3 4 0
    67  43: 4 2 1
    68  44: 1 3 2
    69  44: 2 1 3
    70  45: 3 3 1
    71  45: 4 1 2
    72  46: 0 4 2
    73  46: 1 2 3
    74  46: 2 0 4
    75  47: 2 4 1
    76  47: 3 2 2
    77  47: 4 0 3
    78  48: 0 3 3
    79  48: 1 1 4
    80  48: 4 4 0
    81  49: 2 3 2
    82  49: 3 1 3
    83  50: 0 2 4
    84  50: 4 3 1
    85  51: 1 4 2
    86  51: 2 2 3
    87  51: 3 0 4
    88  52: 3 4 1
    89  52: 4 2 2
    90  53: 1 3 3
    91  53: 2 1 4
    92  54: 3 3 2
    93  54: 4 1 3
    94  55: 0 4 3
    95  55: 1 2 4
    96  56: 2 4 2
    97  56: 3 2 3
    98  56: 4 0 4
    99  57: 0 3 4
   100  57: 4 4 1
   101  58: 2 3 3
   102  58: 3 1 4
   103  59: 4 3 2
   104  60: 1 4 3
   105  60: 2 2 4
   106  61: 3 4 2
   107  61: 4 2 3
   108  62: 1 3 4
   109  63: 3 3 3
   110  63: 4 1 4
   111  64: 0 4 4
   112  65: 2 4 3
   113  65: 3 2 4
   114  66: 4 4 2
   115  67: 2 3 4
   116  68: 4 3 3
   117  69: 1 4 4
   118  70: 3 4 3
   119  70: 4 2 4
   120  72: 3 3 4
   121  74: 2 4 4
   122  75: 4 4 3
   123  77: 4 3 4
   124  79: 3 4 4
   125  84: 4 4 4
Peter Cordes
  • I didn't get notified of your new answer, I will read it carefuly, the approach looks very interesting. :) – BlackPopa Sep 20 '15 at 09:46
  • @BlackPopa I was preparing an answer using Counting Sort but Peter beat me to it. I can confirm it is much quicker. In case you are interested [here is my benchmark](http://melpon.org/wandbox/permlink/MZipISIGAKzxxk2m). – Chris Drew Sep 20 '15 at 18:28
  • @Chris Drew : I had a look. thank you very much, the difference is huge indeed ! – BlackPopa Sep 20 '15 at 18:35
  • @ChrisDrew: Thanks for the code link. It was handy to have working code to check when uint16_t overflowed. I linked to a modified version of that code in my answer. We both had off-by-one errors in our counting of max-value and sizes of arrays needed. I think I have it correct now. I noticed you didn't invert the loops, doing the smaller stride in the inner loop for better cache locality. Also, I had another idea since my original answer: do triple-loops counting down to zero, so the front of `counts` is hot in the cache when you go to compute prefix sums (or expand it into `m_vector`.) – Peter Cordes Sep 20 '15 at 21:48

While this can be done (live example) you should not do it. It will take way too much compile time.

The compiler is not designed for fast, efficient numeric mass processing. For now, restrict your compile-time work to relatively simple things, not sorting 10 million elements.

Even if you write "compliant" code, most compilers today will explode on you. The code I wrote dies pretty early, even though I attempted to be careful with my recursion depth limit.

Anyhow, for posterity:

template<class T>struct tag{using type=T;};
template<class Tag>using type_t=typename Tag::type;

template<int...Xs> struct values { constexpr values() {}; };

template<int...Xs> constexpr values<Xs...> values_v = {};

template<class...Vs> struct append;
template<class...Vs> using append_t=type_t<append<Vs...>>;
template<class...Vs> constexpr append_t<Vs...> append_v = {};

template<> struct append<>:tag<values<>>{};
template<int...Xs>struct append<values<Xs...>>:tag<values<Xs...>>{};
template<int...Lhs, int...Rhs, class...Vs>
struct append<values<Lhs...>,values<Rhs...>,Vs...>:
    tag<append_t<values<Lhs...,Rhs...>,Vs...>>
{};

template<int...Lhs>
constexpr values<Lhs...> simple_merge( values<Lhs...>, values<> ) { return {}; }
template<int...Rhs>
constexpr values<Rhs...> simple_merge( values<>, values<Rhs...> ) { return {}; }
constexpr values<> simple_merge( values<>, values<> ) { return {}; }

template<int L0, int...Lhs, int R0, int...Rhs>
constexpr auto simple_merge( values<L0, Lhs...>, values<R0, Rhs...> )
-> std::conditional_t<
    (R0 < L0),
    append_t< values<R0>, decltype( simple_merge( values<L0,Lhs...>{}, values<Rhs...>{} ) ) >,
    append_t< values<L0>, decltype( simple_merge( values<Lhs...>{}, values<R0, Rhs...>{} ) ) >
> {
    return {};
}

template<class Lhs, class Rhs>
using simple_merge_t = decltype( simple_merge( Lhs{}, Rhs{} ) );
template<class Lhs, class Rhs>
constexpr simple_merge_t<Lhs, Rhs> simple_merge_v = {};

template<class Values, size_t I> struct split
{
private:
    using one = split<Values, I/2>;
    using two = split<typename one::rhs, I-I/2>;
public:
    using lhs = append_t< typename one::lhs, typename two::lhs >;
    using rhs = typename two::rhs;
};
template<class Values, size_t I> using split_t=type_t<split<Values, I>>;

template<class Values> struct split<Values, 0>{
    using lhs = values<>;
    using rhs = Values;
};
template<int X0, int...Xs> struct split<values<X0, Xs...>, 1> {
    using lhs = values<X0>;
    using rhs = values<Xs...>;
};
template<class Values, size_t I> using before_t = typename split<Values, I>::lhs;
template<class Values, size_t I> using after_t = typename split<Values, I>::rhs;

template<size_t I>using index_t=std::integral_constant<size_t, I>;
template<int I>using int_t=std::integral_constant<int, I>;
template<int I>constexpr int_t<I> int_v={};

template<class Values> struct front;
template<int X0, int...Xs> struct front<values<X0, Xs...>>:tag<int_t<X0>>{};
template<class Values> using front_t=type_t<front<Values>>;
template<class Values> constexpr front_t<Values> front_v = {};

template<class Values, size_t I>
struct get:tag<front_t< after_t<Values, I> >> {};
template<class Values, size_t I> using get_t = type_t<get<Values, I>>;
template<class Values, size_t I> constexpr get_t<Values, I> get_v = {};

template<class Values>
struct length;
template<int...Xs>
struct length<values<Xs...>>:tag<index_t<sizeof...(Xs)>> {};
template<class Values> using length_t = type_t<length<Values>>;
template<class Values> constexpr length_t<Values> length_v = {};

template<class Values> using front_half_t = before_t< Values, length_v<Values>/2 >;
template<class Values> constexpr front_half_t<Values> front_half_v = {};
template<class Values> using back_half_t = after_t< Values, length_v<Values>/2 >;
template<class Values> constexpr back_half_t<Values> back_half_v = {};

template<class Lhs, class Rhs>
struct least : tag< std::conditional_t< (Lhs{}<Rhs{}), Lhs, Rhs > > {};
template<class Lhs, class Rhs> using least_t = type_t<least<Lhs, Rhs>>;
template<class Lhs, class Rhs>
struct most : tag< std::conditional_t< (Lhs{}>Rhs{}), Lhs, Rhs > > {};
template<class Lhs, class Rhs> using most_t = type_t<most<Lhs, Rhs>>;

template<class Values>
struct pivot {
private:
    using a = get_t<Values, 0>;
    using b = get_t<Values, length_v<Values>/2>;
    using c = get_t<Values, length_v<Values>-1>;
    using d = most_t< least_t<a,b>, most_t< least_t<b,c>, least_t<a,c> > >;
public:
    using type = d;
};
template<int X0, int X1>
struct pivot<values<X0, X1>>: tag< most_t< int_t<X0>, int_t<X1> > > {};

template<class Values> using pivot_t = type_t<pivot<Values>>;
template<class Values> constexpr pivot_t<Values> pivot_v = {};

template<int P>
constexpr values<> lower_split( int_t<P>, values<> ) { return {}; }
template<int P, int X0>
constexpr std::conditional_t< (X0<P), values<X0>, values<> > lower_split( int_t<P>, values<X0> ) { return {}; }

template<int P, int X0, int X1, int...Xs >
constexpr auto lower_split( int_t<P>, values<X0, X1, Xs...> )
-> append_t<
    decltype(lower_split( int_v<P>, front_half_v<values<X0, X1, Xs...>> )),
    decltype(lower_split( int_v<P>, back_half_v<values<X0, X1, Xs...>> ))
>{ return {}; }
template<int P>
constexpr values<> upper_split( int_t<P>, values<> ) { return {}; }
template<int P, int X0>
constexpr std::conditional_t< (X0>P), values<X0>, values<> > upper_split( int_t<P>, values<X0> ) { return {}; }
template<int P, int X0, int X1, int...Xs>
constexpr auto upper_split( int_t<P>, values<X0, X1, Xs...> )
-> append_t<
    decltype(upper_split( int_v<P>, front_half_v<values<X0, X1, Xs...>> )),
    decltype(upper_split( int_v<P>, back_half_v<values<X0, X1, Xs...>> ))
>{ return {}; }

template<int P>
constexpr values<> middle_split( int_t<P>, values<> ) { return {}; }
template<int P, int X0>
constexpr std::conditional_t< (X0==P), values<X0>, values<> > middle_split( int_t<P>, values<X0> ) { return {}; }
template<int P, int X0, int X1, int...Xs>
constexpr auto middle_split( int_t<P>, values<X0, X1, Xs...> )
-> append_t<
    decltype(middle_split( int_v<P>, front_half_v<values<X0, X1, Xs...>> )),
    decltype(middle_split( int_v<P>, back_half_v<values<X0, X1, Xs...>> ))
>{ return {}; }

template<class Values>
using lower_split_t = decltype(lower_split( pivot_v<Values>, Values{} ) );
template<class Values> constexpr lower_split_t<Values> lower_split_v = {};
template<class Values>
using upper_split_t = decltype(upper_split( pivot_v<Values>, Values{} ) );
template<class Values> constexpr upper_split_t<Values> upper_split_v = {};
template<class Values>
using middle_split_t = decltype(middle_split( pivot_v<Values>, Values{} ) );
template<class Values> constexpr middle_split_t<Values> middle_split_v = {};

constexpr values<> simple_merge_sort( values<> ) { return {}; }
template<int X>
constexpr values<X> simple_merge_sort( values<X> ) { return {}; }

template<class Values>
using simple_merge_sort_t = decltype( simple_merge_sort( Values{} ) );
template<class Values>
constexpr simple_merge_sort_t<Values> simple_merge_sort_v = {};

template<int X0, int X1, int...Xs>
constexpr auto simple_merge_sort( values<X0, X1, Xs...> )
-> 
simple_merge_t<
    simple_merge_t<
        simple_merge_sort_t<lower_split_t<values<X0, X1, Xs...>>>, simple_merge_sort_t<upper_split_t<values<X0, X1, Xs...>>>
    >,
    middle_split_t<values<X0, X1, Xs...>>
>
{ return {}; }


template<class Values>constexpr Values cross_add( Values ) { return {}; }
template<class Values>constexpr values<> cross_add( values<>, Values ) { return {}; }
template<int A0, int...B>constexpr values<(B+A0)...> cross_add( values<A0>, values<B...> ) { return {}; }

template<int A0, int A1, int...A, int...B>
constexpr auto cross_add( values<A0, A1, A...>, values<B...>)
-> append_t<
    decltype(cross_add( front_half_v<values<A0, A1, A...>>, values_v<B...> ) ),
    decltype(cross_add( back_half_v<values<A0, A1, A...>>, values_v<B...> ) )
> { return {}; }

template<class V0, class V1, class V2, class... Vs>
constexpr auto cross_add( V0, V1, V2, Vs... )
-> decltype(
    cross_add( cross_add( V0{}, V1{} ), V2{}, Vs{}... )
) { return {}; }

template<class...Vs>
using cross_add_t = decltype( cross_add(Vs{}...) );
template<class...Vs>
constexpr cross_add_t<Vs...> cross_add_v = {};

template<int X, int...Xs>
constexpr values<(X*Xs)...> scale( int_t<X>, values<Xs...> ) { return {}; }
template<class X, class Xs>
using scale_t = decltype( scale(X{}, Xs{}) );
template<class X, class Xs> constexpr scale_t<X,Xs> scale_v = {};

template<int X0, int...Xs> struct generate_values : generate_values<X0-1, X0-1, Xs...> {};
template<int...Xs> struct generate_values<0,Xs...>:tag<values<Xs...>>{};
template<int X0> using generate_values_t = type_t<generate_values<X0>>;

The above is a median-of-three compile-time merge sort and cross product generator. I could, with effort, probably massively reduce the line count.

Probably doing this with a constexpr std::array would be faster than the above pure-type solution.
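As a sketch of that `constexpr std::array` alternative (not part of the answer above): in C++17 you can build and sort the array inside a `constexpr` function, using a hand-rolled sort because `std::sort` is not `constexpr` until C++20. `SIZE` is kept tiny here, since the question's `SIZEV = 100` would mean a million elements, far beyond what compilers will evaluate at compile time.

```cpp
#include <array>
#include <cstddef>

// Tiny stand-in for the question's SIZEV = 100; compile-time evaluation
// of a million elements is not realistic.
constexpr int SIZE = 4;

constexpr auto make_sorted_array() {
    std::array<int, SIZE * SIZE * SIZE> a{};
    constexpr int ca = 5, cb = 7, cc = 9;  // the question's a, b, c
    std::size_t n = 0;
    for (int i = 0; i < SIZE; ++i)
        for (int j = 0; j < SIZE; ++j)
            for (int k = 0; k < SIZE; ++k)
                a[n++] = ca * i + cb * j + cc * k;
    // Insertion sort: std::sort is only constexpr from C++20 on.
    for (std::size_t x = 1; x < a.size(); ++x)
        for (std::size_t y = x; y > 0 && a[y - 1] > a[y]; --y) {
            int tmp = a[y]; a[y] = a[y - 1]; a[y - 1] = tmp;
        }
    return a;
}

constexpr auto sorted = make_sorted_array();
static_assert(sorted.front() == 0 && sorted.back() == 5*3 + 7*3 + 9*3, "");
```

The whole table then lives in the binary's read-only data, with no work done at run time.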

Yakk - Adam Nevraumont
  • I don't have much experience with template metaprogramming. Would templates do better with a https://en.wikipedia.org/wiki/Counting_sort? See my answer for a run-time CountingSort (which should be *way* faster than std::sort for dense highly-duplicated data like this). – Peter Cordes Sep 20 '15 at 06:23

Not exactly what you are looking for perhaps, but you could write a separate program to calculate the vector, sort it, and then write it out to a file. Then you could just read in that file.

If reading from disk is too slow, you could also massage the output into a legal C++ source file that initializes an instance of your class with the hardcoded, presorted values. That file could then be linked back into your main project and compiled, giving essentially the same functionality as the much more complicated metaprogramming task you're outlining here.

AGML
    Even if you embed the precalculated array in the executable, it still has to be read from disk, so the performance considerations are unaffected by this choice. – Matteo Italia Sep 18 '15 at 23:23
  • That's true, I suppose. Would this be any different from embedding the array in the executable via template metaprogramming though? – AGML Sep 20 '15 at 00:00
  • 1
    @AGML: nope, not at all. Your way also has the huge benefit of only needing to rebuild the massive object file when the needed data changes, not for every change in the class that makes the compiler sort 1.6GB of data at compile time. However, the general consensus (which I agree with) is that having the app do this at runtime will be better for the app's performance, even leaving aside the issue of distributing a 1.6GB executable. CountingSort is fast. – Peter Cordes Sep 20 '15 at 04:32

You can implement a skew heap to sort ints at compile time. The following example works with C++17.

#include <type_traits>
#include <utility>

template <class T, T... s>
using iseq = std::integer_sequence<T, s...>;

template <class T, T v>
using ic = std::integral_constant<T, v>;

template <class T, T v1, T v2>
constexpr auto ic_less_impl(ic<T, v1>, ic<T, v2>) -> ic<bool, v1 < v2>;
template <class ic1, class ic2>
using ic_less = decltype(ic_less_impl(ic1(), ic2()));

template <bool b>
using bool_cond_t = std::conditional_t<b, std::true_type, std::false_type>;

struct nil {};

template <class T, T v, T... s>
constexpr auto iseq_front_impl(iseq<T, v, s...>) -> ic<T, v>;
template <class T>
constexpr auto iseq_front_impl(iseq<T>) -> nil;
template <class seq>
using iseq_front = decltype(iseq_front_impl(seq()));

template <class T, T v, T... s>
constexpr auto iseq_pop_front_impl(iseq<T, v, s...>) -> iseq<T, s...>;
template <class seq>
using iseq_pop_front = decltype(iseq_pop_front_impl(seq()));

template <class T, T v, T... s>
constexpr auto iseq_append_impl(iseq<T, s...>, ic<T, v>) -> iseq<T, s..., v>;
template <class T, T v>
constexpr auto iseq_append_impl(nil, ic<T, v>) -> iseq<T, v>;
template <class seq, class c>
using iseq_append = decltype(iseq_append_impl(seq(), c()));

template <class seq>
using iseq_is_empty = bool_cond_t<std::is_same<iseq_front<seq>, nil>::value>;

template <class X, class L, class R>
struct skew_heap {};

template <class X, class L, class R>
constexpr auto skh_get_top_impl(skew_heap<X, L, R>) -> X;
template <class H>
using skh_get_top = decltype(skh_get_top_impl(H()));

template <class X, class L, class R>
constexpr auto skh_get_left_impl(skew_heap<X, L, R>) -> L;
template <class H>
using skh_get_left = decltype(skh_get_left_impl(H()));

template <class X, class L, class R>
constexpr auto skh_get_right_impl(skew_heap<X, L, R>) -> R;
template <class H>
using skh_get_right = decltype(skh_get_right_impl(H()));

template <class H>
using skh_is_empty = bool_cond_t<std::is_same<H, nil>::value>;

template <class H1, class H2>
constexpr auto skh_merge_impl(H1, H2) -> decltype(auto) {
    if constexpr (skh_is_empty<H1>::value) {
        return H2{};
    } else if constexpr (skh_is_empty<H2>::value) {
        return H1{};
    } else {
        using x1 = skh_get_top<H1>;
        using l1 = skh_get_left<H1>;
        using r1 = skh_get_right<H1>;

        using x2 = skh_get_top<H2>;
        using l2 = skh_get_left<H2>;
        using r2 = skh_get_right<H2>;

        if constexpr (ic_less<x2, x1>::value) {
            using new_r2 = decltype(skh_merge_impl(H1(), r2()));
            return skew_heap<x2, new_r2, l2> {};
        } else {
            using new_r1 = decltype(skh_merge_impl(r1(), H2()));
            return skew_heap<x1, new_r1, l1>{};
        }
    }
}
template <class H1, class H2>
using skh_merge = decltype(skh_merge_impl(H1(), H2()));

template <class H1, class IC1>
using skh_push = skh_merge<H1, skew_heap<IC1, nil, nil>>;

template <class H>
using skh_pop = skh_merge<skh_get_left<H>, skh_get_right<H>>;

template <class H, class seq>
constexpr auto skh_heapify_impl(H, seq) -> decltype(auto) {
    if constexpr (iseq_is_empty<seq>::value) {
        return H{};
    } else {
        using val = iseq_front<seq>;
        return skh_heapify_impl(skh_push<H, val>{}, iseq_pop_front<seq>{});
    }
}
template <class seq>
using skh_heapify = decltype(skh_heapify_impl(nil(), seq()));

template <class H, class seq>
constexpr auto skh_to_sortseq_impl(H, seq) -> decltype(auto) {
    if constexpr (skh_is_empty<H>::value) {
        return seq{};
    } else {
        using val = skh_get_top<H>;
        return skh_to_sortseq_impl(skh_pop<H>{}, iseq_append<seq, val>{});
    }
}
template <class H>
using skh_to_sortseq = decltype(skh_to_sortseq_impl(H(), nil()));

template <class seq>
using sort_seq = skh_to_sortseq<skh_heapify<seq>>;

static_assert(std::is_same<iseq<int, 1, 2, 3, 4, 5, 6, 7, 8, 9>, sort_seq<iseq<int, 2, 3, 5, 8, 9, 6, 7, 1, 4>>>::value);
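To actually use the sorted sequence at run time, a small helper can copy any `std::integer_sequence` into a `std::array` (a sketch; `to_array` is an illustrative name, not part of the answer):

```cpp
#include <array>
#include <utility>

// Expand an integer_sequence's values into a std::array.
template <class T, T... s>
constexpr std::array<T, sizeof...(s)> to_array(std::integer_sequence<T, s...>) {
    return {{s...}};
}

// With the answer's definitions in scope this would be:
//   constexpr auto sorted = to_array(sort_seq<iseq<int, 2, 3, 5, 8, 9, 6, 7, 1, 4>>{});
constexpr auto demo = to_array(std::integer_sequence<int, 1, 2, 3>{});
static_assert(demo.size() == 3 && demo[2] == 3, "");
```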
firejox