24

I would like to populate an array of enum using constexpr. The content of the array follows a certain pattern.

I have an enum separating ASCII character set into four categories.

enum Type {
    Alphabet,
    Number,
    Symbol,
    Other,
};

constexpr Type table[128] = /* blah blah */;

I would like to have an array of 128 Type. They can be in a structure. The index of the array will be corresponding to the ASCII characters and the value will be the Type of each character.

So I can query this array to find out which category an ASCII character belongs to. Something like

char c = RandomFunction();
if (table[c] == Alphabet) 
    DoSomething();

I would like to know if this is possible without some lengthy macro hacks.

Currently, I initialize the table by doing the following.

constexpr bool IsAlphabet (char c) {
    return ((c >= 0x41 && c <= 0x5A) ||
            (c >= 0x61 && c <= 0x7A));
}

constexpr bool IsNumber (char c) { /* blah blah */ }

constexpr bool IsSymbol (char c) { /* blah blah */ }

constexpr Type whichCategory (char c) { /* blah blah */ }

constexpr Type table[128] = { INITIALIZE };

where INITIALIZE is the entry point of some very lengthy macro hacks. Something like

#define INITIALIZE INIT(0)
#define INIT(N) INIT_##N
#define INIT_0 whichCategory(0), INIT_1
#define INIT_1 whichCategory(1), INIT_2
//...
#define INIT_127 whichCategory(127)

I would like a way to populate this array or a structure containing the array without the need for this macro hack...

Maybe something like

struct Table {
    Type _[128];
};

constexpr Table table = MagicFunction();

So, the question is how to write this MagicFunction?

Note: I am aware of cctype and likes, this question is more of a Is this possible? rather than Is this the best way to do it?.

Any help would be appreciated.

Thanks,

Jimmy Lu
  • 4,810
  • 7
  • 25
  • 30
  • 2
    You do know that ASCII only ranges `[0 .. 127]`? And that `char`'s signedness is implementation defined? Your current approach is very dangerous. Oh, and last but not least, the C++ standard doesn't demand ASCII encoding at all. It might aswell be EBCDIC. – Xeo Nov 09 '12 at 19:36
  • The good news is that because arrays can be initialized with pack expansions, what you ask for is indeed feasible. You just need to invoke the function plenty of times :p – Matthieu M. Nov 09 '12 at 19:39
  • Most likely not possible because C++ doesn't require ASCII representation for characters. Also, in the strictest sense, ASCII character set comprises of only 128 characters. – Happy Green Kid Naps Nov 09 '12 at 19:40
  • @MatthieuM. can you elaborate a bit? :) thx – Jimmy Lu Nov 09 '12 at 19:41
  • I know this does not answer your question directly, but what's wrong with using functions from `cctype`? http://www.cplusplus.com/reference/clibrary/cctype/ –  Nov 09 '12 at 20:02
  • There's already something similar (though not entirely equivalent) in the Standard library: the [`ctype`](http://en.cppreference.com/w/cpp/locale/ctype_char) facet of a locale (and its `classic_table`). It's not guaranteed to be ASCII (as Xeo pointed out) and is not constexpr. – dyp Nov 09 '12 at 20:23
  • @Xeo Why is it significant _here_ that C++ does not require any specific encoding? A char must be at least 8 bits wide (§1.7/1) and therefore can contain ASCII. I see no char literals in the OP's question. (I'm not completely sure about the signed/unsigned comparison, see §5/10 bullet 5 subbullet 3/4.) – dyp Nov 09 '12 at 20:36
  • 1
    @DyP: `((c >= 0x41 && c <= 0x5A) || (c >= 0x61 && c <= 0x7A))` from `IsAlphabet` -- this assumes the decimal ordering that is present in ASCII. The signedness is important since OP passes literals `> 127`, which may map to negative `char`s. – Xeo Nov 09 '12 at 20:40
  • The content or the uses of the array here really does not matter much *here*. For as far as I care, it doesn't even need to be a map between categories and ASCII characters. I just would like to see an efficient implementation to populate an array, which can be used as some sort of state machine, at compile time. – Jimmy Lu Nov 09 '12 at 20:43
  • @Xeo If `c` has been initialized by `c = 0x41; // A` why shouldn't this work? I only see problems with `c = 'A';`. – dyp Nov 09 '12 at 20:43
  • @Xeo, my questions is more of a `is it possible` instead of `is this the best way to do it`. Thanks for your pointing out the range for ASCII though. I always group ASCII and extended ASCII together but I guess not. I was also not aware that C++ does not require ASCII encoding. – Jimmy Lu Nov 09 '12 at 20:46
  • 1
    @BeyondSora: I'm sorry, I did not have the time to properly address this question yesterday. Thankfully Xeo did (despite his grumpiness :p). The indices generation combined with pack expansion is a great trick to generate all kind of initialization lists (as you can see here), and thus I was pointing out at the fact that since arrays accept initialization lists, you were golden. – Matthieu M. Nov 10 '12 at 11:33
  • 1
    @DyP: With EBCDIC, `0x41` isn't mapped to any symbol *at all*, and letters have a strange place in the codepage. Just see [here](http://en.wikipedia.org/wiki/EBCDIC#Codepage_layout). – Xeo Nov 12 '12 at 20:01

4 Answers4

31

Ignoring ALL the issues, indices to the rescue:

template<unsigned... Is> struct seq{};
template<unsigned N, unsigned... Is>
struct gen_seq : gen_seq<N-1, N-1, Is...>{};
template<unsigned... Is>
struct gen_seq<0, Is...> : seq<Is...>{};

template<unsigned... Is>
constexpr Table MagicFunction(seq<Is...>){
  return {{ whichCategory(Is)... }};
}

constexpr Table MagicFunction(){
  return MagicFunction(gen_seq<128>{});
}

Live example.

dened
  • 4,253
  • 18
  • 34
Xeo
  • 129,499
  • 52
  • 291
  • 397
  • 2
    @Steven: Added a link to the Lounge wiki entry. It basically builds a list `[0 .. 127]` and expands that, calling `whichCategory(0), whichCategory(1), ..., whichCategory(127)` and passes that as initializer arguments for list-initialization to `Table._` (notice the double `{}` for initializing the inner array). – Xeo Nov 09 '12 at 21:02
  • 1
    I didn't know you can return something of this form `{ /*...*/ }`, is this something new in C++11 or always there in the standard? – Jimmy Lu Nov 09 '12 at 21:09
  • 2
    @BeyondSora: New to C++11, called *list-initialization* (also less formal and more commonly known as uniform initialization). – Xeo Nov 09 '12 at 21:18
  • 3
    Why do you use double curly brackets? – Yola Oct 10 '16 at 18:44
  • 2
    This answer is obsolete in C++14 because C++14 allows you to build the whole array inside a constexpr function. – Omnifarious Jun 29 '17 at 22:05
  • 1
    NB: `seq` and `gen_seq` are in the standard library since C++14, called `std::index_sequence` and `std::make_index_sequence`. – lisyarus Oct 16 '17 at 16:53
  • @lisyarus - Why would you use C++14 specific features to solve a problem in a particular way when there is a much more compile-time (as in time taken to compile) and easier to understand way to solve the same problem that was also introduced in C++14? – Omnifarious Apr 23 '19 at 15:39
  • @Omnifarious I'm afraid I don't quite get your question. I'm just saying that part of the proposed solution can be replaced by using analogues from the standard library, decreasing the amount of code and increasing readability. – lisyarus Apr 23 '19 at 15:43
  • @lisyarus - Except, if you had a version of C++ new enough to have the features you mention, you wouldn't make a solution that looked like that. Because that version of C++ has other features that make solving this problem easier. – Omnifarious Apr 24 '19 at 15:55
  • @Omnifarious Look, the question was asked in 2012, and was given a good 2012-ish answer. If you feel that the answer should be updated/removed/etc, contact the author of the answer. Again, the ***only*** thing ***I*** am saying is that C++14 has `std::make_index_sequence`. I am ***not*** saying that this answer is still the best solution, that your answer is bad, or anything like that. – lisyarus Apr 24 '19 at 16:00
  • @lisyarus - I think the answer is great. It answers the question for C++11 and it's a fine answer. Your update to the answer is pointless. This answer doesn't need a tweak to use C++14 stuff in C++14. It needs to be completely re-written for C++14 to do things in a different way. So, if someone is using C++14s `std::make_index_sequence` to initialize a `::std::array` at compile time, they're doing something wrong, C++14 has a much better way to initialize an `::std::array` at compile time. But the answer itself is great. Your suggestion for an improvement is not. – Omnifarious Apr 24 '19 at 20:19
  • 1
    @Omnifarious Thank you for sharing your opinion; you are absolutely free in your desire to consider my comment bad, useless, pointless, whatever. There is a special "flag" feature for comments: you may use that. However, from my perspective the comment is still a valuable one. If someone has already been using this solution, there may be no reason to rewrite it completely just for the sake of using new C++ features. On the other hand, simplifying it slightly and effortlessly may be of interest. The intent of my comment was just providing information. I now digress from further discussion. – lisyarus Apr 25 '19 at 08:18
  • @lisyarus - I only flag comments that are actively abusive of people. You said: _If someone has already been using this solution, there may be no reason to rewrite it completely just for the sake of using new C++ features._ and I think that's a very valid point. Thank you. – Omnifarious Apr 27 '19 at 03:55
16

In C++17 ::std::array has been updated to be more constexpr friendly and you can do the same as in C++14, but without some of the scary looking hacks to get around the lack of constexpr in crucial places. Here is what the code would look like there:

#include <array>

enum Type {
    Alphabet,
    Number,
    Symbol,
    Other,
};

constexpr ::std::array<Type, 128> MagicFunction()
{
   using result_t = ::std::array<Type, 128>;
   result_t result = {Other};
   result[65] = Alphabet;
   //....
   return result;
}

const ::std::array<Type, 128> table = MagicFunction();

Again MagicFunction still needs to obey the rather loose constexpr rules. Mainly, it may not modify any global variables or use new (which implies modifying global state, namely the heap) or other such things.

Omnifarious
  • 54,333
  • 19
  • 131
  • 194
  • I came here trying to fill a `std::array>` like this with `result[65] = std::make_pair( x,y )`. This won't work, but `result[65].first = x; result[65].second = y;` will. – mxmlnkn Oct 17 '20 at 22:47
5

IMHO the best way to do this is simply write a tiny setup program that will generate table for you. And then you can either throw out the setup program, or check it in alongside the generated source code.

The tricky part of this question is just a duplicate of this other one: Is it possible to create and initialize an array of values using template metaprogramming?

The trick is, it's impossible to write anything like

Type table[256] = some_expression();

at file scope, because global arrays can be initialized only with literal (source-level) initializer-lists. You can't initialize a global array with the result of a constexpr function, even if you could somehow get that function to return a std::initializer_list, which you can't because its constructor isn't declared constexpr.

So what you have to do is get the compiler to generate the array for you, by making it a static const data member of a template class. After one or two levels of metaprogramming that I'm too confused to write out, you'll bottom out in a line that looks something like

template <int... Indices>
Type DummyStruct<Indices...>::table[] = { whichCategory(Indices)... };

where Indices is a parameter-pack that looks like 0,1,2,... 254,255. You construct that parameter-pack using a recursive helper template, or maybe just using something out of Boost. And then you can write

constexpr Type (&table)[] = IndexHelperTemplate<256>::table;

...But why would you do all that, when the table is only 256 entries that will never change unless ASCII itself changes? The right way is the simplest way: precompute all 256 entries and write out the table explicitly, with no templates, constexpr, or any other magic.

Community
  • 1
  • 1
Quuxplusone
  • 23,928
  • 8
  • 94
  • 159
  • IMHO hardcode it is bad because its important to understand the logic on that values if that was complex (in a more generic case). If there is n pregenerated factorial values who ensure my that there is no copypaste mistake nor error on some value? – Isaac Pascual Mar 27 '19 at 17:49
  • If there are n lines of complicated metaprogramming to compute the values, who ensures that there is no copy-paste mistake or bug in that code? Our goal is always to *reduce the possibility of error.* For something like a three-line `factorial` function, maybe you'd minimize the possibility of error by writing code instead of data. For OP's actual problem — a 128-byte classification table — I still feel strongly that writing data instead of code is the best way to minimize the possibility of error. (But note that `constexpr` programming has gotten MUCH more powerful and natural since 2012.) – Quuxplusone Mar 27 '19 at 18:42
  • I prefer to see what is behind the data than hardcoded numbers. – Isaac Pascual Apr 01 '19 at 08:56
4

The way to do this in C++14 looks like this:

#include <array>

enum Type {
    Alphabet,
    Number,
    Symbol,
    Other,
};

constexpr ::std::array<Type, 128> MagicFunction()
{
   using result_t = ::std::array<Type, 128>;
   result_t result = {Other};
   const result_t &fake_const_result = result;
   const_cast<result_t::reference>(fake_const_result[65]) = Alphabet;
   //....
   return result;
}

const ::std::array<Type, 128> table = MagicFunction();

No clever template hackery required any longer. Though, because C++14 didn't really undergo a thorough enough review of what did and didn't have to be constexpr in the standard library, a horrible hack involving const_cast has to be used.

And, of course, MagicFunction had better not modify any global variables or otherwise violate the constexpr rules. But those rules are pretty liberal nowadays. You can, for example, modify all the local variables you want, though passing them by reference or taking their addresses may not work out so well.

See my other answer for C++17, which allows you to drop some of the ugly-looking hacks.

Omnifarious
  • 54,333
  • 19
  • 131
  • 194
  • 1
    You can even use a `for` loop inside `MagicFunction`. – aschepler Jun 30 '17 at 04:09
  • I get an error about `result` being declared uninitialized, though. `std::array result{};` does work - I guess `Type{}` is `Alphabet`. – aschepler Jun 30 '17 at 04:13
  • @aschepler oops, I'll fix that. – Omnifarious Jun 30 '17 at 05:04
  • What does the `::` at the beginning? – tuket Aug 19 '17 at 16:11
  • 1
    @user1754322 - See this question: https://stackoverflow.com/questions/1661912/why-does-everybody-use-unanchored-namespace-declarations-i-e-std-not-std – Omnifarious Aug 19 '17 at 20:08
  • 1
    Ummm... This does not compile with C++14. You need a `constexpr reference operator[]( size_type pos );` which only exists in C++17: http://en.cppreference.com/w/cpp/container/array/operator_at – Tim Rae Oct 16 '17 at 04:56
  • @TimRae - I thought I'd passed the C++14 flag to the compiler when I tested that. :-/ Perhaps the standard library implementation I was using did not properly remove functions when the version was set to C++14. Oh, well. I updated my answer to reflect the need for C++17. One could, of course, do something like `*(result.data() + index) = Other;`, but that would be horrible. – Omnifarious Oct 16 '17 at 16:26
  • @TimRae - Oh-ho! The `.data` function isn't constexpr either! _sigh_ Well, I'll update the answer with two versions, a C++14 and a C++17 version. – Omnifarious Oct 16 '17 at 16:36
  • C++17 is official now and supported by gcc 7 and clang 6, so I think you could make it more prominent in your answer with an actual code block. – Tim Rae Oct 16 '17 at 21:33
  • Or even better just make a second answer – Tim Rae Oct 16 '17 at 21:33
  • 1
    @TimRae - Done. :-) – Omnifarious Oct 16 '17 at 21:46
  • Why do you need all the `const` stuff? Are you not done after the line `result_t result = {Other};`? The lines after that also don't seem used, since what is returned is `return result;`, and not the `fake_const_result`? – Ela782 Jul 27 '18 at 22:09
  • @Ela782 - If you notice, I have a C++17 answer that does exactly that. The problem is that the C++14 standard neglected to mark the non-`const` version of `operator []` for `std::array` as `constexpr`. So, I can't use it inside of a `constexpr` function. – Omnifarious Jul 28 '18 at 12:36