
I have some code at the moment that looks like this:

#define ______ 0x0000
static const uint16_t plane0[256] = {
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, 0x0041, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047,
    0x0048, 0x0049, 0x004A, 0x004B, 0x004C, 0x004D, 0x004E, 0x004F,
    0x0050, 0x0051, 0x0052, 0x0053, 0x0054, 0x0055, 0x0056, 0x0057,
    0x0058, 0x0059, 0x005A, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, 0x039C, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ...
};
#undef ______

uint16_t caseup(uint16_t wc)
{
    return (plane0[wc] == 0x0000) ? wc : plane0[wc];
}

I would really like to replace that caseup function with a simple return plane0[wc]. The extra compare-and-branch might not be very expensive in the big picture, but certainly the code would be strictly more efficient if we got rid of it.

But I don't want to have to rewrite the table. Not even using a tool to rewrite it — I don't want our case-mapping table cluttered up with a lot of garbage hex values. I want the table to remain mostly pristinely macro-ized, with hex values only in the places that actually require non-identity case mappings.

What's the cleanest way to do this in C++11?

Quuxplusone
  • ________ (I'm not checking that) is a [reserved identifier](http://stackoverflow.com/questions/228783/what-are-the-rules-about-using-an-underscore-in-a-c-identifier). – chris Nov 05 '13 at 01:27
  • If you subtract their position from non-zero items, you could replace `caseup`'s code with `return wc + plane0[wc];` – Sergey Kalinichenko Nov 05 '13 at 01:38
  • Do you have `constexpr` support in your compiler? – Yakk - Adam Nevraumont Nov 05 '13 at 01:49
  • @dasblinkenlight That's not a bad idea, but it *does* mangle the readability of the nice Unicode values I currently have. @Yakk yes, but it seems to be a no-op with `-O0` except where absolutely required by the virtual machine (e.g., in array bounds or `static_assert`s). @chris true, but I don't care. – Quuxplusone Nov 05 '13 at 02:44
  • "certainly the code would be strictly more efficient" are you sure? Did you measure it? Lookups into large arrays aren't always the fastest. Or are you just interested in finding a "cool", obscure solution that avoids the obvious solutions? If so, please send your code to thedailywtf.com when you are done. – DanielKO Nov 05 '13 at 02:50

3 Answers


I've thought of this:

template<int N>
struct PlaneMapping {
    uint16_t i;
    uint16_t data[N];

    template<typename... Args>
    constexpr PlaneMapping(Args... a) : i(0), data { uint16_t(a ? (i++,a) : i++)... }
    {}
};

static const PlaneMapping<256> plane0(
    ______, ______, ______, ______, ______, ______, ______, ______,
    ______, ______, ______, ______, ______, ______, ______, ______,
    ...
);

uint16_t caseup(uint16_t wc)
{
    return plane0.data[wc];
}

This is reasonably clean, I guess, but all those i++s are ugly, and you have to pass -O1 or better before it'll compile all the way down to static data instead of running a ton of code from _main. Is there a cleaner solution?
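One possible direction, offered as a sketch rather than a definitive answer: instead of counting with i++, pair each table entry with its own index through a compile-time index pack. Everything named below (Indices, MakeIndices, Table, fillIdentity, sparse0) is hypothetical, and the 8-entry table holds made-up values only to keep the example short. Because the result is declared constexpr, its initializer has to be a constant expression, so the table should end up as static data even at -O0. One assumption to check for the real 256-entry table: MakeIndices recurses once per element, so it needs a template instantiation depth of at least 256.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using std::uint16_t;

// Hypothetical C++11 stand-in for C++14's std::index_sequence.
template<std::size_t... Is> struct Indices {};

// Builds Indices<0, 1, ..., N-1> by linear recursion.
template<std::size_t N, std::size_t... Is>
struct MakeIndices : MakeIndices<N - 1, N - 1, Is...> {};

template<std::size_t... Is>
struct MakeIndices<0, Is...> { typedef Indices<Is...> type; };

// Literal aggregate, so it can be returned from a constexpr function.
template<std::size_t N>
struct Table { uint16_t data[N]; };

// Expand one initializer per index: a zero entry becomes the index itself.
template<std::size_t N, std::size_t... Is>
constexpr Table<N> fillIdentity(const uint16_t (&sparse)[N], Indices<Is...>)
{
    return Table<N>{{ static_cast<uint16_t>(sparse[Is] != 0 ? sparse[Is] : Is)... }};
}

template<std::size_t N>
constexpr Table<N> fillIdentity(const uint16_t (&sparse)[N])
{
    return fillIdentity(sparse, typename MakeIndices<N>::type());
}

// The sparse table stays macro-ized; only real mappings carry hex values.
// (Illustrative 8-entry table, not real case-mapping data.)
#define ______ 0x0000
constexpr uint16_t sparse0[8] = {
    ______, 0x0041, 0x0042, ______, ______, 0x039C, ______, ______,
};
#undef ______

// Constant-initialized: no code runs at startup to build this.
constexpr Table<8> plane0 = fillIdentity(sparse0);

static_assert(plane0.data[1] == 0x0041, "explicit mapping is kept");
static_assert(plane0.data[3] == 3, "empty slot maps to itself");

uint16_t caseup(uint16_t wc)
{
    assert(wc < 8);  // the sketch table has only 8 entries
    return plane0.data[wc];
}
```

The table definition keeps its original shape, and caseup is a single array load. The cost is the extra template machinery and a second array that the compiler has to see as constexpr.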

Quuxplusone

This is a case where I'd recommend not performing such an optimization unless profiling shows that it's a bottleneck. As you said, you like the cleanness of the table (it's easier to read and maintain), and if performance is not critical here, you'll want to keep those properties.

On the other hand, there is a way to trade space for time: make another copy of the array with the values translated. You only need to perform the copy/translate once, and caseup() then looks up this new array without branching. The original array is unchanged and remains clean and easy to edit.

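static uint16_t plane0lookup[256];  // filled at startup, so it cannot be const

// Call once, before the first use of caseup().
void init_plane0lookup()
{
    for (uint16_t i = 0; i < 256; ++i)
    {
        // Empty (zero) slots map to themselves; others keep their mapping.
        plane0lookup[i] = (plane0[i] == 0x0000) ? i : plane0[i];
    }
}

uint16_t caseup(uint16_t wc)
{
    return plane0lookup[wc];
}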
X.J
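#include <array>

uint16_t caseup(uint16_t wc)
{
    // Built once, on the first call. This initialization is guarded, so later
    // calls typically still check a hidden "already initialized" flag.
    // The explicit return type is needed in strict C++11, which only deduces
    // it for a lambda whose body is a single return statement.
    static const std::array<uint16_t, 256> plane0Map = []() -> std::array<uint16_t, 256>
    {
        std::array<uint16_t, 256> mapping;

        for(size_t i = 0; i < 256; ++i)
            mapping[i] = plane0[i] == 0 ? static_cast<uint16_t>(i) : plane0[i];

        return mapping;
    }();

    return plane0Map[wc];
}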
David