I want to store a static constant bitset of 2¹⁶ (65,536) bits, with a specific sequence of 1s and 0s that never changes.

I thought of using an initializer string, as proposed by this post:

std::bitset<1<<16> myBitset("101100101000110 ... "); // the ellipsis stands for the actual 65536-character sequence

But the compiler (VS2013) gives me the "string too long" error.

UPDATE

I tried splitting the string into smaller chunks, as proposed in the post linked above, like so:

std::bitset<1<<16> myBitset("100101 ..."
                            "011001 ..."
                            ...
                            );

But I get the error C1091: compiler limit: string exceeds 65535 bytes in length. My string is 65536 bytes (well technically 65537, with the EOS character).

What are my other options?

UPDATE

Thanks to luk32, this is the beautiful code I ended up with:

const std::bitset<1<<16> bs = (std::bitset<1<<16>("101011...")
    << 7* (1<<13)) | (std::bitset<1<<16>("110011...")
    << 6* (1<<13)) | (std::bitset<1<<16>("101111...")
    << 5* (1<<13)) | (std::bitset<1<<16>("110110...")
    << 4* (1<<13)) | (std::bitset<1<<16>("011011...")
    << 3* (1<<13)) | (std::bitset<1<<16>("111011...")
    << 2* (1<<13)) | (std::bitset<1<<16>("111001...")
    << 1* (1<<13)) | std::bitset<1<<16>("1100111...");

3 Answers

1

You didn't really split the literal. Adjacent string literals are concatenated into a single literal during compilation anyway, so you are still hitting the compiler limit. I don't think there's a way to increase this limit in MSVC.

You can split it into two literals, initialize two bitsets, shift the first part and OR it with the other.

Something like:

#include <iostream>
#include <string>
#include <bitset>

 
using namespace std;
int main()
{
    std::bitset<8> dest("0110");
    std::bitset<8> lowBits("1001");

    dest <<= dest.size()/2;
    dest |= lowBits;
    std::cout << dest << '\n';
}

If you look at the clang compiler output at -O2, it gets optimized to loading 105, which is 01101001.

My testing shows that if you swap 8 for 1<<16 it uses SSE, so it should be a pretty safe bet. It didn't drop the literals as it did in the case of 8 or 16, so there might be some runtime overhead, but I am not sure you can do much better.
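The same shift-and-OR idea extends to any number of chunks, as long as each chunk is its own literal and stays below the compiler limit. A minimal sketch, assuming a hypothetical helper `from_chunks` (not part of the standard library or of the answer above) and using an 8-bit stand-in for the 65536-bit case:

```cpp
#include <bitset>
#include <cstddef>
#include <cstring>

// Hypothetical helper: joins string chunks, most significant chunk first,
// by shifting the accumulated bits left and OR-ing in the next chunk.
template <std::size_t N, std::size_t C>
std::bitset<N> from_chunks(const char* const (&chunks)[C])
{
    std::bitset<N> result;
    for (std::size_t i = 0; i < C; ++i) {
        result <<= std::strlen(chunks[i]);    // make room for this chunk
        result |= std::bitset<N>(chunks[i]);  // the chunk fills the low bits
    }
    return result;
}

// Small-scale stand-in: "0110" followed by "1001" gives 01101001.
inline std::bitset<8> example_bits()
{
    static const char* const parts[] = { "0110", "1001" };
    return from_chunks<8>(parts);
}
```

Each element of `parts` is a separate literal, so none of them is concatenated with its neighbours. The combination still happens at run time, once, when the bitset is built.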

EDIT:

I did some more tests, here is my playground:

#include <iostream>
#include <string>
#include <bitset>
 

using namespace std;
int main()
{
    //static const std::bitset<16> set1( "01100110011001100110011001100110");
    static const std::bitset<16> set2(0b01100110011001100110011001100110);

    static const std::bitset<16> high(0b01100110);
    static const std::bitset<16> low (0b01100110);
    static const std::bitset<16> set3 = (high << 8) | low;
    std::cout << (set3 == set2) << '\n';
}

I couldn't get compile-time optimization for the const char* constructor on any compiler except clang, and there it only worked for up to 14 characters. There seems to be some promise if you make a bunch of bitsets initialized from unsigned long long, then shift and combine them:

static const std::bitset<128> high(0b0110011001100110011001100110011001100110011001100110011001100110);
static const std::bitset<128> low (0b1001100110011001100110011001100110011001100110011001100110011001);
static const std::bitset<128> set3 = (high << high.size()/2) | low;
std::cout << set3 << '\n';

This makes compilers stick to binary data storage. If you could use a slightly newer compiler with constexpr, I think it would be possible to declare the data as an array of bitsets constructed from unsigned long longs, have them concatenated by a constexpr function, and bind the result to a constexpr const variable, which should ensure the best optimization possible. The compiler could still go against you, but it would have no reason to. Maybe even without constexpr it would generate pretty much optimal code.
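As a rough illustration of combining 64-bit words, here is a run-time sketch (deliberately not constexpr, so it does not show the compile-time variant speculated about above). The helper name `from_words` is hypothetical, and the two words are the 128-bit patterns from the snippet above written in hexadecimal:

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical helper: builds a bitset<N> from 64-bit words,
// most significant word first.
template <std::size_t N, std::size_t W>
std::bitset<N> from_words(const unsigned long long (&words)[W])
{
    static_assert(W * 64 == N, "the words must cover exactly N bits");
    std::bitset<N> result;
    for (std::size_t i = 0; i < W; ++i) {
        result <<= 64;                       // make room for the next word
        result |= std::bitset<N>(words[i]);  // the word fills the low 64 bits
    }
    return result;
}

inline std::bitset<128> example_words()
{
    static const unsigned long long words[] = {
        0x6666666666666666ULL,  // high half: 0110 repeated
        0x9999999999999999ULL   // low half: 1001 repeated
    };
    return from_words<128>(words);
}
```

For 1<<16 bits the array would hold 1024 words, which keeps the data in binary form and avoids long string literals entirely.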

luk32
  • The question is about MSVC, and your Godbolt link is about Clang which are much newer and much better than MSVC at optimization – phuclv Apr 28 '21 at 15:41
  • What does that change? What do you expect me to improve? The solution works as well as it can on every compiler. clang is not really newer than msvc. New clang is newer than old msvc, yes. That's obvious. I didn't make any unsubstantiated claim, the optimization part is an addendum, which also states it's pretty limited. I can change it to "*using size of bitset > 14 results in runtime initialization on every compiler I tested*" but note that OP asked how to get it compiled, not how to get super optimal. – luk32 Apr 28 '21 at 16:34
  • by reading what you wrote people will think that MSVC optimizes like that or even uses SSE which is not the case. At least you must change the compiler to MSVC in Godbolt – phuclv Apr 28 '21 at 16:41
  • It won't use SSE? Why not? Anyways, I added the conclusions of my fiddling, with a clear statement about initializing from `const char*`. Unfortunately MSVC below 19 is unavailable on godbolt and even the available version's assembly comes out garbled for me. I don't think the results would be much different though. I don't believe the VC13 compiler is worse at those optimizations than old clang and gcc. – luk32 Apr 28 '21 at 17:04
  • because auto-vectorization is only really available since VS2015 although there is some preliminary auto-vectorization support in VS2013, and code generation has only been significantly improved with the new SSA compiler in VS2017. All MSVC versions before 2015 is just shitty, and even VS2019 output is still usually behind gcc and Clang. You can see lots of comparisons in the [tag:x86] tag – phuclv Apr 29 '21 at 00:42
  • Thank you for all your explanations, I used your solution and it works fine. I suppose it's not really optimised, but in practice it's enough. – Richard Pickle Apr 30 '21 at 09:20
0

You may consider skipping compilation altogether, and simply:

  • Assemble the data into an object file (segment .rodata), exporting symbols for it and its size.
  • Declare these symbols as extern const in a .h file.
  • Use these symbols and link your program to this object file.

I don't have MASM32 handy to write a complete answer that actually works, but I use this technique often with GAS and LD and it avoids a lot of issues (loading on demand, security descriptors of an otherwise separate data file, blazingly fast compile times...).

Note that this is what the VS resource compiler does, in short... so you may include your data as a resource and get a pointer to it.

Laurent LA RIZZA
0

It's impossible to have a static std::bitset like that because:


If construction at run time is acceptable, simply split the string literal into several smaller ones of fewer than 2048 characters each, provided the total length is below 65536:

ANSI compatibility requires a compiler to accept up to 509 characters in a string literal after concatenation. The maximum length of a string literal allowed in Microsoft C is approximately 2,048 bytes. However, if the string literal consists of parts enclosed in double quotation marks, the preprocessor concatenates the parts into a single string, and for each line concatenated, it adds an extra byte to the total number of bytes.

[...]

While an individual quoted string cannot be longer than 2048 bytes, a string literal of roughly 65535 bytes can be constructed by concatenating strings.

https://learn.microsoft.com/en-us/cpp/c-language/maximum-string-length?view=msvc-160

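As said, longer strings must be assembled manually. Here the literal holds every bit except the final one, which is added afterwards:

const int LENGTH = 1 << 16;
std::bitset<LENGTH> myBitset(
    "100101 ..."  // 2ᴺ bits
    "011001 ..."  // 2ᴺ bits
    ...
    "001011 ...", // must be one shorter than the previous lines: 2ᴺ - 1 bits
    LENGTH - 1    // number of characters to read
);
myBitset <<= 1;   // the characters read fill the low LENGTH - 1 bits; shift them into place
myBitset[0] = 1;  // set the final bit (the character omitted from the last line)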
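It helps to check on a small scale where the characters of a too-short string end up. When the string has fewer characters than the bitset has bits, its last character maps to bit 0 and the highest bits stay zero. The sketch below assumes an 8-bit pattern "01101001" whose last character is left out of the literal; the function name is made up for illustration:

```cpp
#include <bitset>

// The intended pattern is "01101001"; the literal omits its last character.
inline std::bitset<8> from_short_literal()
{
    std::bitset<8> bits("0110100", 7);  // 7 characters land in bits 6..0
    bits <<= 1;                         // move them up to bits 7..1
    bits[0] = 1;                        // restore the omitted final bit
    return bits;
}
```

Without the shift, every bit would sit one position too low, so dropping a trailing character always needs a shift before the last bit is set.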

Alternatively just use an array instead of string literal:

static const char BITSET[LENGTH] = {
    '1', '0', '0', '1',...
    ...
    '0', '1', '0', '0'
};
std::bitset<LENGTH> myBitset(BITSET, sizeof(BITSET));
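A small-scale version of the array variant may make the constructor call clearer. The array has no terminating NUL, which is why the character count has to be passed explicitly. The names `SMALL_BITS` and `from_char_array` are placeholders for illustration only:

```cpp
#include <bitset>

// Eight characters, no terminating NUL: the count argument is required.
static const char SMALL_BITS[8] = { '0', '1', '1', '0', '1', '0', '0', '1' };

inline std::bitset<8> from_char_array()
{
    // The first character becomes the most significant bit.
    return std::bitset<8>(SMALL_BITS, sizeof(SMALL_BITS));
}
```

Because a brace-enclosed character list is not a string literal, this form should sidestep the string-length limit, at the cost of a much bulkier source file.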
phuclv
  • Why do you think the problem above has anything to do with constexpr? – Claas Bontus Apr 28 '21 at 15:22
  • It's very much possible to have `static bitset<>` since pretty old days (works on gcc 4.4 with `-std=c++0x`). It's hard to have it initialized by a precompiled binary sequence. E.g. clang optimizes an initialization string up to size 14, gcc doesn't. `static` just indicates storage. You can have `static const bitset` initialized at runtime, at the beginning of the run. `constexpr` could help but it's not guaranteed either. – luk32 Apr 28 '21 at 15:24
  • @luk32 the question is about MSVC, and a very ancient version of MSVC – phuclv Apr 28 '21 at 15:40
  • @ClaasBontus why do you think "initialization at compile time" has nothing to do with constexpr? – phuclv Apr 28 '21 at 15:41
  • @phuclv Because the compiler is not forced to use a precomputed value for initialization anyways. It could help though. `constexpr` just means that it's possible to evaluate a function at compile time, it can still leave the logic for run-time. On the other hand `static` doesn't change anything and your answer suggests that you can't have `static bitset`, which is wrong. You can't have `bitset` *like that* i.e. initialized at compile time. But you can't with any compiler (apart from the exception I pointed out). `constexpr` and `static` do not change anything in that regard. – luk32 Apr 28 '21 at 17:18
  • @luk32 static here is not the `static` keyword in C++, I didn't use the `code` for you, don't you see? The OP used that to say initialization at compile time – phuclv Apr 29 '21 at 00:35