
I am developing my own string class that has both small string optimization and an internal flag indicating whether the string is ASCII, UTF-8, WTF-8 or a byte string. The constructor

String(const char* );

can be used to construct either an ASCII string or a UTF-8 string. It should only be used with literals such as:

const String last_name = "Fayard";
const String first_name = "François";

The constructor needs to compute both the length of the string and whether it is ASCII or UTF-8. Therefore, I wrote these functions so that they can be evaluated at compile time.

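// il::int_t and the 0x80_uchar / 0x00_uchar literals are defined elsewhere
// in the library (not shown). Both functions are recursive because a C++11
// constexpr function body may only contain a single return statement.
inline constexpr il::int_t size(const char* s) {
  return (*s == '\0') ? 0 : (size(s + 1) + 1);
}

inline constexpr bool isAscii(const char* s) {
  return (*s == '\0')
         ? true
         : (((static_cast<unsigned char>(*s) & 0x80_uchar) ==
             0x00_uchar) && isAscii(s + 1));
}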
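A minimal standalone sketch of the two functions above, assuming `std::ptrdiff_t` in place of `il::int_t` and the plain literals `0x80u`/`0u` in place of `0x80_uchar`/`0x00_uchar`. It shows the distinction the comments below turn on: the compiler must evaluate a call at compile time only where a constant expression is required (a `static_assert` or a `constexpr` variable). With a runtime pointer argument, as inside the constructor, the call is an ordinary call that the optimizer may or may not fold. `lengthOf` is a hypothetical name.

```cpp
#include <cstddef>

// Assumption: std::ptrdiff_t stands in for il::int_t, and 0x80u / 0u stand
// in for the library's 0x80_uchar / 0x00_uchar literals.
constexpr std::ptrdiff_t size(const char* s) {
  return (*s == '\0') ? 0 : (size(s + 1) + 1);
}

constexpr bool isAscii(const char* s) {
  return (*s == '\0')
             ? true
             : (((static_cast<unsigned char>(*s) & 0x80u) == 0u) &&
                isAscii(s + 1));
}

// static_assert requires a constant expression, so these calls are
// guaranteed to be evaluated by the compiler (or the build fails).
static_assert(size("Fayard") == 6, "");
static_assert(isAscii("Fayard"), "");
static_assert(!isAscii("Fran\xC3\xA7ois"), "");  // "ç" spelled as UTF-8 bytes

// Here s is runtime data: size(s) is not a constant expression, and whether
// it gets folded after inlining is entirely up to the optimizer.
std::ptrdiff_t lengthOf(const char* s) { return size(s); }
```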

The constructor is written like this and is defined in the header so that it can be inlined.

String(const char* data) {
  const int n = size(data);
  const bool ascii = isAscii(data);

  if (n <= max_small_string) {
    ...
  } else {
    data_ = malloc();
    ...
  }
}

But I can't manage to get the functions size and isAscii evaluated at compile time (I tried and checked the assembly with gcc 4.8.5, clang 4.0.1 and icpc 17.0.4). Is there a way to do that?

PS: The solution must use C++11 only and compile with gcc 4.8.5 and Visual Studio 2015.

InsideLoop
  • Well, you give these functions *runtime* data, so how do you think the compiler should be able to evaluate them at *compile time*? – zett42 Aug 06 '17 at 23:16
  • Have you tried to move the rest of the code in your constructor to a separate function? – sbabbi Aug 06 '17 at 23:18
  • @zett42: I don't understand what you mean by runtime data. The functions `size("Hello")` and `isAscii("François")` are evaluated at compile time. Why can't they be evaluated at compile time in the constructor when it is inlined? – InsideLoop Aug 06 '17 at 23:19
  • @sbabbi: I have tried writing a constructor with signature `String(bool ascii, const char* data, int n)` and defining `String(const char* data) : String{isAscii(data), data, size(data)} {}`. But it does not work either. – InsideLoop Aug 06 '17 at 23:21
  • Shouldn't the constructors be something like `template <size_t N> constexpr String(const char(&string_literal)[N]);` to make them work as constexpr with string literals? – Öö Tiib Aug 06 '17 at 23:23
  • @Öö Tiib : I don't want the constructor to be constexpr. I just want to use a constexpr function inside it that could be evaluated at compile time. Anyway, I think the malloc makes it impossible for this constructor to be constexpr. – InsideLoop Aug 06 '17 at 23:27
  • The arguments passed to a non-constexpr constructor won't be evaluated at compile time. – Öö Tiib Aug 06 '17 at 23:28
  • @Öö Tiib : That's what I fear. Do you see a workaround when the function can be inlined? The size of the "small strings" is limited to 22 bytes. Above that, one should use a malloc. – InsideLoop Aug 06 '17 at 23:29
  • The `inline` in C++ does not mean that a function is `constexpr`. The opposite is true: `constexpr` on a function implies that it is `inline`. – Öö Tiib Aug 06 '17 at 23:34
  • @Öö Tiib: I don't use inline for that. It's not even used for inlining these days. It's just there because the function is defined in a `*.h` file. – InsideLoop Aug 06 '17 at 23:36
  • @Öö Tiib: Your idea of defining `template <size_t n> constexpr String(const char(&string_literal)[n]);` seems nice. But the constructor can only be `constexpr` when `n <= 22`, which is the small string optimization case, as I believe malloc is not allowed in constexpr constructors. How could I specify that? – InsideLoop Aug 06 '17 at 23:40

3 Answers


You can use enable_if to constrain your constructor to arrays of at most 22 chars (N counts the terminating '\0', so at most 21 visible characters):

template<size_t N, typename std::enable_if<(N <= 22), int>::type = 0> 
constexpr String(const char(&string_literal)[N]) { /*...*/ }
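Because `N` is deduced from the array type of the literal, it includes the terminating `'\0'`, which decides where the 22-byte limit actually falls. A small sketch; `literalLength` and `smallLength` are hypothetical helpers, and the limit of 22 is taken from the question:

```cpp
#include <cstddef>
#include <type_traits>

// N comes from the array type, so it counts the terminating '\0':
// "Fayard" has type const char[7].
template <std::size_t N>
constexpr std::size_t literalLength(const char (&)[N]) {
  return N - 1;
}

// Hypothetical helper with the same enable_if constraint as the answer:
// only arrays with N <= 22 (at most 21 characters plus '\0') are accepted.
template <std::size_t N, typename std::enable_if<(N <= 22), int>::type = 0>
constexpr std::size_t smallLength(const char (&s)[N]) {
  return literalLength(s);
}

static_assert(literalLength("Fayard") == 6, "");
static_assert(smallLength("Fayard") == 6, "");
// smallLength("a literal with more than twenty-one characters") fails to
// compile: the enable_if removes the only candidate.
```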
Öö Tiib
  • Sounds good. I am back home as it is 2am in France. I'll give it a try tomorrow morning. – InsideLoop Aug 06 '17 at 23:56
  • It fails with gcc 4.8.5 but works with clang++ 4.0.1 and -std=c++11. The body of the constructor contains a std::memcpy, which gcc 4.8.5 does not allow. I get the feeling that even though clang accepts it as C++11, it needs C++14 to compile. Anyway, I need gcc 4.8.5 compatibility, as it is a scientific library and RHEL 7, which is the "standard" on clusters, ships with gcc 4.8.5. – InsideLoop Aug 07 '17 at 07:12

Function arguments are not constexpr, so you cannot propagate the string literal into a constant expression.

One way is to turn the string literal into a char sequence, so that the characters are encoded in the type:

template<typename C, C...cs> struct Chars
{
    using str_type = C[1 + sizeof...(cs)];
    static constexpr C str[1 + sizeof...(cs)] = {cs..., 0};

    constexpr operator const str_type&() const { return str; }
};

template<typename C, C...cs> constexpr C Chars<C, cs...>::str[1 + sizeof...(cs)];

// Requires GNU-extension
template <typename C, C...cs>
constexpr Chars<C, cs...> operator""_cs() { return {}; }

Without the GNU extension, you have to use a macro to transform the literal into a char sequence, as I do there.

Then all the value information is available from the types:
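The macro linked in the answer is not reproduced here. As a minimal sketch of one common C++11 way to do it, assuming a fixed maximum of 8 characters (`Build`, `CS_AT` and `CS` are hypothetical names): the macro reads each character of the literal by constant index, pads with `'\0'` past the end, and a metafunction keeps characters up to the first `'\0'`.

```cpp
#include <cstddef>
#include <type_traits>

template <typename C, C... cs>
struct Chars {};

// Build<Acc, c0, c1, ...>::type appends c0, c1, ... to Acc and stops at the
// first '\0', which drops the padding produced by CS_AT.
template <typename Acc, char... Rest>
struct Build;

// All slots consumed: the literal filled (or exceeded) the macro's capacity.
template <char... Acc>
struct Build<Chars<char, Acc...>> {
  using type = Chars<char, Acc...>;
};

// First '\0' reached: stop and ignore the remaining padding.
template <char... Acc, char... Rest>
struct Build<Chars<char, Acc...>, '\0', Rest...> {
  using type = Chars<char, Acc...>;
};

// Any other character: append it and continue with the rest.
template <char... Acc, char Head, char... Rest>
struct Build<Chars<char, Acc...>, Head, Rest...> {
  using type = typename Build<Chars<char, Acc..., Head>, Rest...>::type;
};

// i-th character of literal s, or '\0' past its end; the conditional keeps
// constant evaluation from indexing out of bounds.
#define CS_AT(s, i) ((i) < sizeof(s) ? (s)[(i)] : '\0')

// Hypothetical macro. Assumption: at most 8 characters; longer literals are
// silently truncated, so extend the CS_AT list to raise the limit.
#define CS(s)                                               \
  Build<Chars<char>, CS_AT(s, 0), CS_AT(s, 1), CS_AT(s, 2), \
        CS_AT(s, 3), CS_AT(s, 4), CS_AT(s, 5), CS_AT(s, 6), \
        CS_AT(s, 7)>::type
```

For example, `CS("Fayard")` names the type `Chars<char, 'F', 'a', 'y', 'a', 'r', 'd'>`. The silent truncation is the main trade-off of a fixed-length macro; a `static_assert` on `sizeof(s)` at the use site can guard against it.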

template <typename C, C ... Cs>
constexpr il::int_t size(Chars<C, Cs...>) {
  return sizeof...(Cs);
}

template <typename C, C ... Cs>
constexpr bool isAscii(Chars<C, Cs...>) {
    // C++17 fold expression; the pattern needs its own parentheses
    return (((static_cast<unsigned char>(Cs) & 0x80_uchar) == 0x00_uchar) && ...);
}

or for C++11:

template <typename C>
constexpr bool isAscii(Chars<C>) { return true; }

template <typename C, C Head, C ... Cs>
constexpr bool isAscii(Chars<C, Head, Cs...>) {
    // C++11: a single return statement, recursing on the tail
    return ((static_cast<unsigned char>(Head) & 0x80_uchar) == 0x00_uchar)
           && isAscii(Chars<C, Cs...>{});
}
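With the characters encoded in the type, the constructor itself no longer has to be constexpr: `size` and `isAscii` can be called on a value-initialized `Chars<...>` to initialize `constexpr` locals, which the compiler must evaluate at compile time even inside a constructor that calls `malloc`. A sketch under these assumptions: a stripped-down `Chars` without the `str` member, `std::ptrdiff_t` for `il::int_t`, plain literals for `0x80_uchar`, a limit of 22 characters, and a hypothetical class layout (`small_`, `data_`, `length()`, `ascii()`, `c_str()`). Copying is simply disabled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

template <typename C, C... cs>
struct Chars {};

template <typename C, C... Cs>
constexpr std::ptrdiff_t size(Chars<C, Cs...>) {
  return sizeof...(Cs);
}

template <typename C>
constexpr bool isAscii(Chars<C>) { return true; }

template <typename C, C Head, C... Cs>
constexpr bool isAscii(Chars<C, Head, Cs...>) {
  return ((static_cast<unsigned char>(Head) & 0x80u) == 0u) &&
         isAscii(Chars<C, Cs...>{});
}

// Hypothetical String: the constructor is NOT constexpr (it may call malloc),
// yet n and ascii below are compile-time constants.
class String {
 public:
  static constexpr std::ptrdiff_t max_small_string = 22;

  template <char... Cs>
  String(Chars<char, Cs...>) : size_{0}, ascii_{false}, data_{nullptr} {
    // constexpr locals: the build fails if these are not computed at
    // compile time. Note: no member is named size or isAscii, so these
    // calls find the free functions above.
    constexpr std::ptrdiff_t n = size(Chars<char, Cs...>{});
    constexpr bool ascii = isAscii(Chars<char, Cs...>{});
    static const char chars[] = {Cs..., '\0'};

    size_ = n;
    ascii_ = ascii;
    if (n <= max_small_string) {
      data_ = small_;
    } else {
      data_ = static_cast<char*>(std::malloc(static_cast<std::size_t>(n) + 1));
      if (data_ == nullptr) std::abort();
    }
    std::memcpy(data_, chars, static_cast<std::size_t>(n) + 1);
  }

  String(const String&) = delete;  // copying is out of scope for this sketch
  String& operator=(const String&) = delete;

  ~String() {
    if (data_ != small_) std::free(data_);
  }

  std::ptrdiff_t length() const { return size_; }
  bool ascii() const { return ascii_; }
  const char* c_str() const { return data_; }

 private:
  std::ptrdiff_t size_;
  bool ascii_;
  char* data_;  // points to small_ or to heap memory
  char small_[max_small_string + 1];
};
```

In practice the `Chars` value would come from the `_cs` literal or from a macro. Because `n` is a constant, the `n <= max_small_string` test is trivially folded, although both branches still have to compile.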
Jarod42

This is basically @Öö Tiib's idea, but expanded to show that it works in gcc 4.8.5, as you said you require:

#include <cstdlib>      // malloc
#include <type_traits>  // std::enable_if

// size() is the recursive constexpr function from the question.
struct String
{
  static const int max_small_string = 10;
  int size_;
  char* data_;

  template <int N,
      typename std::enable_if<(N <= String::max_small_string), void*>::type = nullptr>
  constexpr String(const char (&str)[N])
    : size_{size(str)},
      data_{}
  {
  }

  template <int N,
    typename std::enable_if<(N > String::max_small_string), void*>::type = nullptr>
  String(const char (&str)[N])
    : size_{size(str)},
      data_{static_cast<char*>(malloc(size_))}
  {
  }
};


auto foo() -> void
{
  constexpr String ss = String{"asd"}; // OK, constexpr

  String hs =  String{"a sdjwq niornyuqe rniehr iwhtR Trtj rjtsd asde"};
}

You cannot have a constexpr constructor that calls malloc; there is no way around it, and it looks like there never will be. I read some discussion about introducing a constexpr_vector into the standard, but allowing arbitrary memory access in a constexpr context would be extremely tricky, because constant evaluation must detect and reject every possible UB, so it most likely won't be supported in the foreseeable future.

But you can have a constexpr small-string constructor, as I've shown. Check it on godbolt with gcc 4.8.5.


You said you want to initialize a stack variable. By that I think you mean one with automatic storage. Yes, it can be done in C++11:

template <int... Is> struct Seq{};

template <int I, int Max, int... Is>
struct Make_seq_impl
{
    using Type = typename Make_seq_impl<I + 1, Max, Is..., I>::Type;
};

template <int Max, int... Is>
struct Make_seq_impl<Max, Max, Is...>
{
    using Type = Seq<Is...>;
};

template <int N>
using Make_seq = typename Make_seq_impl<0, N>::Type;

struct X
{
    static const int max_size_ = 10;
    char data_[max_size_];

    template <int N, int... Is>
    constexpr X(const char (&str)[N], Seq<Is...>)
        : data_ {(Is < N ? str[Is] : '\0')...}
    {
    }

    template <int N>
    constexpr X(const char (&str)[N])
        : X(str, Make_seq<max_size_>{})
    {
    }
};

auto test() -> void
{
    constexpr X x{"Asd"};

    static_assert(x.data_[0] == 'A', "");
    static_assert(x.data_[1] == 's', "");
    static_assert(x.data_[2] == 'd', "");
    static_assert(x.data_[3] == '\0', "");
}

I leave it to you to combine the two methods.
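One possible way to combine the two methods in C++11, sketched under assumptions: `int` and `0x80u` stand in for `il::int_t` and `0x80_uchar`, the limit is 22 characters plus `'\0'`, and the member names are illustrative. The small-literal constructor copies through the index pack instead of `memcpy` or a loop, which is the part gcc 4.8.5 rejected in the comments below.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Index-sequence helpers, as in the answer above.
template <int... Is> struct Seq {};

template <int I, int Max, int... Is>
struct Make_seq_impl {
  using Type = typename Make_seq_impl<I + 1, Max, Is..., I>::Type;
};

template <int Max, int... Is>
struct Make_seq_impl<Max, Max, Is...> {
  using Type = Seq<Is...>;
};

template <int N>
using Make_seq = typename Make_seq_impl<0, N>::Type;

// The question's recursive helpers (int and 0x80u are assumptions).
constexpr int size(const char* s) {
  return (*s == '\0') ? 0 : (size(s + 1) + 1);
}

constexpr bool isAscii(const char* s) {
  return (*s == '\0')
             ? true
             : (((static_cast<unsigned char>(*s) & 0x80u) == 0u) &&
                isAscii(s + 1));
}

struct String {
  static constexpr int max_small_string = 22;  // characters, '\0' excluded

  int size_;
  bool ascii_;
  char* heap_;                        // nullptr for small strings
  char small_[max_small_string + 1];  // zero-filled for heap strings

  // Small literals: constexpr in C++11 because the copy happens in the
  // member initializer via pack expansion, not in the constructor body.
  template <int N, typename std::enable_if<(N - 1 <= max_small_string),
                                           int>::type = 0>
  constexpr String(const char (&str)[N])
      : String(str, Make_seq<max_small_string + 1>{}) {}

  // Large literals: malloc, so this constructor cannot be constexpr.
  // Freeing is omitted on purpose: in C++11 a user-provided destructor
  // makes String a non-literal type, ruling out constexpr String objects.
  template <int N, typename std::enable_if<(N - 1 > max_small_string),
                                           int>::type = 0>
  String(const char (&str)[N])
      : size_{size(str)}, ascii_{isAscii(str)},
        heap_{static_cast<char*>(std::malloc(N))}, small_{} {
    if (heap_ == nullptr) std::abort();
    std::memcpy(heap_, str, N);
  }

  constexpr const char* c_str() const { return heap_ ? heap_ : small_; }

  // Delegation target: small_[i] = str[i] for i < N, '\0' afterwards.
  template <int N, int... Is>
  constexpr String(const char (&str)[N], Seq<Is...>)
      : size_{size(str)}, ascii_{isAscii(str)}, heap_{nullptr},
        small_{(Is < N ? str[Is] : '\0')...} {}
};
```

A `constexpr String` built from a short literal gets its size, ASCII flag and buffer computed entirely by the compiler. A non-constexpr `String` from a short literal still goes through the constexpr constructor, but then the compiler is permitted, not required, to evaluate it at compile time. How heap strings are released without a destructor is a design decision this sketch leaves open.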

bolov
  • The problem is that the small constructor needs to copy the content of `str` to a stack allocated `data_`. It seems to need `C++14` to do that kind of thing which rules out gcc 4.8.5. – InsideLoop Aug 07 '17 at 13:59
  • you mean you need to copy `str` to `data_`? – bolov Aug 07 '17 at 14:17
  • You can copy it to an array, i.e. `char small_data[max_small_string]`. When I have some time I will cook up a solution. – bolov Aug 07 '17 at 14:30
  • I have tried to copy it to `small_data` with `memcpy` or a for loop and both get rejected by gcc 4.8.5. It seems that this is not allowed by `C++11` but it is allowed in `C++14`. – InsideLoop Aug 07 '17 at 15:24