
Before anyone says, "DON'T DO THIS as it is really bad":

  1. I understand the reasons for having a NUL terminated string.
  2. I know one can state something like
    char mystr[] = { 'm', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g'};
    However, the convenience of the c-string representation is too great.
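
For concreteness, here is a minimal sketch of the size difference between the two spellings. The names with_nul and without_nul are made up for illustration and are not part of any real code base.

```cpp
#include <cstddef>

// A string literal always carries the terminating NUL...
const char with_nul[] = "my string";

// ...while a brace-initialized array holds only the characters listed.
const char without_nul[] = { 'm', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g' };

static_assert(sizeof(with_nul) == 10, "9 characters plus the NUL");
static_assert(sizeof(without_nul) == 9, "exactly 9 characters");
```

Note that in C++, unlike C, char s[9] = "my string"; is rejected, because the initializer (including its NUL) is longer than the array.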

The rationale for this is that I'm programming for a micro-controller and I need to store data in the programme's memory. Some of the data is in the form of bytes, words, dwords and floats. I'd like the data to include strings, stored contiguously and without the NUL.

I've tried templates that take <size_t N, char* A> and <size_t N, char (&A)[N]> as parameters in order to traverse the array and store its contents to a static array, but I can't seem to get it right. I think the standard may actually disallow this which is understandable in the general case, but unfortunate in specific cases (specifically, this one. ;) :( )

If I could remap the string as something like a boost::mpl::vector_c<char, ...> template, that would be better as I have other code that will store it properly, but dereferencing an array from within a template to be used as a const template parameter appears to be disallowed too.

Any ideas?

EDIT:

Pseudocode example (this is kinda contrived as the real code is much larger; also, I probably wouldn't read byte by byte like this, nor would I be using a literal to iterate to the end of the string. That would be embedded in the data as well somewhere.):

// this stores bytes in an array
template<typename X, typename T, T ...numbers>
struct x
{
  static PROGMEM volatile const T data[];
};
template<typename X, typename T, T ...numbers>
PROGMEM volatile const T x<X, T, numbers...>::data[] = { numbers... };

int main()
{
  // this will not work, but the idea is you have byte 0 as 1,
  // byte 1 as 2, byte 2 as 3, byte 3 as 's', byte 4 as 'o'...
  // byte 22 as 'g', byte 23 as 4, byte 24 as 5, byte 25 as 6.
  typedef x<int, char, 1,2,3,"some embedded string",4,5,6> xx;
  // print the 20 characters of the embedded string, one byte at a time
  for(int i=0; i<20; ++i)
    Serial.print(pgm_read_byte_near(&xx::data[0] + 3 + i));
}

Also note that I am not using C++11, this is C++0x, and possibly an extension.
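
For comparison, the layout the pseudocode is aiming for can already be produced if every character is spelled out individually. The following is only a sketch under a few assumptions: PROGMEM and volatile are left out so that it compiles off-target, the unused first template parameter is dropped, and packed is a name made up here.

```cpp
#include <cstddef>

// Stores the values of the parameter pack contiguously in a static array.
template <typename T, T... numbers>
struct packed
{
    static const T data[sizeof...(numbers)];
};

template <typename T, T... numbers>
const T packed<T, numbers...>::data[sizeof...(numbers)] = { numbers... };

// Bytes 0..2 are 1, 2, 3; bytes 3..6 are "some" without a NUL; bytes 7..9 are 4, 5, 6.
typedef packed<char, 1, 2, 3, 's', 'o', 'm', 'e', 4, 5, 6> xx;

static_assert(sizeof(xx::data) == 10, "no terminator and no padding");
```

What is missing is exactly the convenience asked for: being able to write "some" instead of 's', 'o', 'm', 'e'.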

R. Martinho Fernandes
Adrian
  • What are you trying to do with these strings? Statically initialize them? Overlay them on some address? Just use them as fixed-size strings? This seems so easy there must be some catch ... – Useless May 10 '13 at 13:36
  • My mind keeps screaming... DON'T DO THIS and use std::string or nul terminated cstrings instead. Perhaps a better understanding of the purpose/requirements, rather than your failed attempts, would help me form a more solid answer. – MobA11y May 10 '13 at 13:36
  • Sounds like an interesting problem, but the question is not very clear. Could you provide a pseudo-code example of what you're after? – Angew is no longer proud of SO May 10 '13 at 13:36
  • @ChrisCM: A micro-controller has VERY limited memory resources. I wouldn't bother otherwise. – Adrian May 10 '13 at 13:44
  • @Useless: Store them in the data segment to be read directly from that segment. It would contain a variety of information, stored contiguously. For parts that are to store strings, there will be somewhere in the data, a number indicating the length of the string. That length will be less than a byte long. – Adrian May 10 '13 at 13:47
  • So you're just looking for statically-initialized constants, like a string literal but without the nul? – Useless May 10 '13 at 13:49
  • @ChrisCM: std::string does not do this, even with that compile switch. – Adrian May 10 '13 at 13:52
  • I think I've got it - you want the fixed-size un-terminated character array, but you want to keep the _syntax_ of the built-in string literal. Is that it? – Useless May 10 '13 at 13:53
  • @Useless: Yes, exactly – Adrian May 10 '13 at 13:55
  • @Angew: Do you still need a pseudo-code example? – Adrian May 10 '13 at 14:07
  • @Adrian No, it's now clear to me from Useless's last comment. – Angew is no longer proud of SO May 10 '13 at 14:11
  • DON'T DO THIS as it is really bad –  May 10 '13 at 17:14
  • Is your microcontroller short on memory? What is your objection to trailing '\0' bytes? It takes the same memory as your leading length byte. – brian beuning May 10 '13 at 17:52
  • I don't have a leading length byte in my example. But in the implementation, I'd have a partial byte used for this. I also want to stream the bytes stored in memory directly, that is why I want to control the data layout more tightly. – Adrian May 10 '13 at 18:31
  • To the _DON'T DO THIS_ brigade: it's perfectly normal to want to reference fixed-width strings as such, especially for (de)serialising formats which have fixed-width character arrays. There's nothing magical or wonderful about nul-termination. The difficulty is with fixed-width string _literals_, where C strings are specially blessed by the language. – Useless May 10 '13 at 19:08
  • @adrian Please elaborate on "control data layout more tightly". I think that is the piece missing. – brian beuning May 10 '13 at 19:20
  • @brianbeuning, I'm not sure what to say. I would *like* the data in memory to be laid out in an optimal way, without interference or at the very least allowed by the compiler. I would like there to be minimal processing, and no memory overhead for a structure that I am building. There should be no padding, no pointers, just the string embedded in the binary data that will be used by the micro-controller as well as the machines that it communicates with. Do you understand? – Adrian May 10 '13 at 19:37
  • *Oxymoron* - *"non-nul terminated C string"*. Nit, a C-string is defined by nul-termination. Without it, you simply have an array. – David C. Rankin Aug 21 '19 at 18:53
  • @DavidC.Rankin, meh. A string of characters formed by a string literal which excludes the null terminator. – Adrian Aug 21 '19 at 19:03
  • @Adrian - negative, A string of characters formed by a string literal which INCLUDES the null terminator. – David C. Rankin Aug 21 '19 at 19:05
  • @DavidC.Rankin, negative, **I wanted** a string of characters formed by a string literal which **excludes** the null terminator. – Adrian Aug 21 '19 at 19:10
  • @Adrian - then you want an array initialized by an *array initializer* of *character-literals* (or an array sized only for the *length* of the string-literal initializer) – David C. Rankin Aug 21 '19 at 20:09
  • Maybe my other comment wasn't clear. **I wanted** a string of characters, which **excludes** any null terminator, formed by a string literal. – Adrian Aug 21 '19 at 20:38
  • Yes, what you have will work, or `char mystr[9] = "my string";` will also work. (or generically `char mystr[sizeof "my string" - 1] = "my string";`) – David C. Rankin Aug 21 '19 at 20:51
  • @DavidC.Rankin, do you programme in C++? That is invalid. That might work in C, but not C++. – Adrian Aug 21 '19 at 20:59
  • Yes, I had my C hat on, C++ will not ignore the nul-character and refuses to create a character array without it -- that almost looks like a bug in g++. g++ will compute `sizeof "my string" - 1` correctly, but complains that there are not enough characters in `mystr` to store `"my string"` without the nul-character `"-fpermissive"`. I learned something new, thank you. – David C. Rankin Aug 21 '19 at 21:14
  • @Adrian - it's actually a C++ language feature not to allow initialization with an initializer larger than the storage provided. [8.5.2 - C++ 2011 draft n3242](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf) Completely different than C, [C11 Standard - 6.7.9 Initialization(p14)](http://port70.net/~nsz/c/c11/n1570.html#6.7.9p14). – David C. Rankin Aug 22 '19 at 01:48
  • @DavidC.Rankin, sorry, what's your point? C++ isn't C. Used to be a subset, but the two have diverged in many ways now. – Adrian Aug 22 '19 at 14:55
  • The point being I like to confirm the language behavior from the standard. For those like minded I passed the references along. – David C. Rankin Aug 22 '19 at 22:53

2 Answers


Third try

magic and trickery

If you were using C++11 (I know, but in its absence I think code generation is your best bet), it feels like a user-defined literal should be able to handle this. Eg, with:

template <char... RAW>
inline constexpr std::array<char, sizeof...(RAW)> operator "" _fixed() {
    return std::array<char, sizeof...(RAW)>{RAW...};
}

it would be nice if this worked:

const std::array<char, 7> goodbye = goodbye_fixed;

... but sadly it doesn't (the template <char...> form is only used for numeric literals, presumably for parsing reasons). Using "goodbye"_fixed doesn't work either, as that requires an operator "" _fixed(const char *s, std::size_t length) overload, and by then the compile-time array has decayed to a pointer again.
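
To illustrate the one case where the template <char...> form is picked up, here is a small sketch using a numeric literal. It should compile as C++11; digits is a name made up for this example, and the operator is written without the space after "" because newer standards deprecate that spelling.

```cpp
#include <array>
#include <cstddef>

// Numeric literal operator template: the characters of the literal
// arrive as the template parameter pack, one char each.
template <char... RAW>
constexpr std::array<char, sizeof...(RAW)> operator""_fixed()
{
    return std::array<char, sizeof...(RAW)>{{RAW...}};
}

// 12345_fixed is parsed as an integer literal, so RAW is '1','2','3','4','5'.
constexpr auto digits = 12345_fixed;

static_assert(digits.size() == 5, "five characters, no terminating NUL");
```

So the mechanism does produce a fixed-size, unterminated array, but only from something the lexer accepts as a number.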

Eventually we come down to invoking this:

const auto goodbye = operator "" _fixed<'g','o','o','d','b','y','e'>();

and it's no better than the ugly first version. Any other ideas?


Second try

auto-generate the ugliness

I think you're right that you can't easily intercept the string literal mechanism. Honestly, the usual approach would be to use a build tool to generate the ugly code for you in a separate file (cf. internationalization libraries, for example).

Eg, you type

fixed_string hello = "hello";

or something similar in a dedicated file, and the build system generates a header

extern const std::array<char, 5> hello;

and a .cpp with the ugly initialization from the first try below.
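
As a rough idea of what such a generator has to produce, here is a sketch of a helper that turns one name/text pair into the definition for the generated .cpp. emit_fixed_string is a hypothetical name, and only ' and \ are escaped; a real tool would also need to handle non-printable characters and parse the input file.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the C++ definition a generator might emit
// for one fixed_string entry. Only ' and \ are escaped in this sketch.
std::string emit_fixed_string(const std::string& name, const std::string& text)
{
    std::string out = "const std::array<char, " + std::to_string(text.size())
                    + "> " + name + " = {{ ";
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (i != 0) out += ", ";
        out += '\'';
        if (text[i] == '\'' || text[i] == '\\') out += '\\';
        out += text[i];
        out += '\'';
    }
    out += " }};";
    return out;
}
```

Calling emit_fixed_string("hello", "hello") would yield a line declaring a std::array<char, 5> initialized character by character, which is the ugly form nobody wants to type by hand.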


First try

missed the "looks like a string literal" requirement

I've tried templates ...

like this?

#include <array>
#include <cstdio>

const std::array<char, 5> hello = { 'h', 'e', 'l', 'l', 'o' };

int main()
{
    // "%.*s" takes its precision as an int, so cast the size explicitly.
    std::printf("%.*s\n", static_cast<int>(hello.size()), hello.data());
    return 0;
}

If you don't have C++11, Boost.Array will work, or you can roll your own. Note that this is just a type wrapper around const char[5], so should be ok to go in the data segment (I've confirmed it goes in .rodata with my local gcc).
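
If rolling your own, something along these lines might be enough. This is a minimal sketch that avoids C++11 features; fixed_array is a made-up name, not the Boost or standard type.

```cpp
#include <cstddef>

// Minimal aggregate wrapper around a built-in array. It has no
// constructors, so it can still be statically initialized with braces.
template <typename T, std::size_t N>
struct fixed_array
{
    T elems[N];

    std::size_t size() const { return N; }
    const T* data() const { return elems; }
    const T& operator[](std::size_t i) const { return elems[i]; }
};

// Exactly five bytes of storage, with no terminator.
const fixed_array<char, 5> hello = { { 'h', 'e', 'l', 'l', 'o' } };
```

Because the struct is a plain aggregate, the compiler is free to place hello in read-only data just like a bare const char[5].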

Useless
  • It's not just the typing (though I guess that's part of it), it is the style. It is ugly and detracts from what I'm trying to do, which is to generate an array of bytes that convey to a user a readable message. – Adrian May 10 '13 at 13:59
  • char* hello = new char[6]; //5 + 1 for null. Is still more efficient memory wise, though we're talking an order of like 3 bytes. – MobA11y May 10 '13 at 14:30
  • now you have a pointer (say 4 bytes), plus the heap management overhead, plus the nul terminator, _and_ you still have the original constant literal values somewhere in the data segment. It's the latter OP wants to access directly, rather than copying them somewhere else. – Useless May 10 '13 at 14:35
  • Yep, I realized that after reading your comments on the question. The pseudo code helped a lot too. – MobA11y May 10 '13 at 14:39
  • I don't bar brilliance from here, I *expect it*. ;) I would really like to avoid a separate code gen tool, as I would like to publish this lib with minimal requirements. – Adrian May 10 '13 at 14:51
  • I suspect a suitable code-gen tool could be bashed together with sed or awk, but I don't know your portability requirements. – Useless May 10 '13 at 15:20
  • Requirements would probably be win/mac/linux. Hmmmm, might make a simple parser using the C++ preparser as a front end and code generate that way. But I'm so fighting this. – Adrian May 10 '13 at 16:08
  • Maybe someone else knows a better way. I was optimistic about the user-defined literals :( – Useless May 10 '13 at 16:12
  • User-def literals may still work. But under a compiler that I don't have. :( – Adrian May 10 '13 at 17:43
  • UPDATE: User-def literals do work on gcc and clang, but only as an extension. See my answer. – Adrian Aug 22 '19 at 15:08
  • I'm going to have to spend some time digesting all that, but nice work! – Useless Aug 22 '19 at 21:05

I actually lost track of this Q and I don't know if I can find the original code I was working with back then, but I have figured out how to store a string without its terminating NUL character.

In C++17 I was able to fill a constexpr std::array<char, n> with a string of characters that doesn't contain the trailing zero.

#include <array>
#include <cstdio>

constexpr size_t str_len(char const * x)
{
    char const * begin = x;
    while (*x) {
        ++x;
    }
    return x - begin;
}

constexpr auto var = "hello there";

template <size_t I, size_t Max>
constexpr auto fn()
{
    // Although I did this recursively, this could have also been done iteratively.
    if constexpr (I < Max) {
        auto x = fn<I + 1, Max>();
        x[I] = var[I];
        return x;
    }
    else {
        return std::array<char, Max>{};
    }
}

int main()
{
    auto x = fn<0, str_len(var)>();
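    // "%*.*s" takes width and precision as ints, so cast the sizes explicitly.
    printf("'%*.*s'\n", static_cast<int>(x.size()), static_cast<int>(x.size()), x.data());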
    return 0;
}

This gives the following assembly:

.LC0:
  .string "'%*.*s'\n"
main:
  sub rsp, 24
  mov edx, 11
  mov esi, 11
  movabs rax, 7526676540175443304 ; <<< hello there
  mov QWORD PTR [rsp+5], rax
  mov eax, 29285
  lea rcx, [rsp+5]
  mov edi, OFFSET FLAT:.LC0
  mov WORD PTR [rsp+13], ax
  xor eax, eax
  mov BYTE PTR [rsp+15], 101
  call printf
  xor eax, eax
  add rsp, 24
  ret

Yes, 7526676540175443304 is the first eight bytes, "hello th", packed into one integer; 29285 ("er") and 101 ('e') supply the rest, and no terminating NUL character is stored.

Moving the first line of main() to global scope results in the string being emitted as a statically initialized global object (the x label below) instead of being built on the stack.

.LC0:
  .string "'%*.*s'\n"
main:
  sub rsp, 8
  mov ecx, OFFSET FLAT:x
  mov edx, 11
  xor eax, eax
  mov esi, 11
  mov edi, OFFSET FLAT:.LC0
  call printf
  xor eax, eax
  add rsp, 8
  ret
x:           ; <<< hello there
  .byte 104
  .byte 101
  .byte 108
  .byte 108
  .byte 111
  .byte 32
  .byte 116
  .byte 104
  .byte 101
  .byte 114
  .byte 101

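
As the comment in fn() notes, the recursion is not essential. Here is a minimal iterative sketch, assuming C++17 and taking the literal by array reference so that its length is deduced. strip_nul is a name made up here and is not part of the original code.

```cpp
#include <array>
#include <cstddef>

// Copies every character of the literal except the terminating NUL.
// N is the literal's size including the NUL, so the result has N - 1 elements.
template <std::size_t N>
constexpr std::array<char, N - 1> strip_nul(char const (&lit)[N])
{
    std::array<char, N - 1> out{};
    for (std::size_t i = 0; i + 1 < N; ++i) {
        out[i] = lit[i];
    }
    return out;
}

constexpr auto x = strip_nul("hello there");

static_assert(x.size() == 11);
static_assert(x[0] == 'h' && x[10] == 'e');
```

This variant keeps the string-literal syntax at the call site and needs neither str_len nor a separately declared constexpr variable.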

I can put it into a type as well:

template <char x, typename...Ts>
struct X
{
};

constexpr int str_len(char const * x)
{
    char const * begin = x;
    while (*x) {
        ++x;
    }
    return x - begin;
}

constexpr auto var = "hello there";

template <int I>
constexpr auto fn()
{
    if constexpr (I - 1 != 0)
        return X<var[str_len(var) - I], decltype(fn<I - 1>())>{};
    else
        return X<var[str_len(var) - I], void>{};
}

int main()
{
    decltype(nullptr)(fn<str_len(var)>());
    return 0;
}

Which gives me the output:

<source>:28:5: error: cannot convert 'X<'h', X<'e', X<'l', X<'l', X<'o', X<' ', X<'t', X<'h', X<'e', X<'r', X<'e', void> > > > > > > > > > >' to 'decltype(nullptr)' (aka 'nullptr_t') without a conversion operator
    decltype(nullptr)(fn<str_len(var)>());
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Now I can probably massage this more to get it into the form I asked for above. The requirement was to store the string without NUL termination but also to do this in C++0x, which this isn't, so I won't be marking this as an answer. But I thought I'd put it out there.

Edit

Seems that GCC and clang also have an extension that allows for putting the string into a template type:

template <char...Cs>
struct chars {};

template <typename T, T...Xs>
chars<Xs...> operator""_xxx() {
    return {};
}

int main()
{
    decltype(nullptr)("hello there"_xxx);
    return 0;
}

Which spits out:

<source>:5:14: warning: string literal operator templates are a GNU extension [-Wgnu-string-literal-operator-template]
chars<Xs...> operator""_xxx() {
             ^
<source>:11:5: error: cannot convert 'chars<'h', 'e', 'l', 'l', 'o', ' ', 't', 'h', 'e', 'r', 'e'>' to 'decltype(nullptr)' (aka 'nullptr_t') without a conversion operator
    decltype(nullptr)("hello there"_xxx);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Note that the only reason I can now think of to put a string into a template argument is to pass the string around as a constant expression. That opens up some interesting possibilities, such as letting a constexpr function change its return type based on the string it is given.

Additional note: It isn't possible to pass a string directly to a constexpr function and have it morph the return type because, as a parameter, it's no longer a constant expression, which is a bit annoying. The only way to manipulate a constexpr string and morph the return type is to declare it outside the function as constexpr and then reference that external constexpr variable from within the function, as shown in my second example.
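
The restriction can be seen in a few lines. This is a sketch assuming C++14 or later; tag, bad and good are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>

constexpr std::size_t str_len(char const* s)
{
    std::size_t n = 0;
    while (s[n]) {
        ++n;
    }
    return n;
}

// A type whose identity depends on a number, standing in for a "morphed" return type.
template <std::size_t N>
struct tag { static constexpr std::size_t value = N; };

constexpr char const* var = "hello there";

// Does not compile if uncommented: inside the function, the parameter s
// is not a constant expression, so it cannot feed a template argument.
// constexpr auto bad(char const* s) { return tag<str_len(s)>{}; }

// Works: var is a constexpr variable declared outside the function.
constexpr auto good() { return tag<str_len(var)>{}; }

static_assert(decltype(good())::value == 11, "length without the NUL");
```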

Edit 2

Turns out that although you can't directly pass something as a constexpr value, you can pass a lambda which will work as a constexpr function.

#include <array>
#include <cstdio>

constexpr size_t str_len(char const * x)
{
    char const * begin = x;
    while (*x) {
        ++x;
    }
    return x - begin;
}

template <size_t I = 0, typename FN>
constexpr auto fn2(FN str) {
    constexpr auto Max = str_len(str());
    if constexpr (I < Max) {
        auto x = fn2<I + 1>(str);
        x[I] = str()[I];
        return x;
    }
    else {
        return std::array<char, Max>{};
    }
}

auto x = fn2<>([]{ return "hello there"; });

int main()
{
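    // "%*.*s" takes width and precision as ints, so cast the sizes explicitly.
    printf("'%*.*s'\n", static_cast<int>(x.size()), static_cast<int>(x.size()), x.data());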
    return 0;
}

This results in the same asm output as my first example.

I'm frankly surprised that actually works.

Edit 3

Given that I have figured out how to pass a constexpr string, I can now create a non-recursive type:

#include <utility>

constexpr std::size_t str_len(char const * x)
{
    char const * begin = x;
    while (*x) {
        ++x;
    }
    return x - begin;
}

template <char...> struct c{};

template <typename FN, std::size_t...Is>
constexpr auto string_to_type_impl(FN str, std::index_sequence<Is...>)
{
    return c<str()[Is]...>{};
}

template <typename FN>
constexpr auto string_to_type(FN str)
{
    constexpr auto Max = str_len(str());
    return string_to_type_impl(str, std::make_index_sequence<Max>{});
}

int main()
{
    std::nullptr_t(string_to_type([]{ return "hello there"; }));
    return 0;
}

With the resulting output:

<source>:29:5: error: cannot convert 'c<'h', 'e', 'l', 'l', 'o', ' ', 't', 'h', 'e', 'r', 'e'>' to 'std::nullptr_t' (aka 'nullptr_t') without a conversion operator
    std::nullptr_t(string_to_type([]{ return "hello there"; }));
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.


Of course, for these to work with C++11, the constexpr functions would have to be converted to recursive ternary versions.
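
As an illustration of the recursive ternary style, here is a hedged sketch of str_len only, written so that it should compile as C++11. The extra parameter n is an addition for this sketch; the other functions would need similar treatment and are not shown.

```cpp
#include <cstddef>

// C++11 constexpr functions are limited to a single return statement,
// so the loop becomes a recursion expressed with the ternary operator.
constexpr std::size_t str_len(char const* s, std::size_t n = 0)
{
    return s[n] ? str_len(s, n + 1) : n;
}

static_assert(str_len("hello there") == 11, "length without the NUL");
```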

JDługosz
Adrian