23

In C, is there a good way to define length-prefixed, Pascal-style strings as constants so they can be placed in ROM? (I'm working on a small embedded system with a non-GCC ANSI C compiler.)

A C-string is NUL-terminated, e.g. {'f','o','o',0}.

A Pascal-string has the length in the first byte, e.g. {3,'f','o','o'}.

I can declare a C-string to be placed in ROM with:

const char *s = "foo";

For a Pascal-string, I could manually specify the length:

const char s[] = {3, 'f', 'o', 'o'};

But this is awkward, because the length has to be kept in sync with the text by hand. Is there a better way? Perhaps in the preprocessor?

Joby Taffey

10 Answers

21

I think the following is a good solution, but don't forget to enable packed structs:

#include <stdio.h>

#define DEFINE_PSTRING(var,str) const struct {unsigned char len; char content[sizeof(str)];} var = {sizeof(str)-1, str}

DEFINE_PSTRING(x, "foo");
/*  Expands to following:
    const struct {unsigned char len; char content[sizeof("foo")];} x = {sizeof("foo")-1, "foo"};
*/

int main(void)
{
    printf("%d %s\n", x.len, x.content);
    return 0;
}

One catch: it adds an extra NUL byte after your string, but that can be desirable because you can then use it as a normal C string too. You also need to cast it to whatever type your external library expects.
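As a rough sketch of that cast, assuming the struct has no padding between the length byte and the text (which is what packing is meant to guarantee): the macro `DEFINE_PSTRING_CHECKED`, the helper macro `PSTRING_BYTES`, and the function `lib_read_pstring` are all hypothetical names invented for this illustration, the last one standing in for whatever the external library actually provides. The typedef trick is one possible way to reject literals longer than 255 bytes at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as above, plus a compile-time guard: the typedef gets a
   negative array size (a compile error) if the literal exceeds 255 bytes. */
#define DEFINE_PSTRING_CHECKED(var, str) \
    typedef char var##_fits_in_one_byte[(sizeof(str) - 1 <= 255) ? 1 : -1]; \
    const struct { unsigned char len; char content[sizeof(str)]; } var = \
        { sizeof(str) - 1, str }

/* View the struct as the raw byte sequence a Pascal-string API expects. */
#define PSTRING_BYTES(var) ((const unsigned char *)&(var))

DEFINE_PSTRING_CHECKED(greeting, "hello");

/* Hypothetical stand-in for an external library routine: copies the
   payload of a Pascal string into out and returns its length. */
size_t lib_read_pstring(const unsigned char *p, char *out)
{
    size_t n;

    assert(p != NULL && out != NULL);
    n = p[0];                 /* first byte is the length */
    memcpy(out, p + 1, n);    /* payload follows immediately */
    return n;
}
```

A call would then look like `lib_read_pstring(PSTRING_BYTES(greeting), buf)`, with the constant itself staying in read-only storage.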

JM0
cyco130
  • 1
    Isn't the point of using pascal strings over C strings to be able to use null bytes anywhere in the string? Reading it as a C-string kinda defeats the purpose? – Filip Haglund Feb 06 '14 at 10:20
  • 4
    @FilipHaglund Yes, it's one reason to use pascal strings. But another reason is to tell a string's length without having to scan the whole. Also, many C-based APIs (e.g. Windows, POSIX...) only accept NUL-terminated strings anyway. That's why Delphi, FreePascal and similar place a null byte to the end of their pascal strings. And the underlying (C-based) system disregards the rest of the string if it has embedded NULs, a potential pitfall. Java tries to circumvent this by encoding NULs with a 2-byte sequence in UTF-8 to fool 8-bit C string APIs. But proper UTF-8 decoders will reject it. – cyco130 Mar 15 '14 at 08:09
20

Clang and Apple's GCC builds (and possibly others) accept the -fpascal-strings option, which lets you declare Pascal-style string literals by starting the string with \p, e.g. "\pfoo". Not exactly portable, but certainly nicer than funky macros or constructing the strings at runtime.

See here for more info.

hairlessbear
sjrct
  • While that Stanford website does indeed list that option, this is likely from a fork. Their website's page for the C Extensions section over at https://web.archive.org/web/20151124183123/http://fizz.phys.dal.ca/~jordan/gcc-4.0.1/gcc/C-Extensions.html has that Pascal Strings extension listed in there, but the corresponding page (https://gcc.gnu.org/onlinedocs/gcc-4.0.1/gcc/C-Extensions.html) from the official GCC docs for 4.0.1 doesn't have it. Also notable is the complete absence of this extension anywhere in official GCC source code for any version at all. – Gabriel Ravier Mar 15 '22 at 04:07
6

You can still use a const char * literal and an escape sequence as its first character that indicates the length:

const char *pascal_string = "\x03foo";

It will still be null-terminated, but that probably doesn't matter.

Blagovest Buyukliev
  • 2
    Yes, this will work. But, I still need to manually keep the \x03 in sync with the number of bytes, which is prone to error – Joby Taffey Oct 04 '11 at 14:02
  • @Joby: then you either have to resort to some very nasty preprocessor tricks to stringize the length and paste it with the next token, or incur runtime overhead as brandizzi pointed out. – Blagovest Buyukliev Oct 04 '11 at 14:48
  • 1
    @Joby Taffey: You could write a script to generate a `.c` file containing this style of string from a file of source strings, to be run during the build process. – caf Oct 05 '11 at 04:09
  • 1
    This is a bad example, since `"\x03foo"` is actually interpreted as `"?oo"` since the hex escape doesn't terminate at two digits. Better to use `"\003foo"`. – Parakleta Jun 14 '17 at 04:55
  • 1
    Actually, you could also use the automatic string literal concatenation behaviour to use `"\x03" "foo"` to prevent consumption of additional characters, if a hexadecimal literal is preferred to octal. – Parakleta Jun 14 '17 at 07:16
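The two fixes suggested in the comments above can be written out as follows. This is only an illustrative sketch: the array names and the checker `pstr_len_matches` are made up for this example, and the '?' result assumes an ASCII execution character set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Octal escapes stop after at most three digits, so 'f' is not swallowed. */
const char pstr_octal[] = "\003foo";

/* Adjacent literals are concatenated after escapes are processed,
   so the hex escape ends at the closing quote. */
const char pstr_hex[] = "\x03" "foo";

/* The pitfall: 'f' is a hex digit, so this is the single escape \x3f
   ('?' in ASCII) followed by "oo". */
const char pstr_wrong[] = "\x03foo";

/* Returns 1 if the length byte matches the number of bytes before the NUL. */
int pstr_len_matches(const char *p)
{
    assert(p != NULL);
    return (size_t)(unsigned char)p[0] == strlen(p + 1);
}
```

The length still has to be kept in sync by hand, so this only removes the escape-sequence trap, not the maintenance problem.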
5

It may sound a little extreme, but if you have many strings of this kind that need frequent updating, you could write your own small tool (a Perl script, maybe?) that runs on the host system, parses an input file in a custom format that you design to your own taste, and outputs a .c file. You can integrate it into your makefile or whatever and live happily ever after :)

I'm talking about a program that will convert this input (or another syntax that you prefer):

s = "foo";
x = "My string";

To this output, which is a .c file:

const char s[] = {3, 'f', 'o', 'o'};
const char x[] = {9, 'M', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g'};
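The core of such a tool could be a single formatting routine. The sketch below is written in C for the host rather than Perl, and the function name `format_pstring_decl` and its signature are assumptions made for this example. It produces one line of output in the style shown above and falls back to numeric codes for quotes, backslashes, and non-printable bytes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical host-side helper: formats one array definition for `text`
   into out. Returns 0 on success, -1 if the text is longer than 255 bytes
   or the buffer is too small. */
int format_pstring_decl(char *out, size_t cap, const char *name, const char *text)
{
    size_t i, len, used;
    int n;

    assert(out != NULL && name != NULL && text != NULL);
    len = strlen(text);
    if (len > 255) {
        return -1;                      /* does not fit in a one-byte length */
    }

    n = snprintf(out, cap, "const char %s[] = {%u", name, (unsigned)len);
    if (n < 0 || (size_t)n >= cap) {
        return -1;
    }
    used = (size_t)n;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)text[i];
        if (isprint(c) && c != '\'' && c != '\\') {
            n = snprintf(out + used, cap - used, ", '%c'", c);
        } else {
            n = snprintf(out + used, cap - used, ", %u", (unsigned)c);
        }
        if (n < 0 || (size_t)n >= cap - used) {
            return -1;
        }
        used += (size_t)n;
    }

    n = snprintf(out + used, cap - used, "};");
    if (n < 0 || (size_t)n >= cap - used) {
        return -1;
    }
    return 0;
}
```

Parsing the input file and writing the result to disk are left out; a real tool would wrap this in a loop over the source strings.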
cyco130
4

My approach would be to create functions for dealing with Pascal strings:

void cstr2pstr(const char *cstr, char *pstr) {
    int i;
    for (i = 0; cstr[i]; i++) {
        pstr[i+1] = cstr[i];
    }
    pstr[0] = i;
}

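void pstr2cstr(const char *pstr, char *cstr) {
    /* Read the length as unsigned so values above 127 survive a signed char. */
    int i, len = (unsigned char)pstr[0];
    for (i = 0; i < len; i++) {
        cstr[i] = pstr[i+1];
    }
    cstr[i] = 0;
}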

Then I could use it this way:

#include <stdio.h>

int main(int argc, char *argv[]) {
    char cstr[] = "ABCD", pstr[5], back[5];
    cstr2pstr(cstr, pstr);
    pstr2cstr(pstr, back);
    printf("%s\n", back);
    return 0;
}

This seems simple, straightforward, less error-prone and not especially awkward. It may not be the solution to your problem, but I would recommend that you at least consider it.

brandizzi
  • 1
    For the general case this is definitely the best solution. But, I'm working on a microcontroller where RAM is precious. So, I'd like to pass my Pascal-strings direct from ROM to functions. – Joby Taffey Oct 04 '11 at 14:16
  • 1
    @JobyTaffey I understand your point. I would recommend you to not be too concerned about memory usage but it is because I am a lame programmer that does not do embedding stuff, so ignore me :) Now, a serious question: is your use of Pascal strings imposed by some external library or are you implementing it by yourself for efficiency reasons? The second case can give us some more freedom to think about solutions. – brandizzi Oct 04 '11 at 14:44
  • 1
    It's imposed by an external library – Joby Taffey Oct 04 '11 at 14:46
3

You can apply sizeof to string literals as well. This allows a slightly less awkward form:

const char s[] = {sizeof "foo" - 1u, 'f', 'o', 'o'};

Note that the sizeof a string literal includes the terminating NUL character, which is why you have to subtract 1. But still, it's a lot of typing and obfuscated :-)

Jens
2

This is why flexible array members were introduced in C99 (and to avoid the use of the "struct hack"); IIRC, Pascal-strings were limited to a maximal length of 255.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>  // For CHAR_BIT

struct pstring {
    unsigned char len;
    char dat[];
};

struct pstring* pstring_new(const char* src, size_t len)
{
    /* declared up front so that sizeof this->len is valid below */
    struct pstring* this;
    /* largest value that fits in the ->len field */
    const size_t max_len = (1u << (CHAR_BIT * sizeof this->len)) - 1;

    if (!len) {
        len = strlen(src);
    }

    /* if the size does not fit in the ->len field: just truncate ... */
    if (len > max_len) {
        len = max_len;
    }

    this = malloc(sizeof *this + len);
    if (!this) {
        return NULL;
    }

    this->len = (unsigned char)len;
    memcpy(this->dat, src, len);
    return this;
}

int main(void)
{
    struct pstring* pp = pstring_new("Hello, world!", 0);
    if (!pp) {
        return EXIT_FAILURE;
    }

    /* '*' width and precision arguments must be int */
    printf("%p:[%u], %*.*s\n", (void*)pp,
           (unsigned int)pp->len,
           (int)pp->len,
           (int)pp->len,
           pp->dat);

    free(pp);
    return 0;
}
Jorengarenar
joop
2

One option might be to abuse the preprocessor. By declaring a struct of the right size and populating it on initialization, it can be const.

#include <stdio.h>

/* unsigned char keeps lengths 128..255 from turning negative */
#define DECLARE_PSTR(id,X) \
    struct pstr_##id { unsigned char len; char data[sizeof(X)]; }; \
    static const struct pstr_##id id = {sizeof(X)-1, X}

#define GET_PSTR(id) ((const unsigned char *)&(id))

#pragma pack(push)
#pragma pack(1)
DECLARE_PSTR(bob, "foo");
#pragma pack(pop)

int main(int argc, char *argv[])
{
    const unsigned char *s = GET_PSTR(bob);
    int len;

    len = *s++;               /* first byte is the length */
    printf("len=%d\n", len);
    while(len--)
        putchar(*s++);
    return 0;
}
Joby Taffey
1

Here's my answer, complete with an append operation that uses alloca() for automatic storage.

#include <stdio.h>
#include <string.h>
#include <alloca.h>

struct pstr {
  unsigned length;
  char *cstr;
};

#define PSTR(x) ((struct pstr){sizeof x - 1, x})

struct pstr pstr_append (struct pstr out,
             const struct pstr a,
             const struct pstr b)
{
  memcpy(out.cstr, a.cstr, a.length); 
  memcpy(out.cstr + a.length, b.cstr, b.length + 1); 
  out.length = a.length + b.length;
  return out;
}

#define PSTR_APPEND(a,b) \
  pstr_append((struct pstr){0, alloca(a.length + b.length + 1)}, a, b)

int main()
{
  struct pstr a = PSTR("Hello, Pascal!");
  struct pstr b = PSTR("I didn't C you there.");

  struct pstr result = PSTR_APPEND(PSTR_APPEND(a, PSTR(" ")), b);

  printf("\"%s\" is %u chars long.\n", result.cstr, result.length);
  return 0;
} 

You could accomplish the same thing using C strings and strlen. Because both alloca and strlen favor short strings, I think that would make more sense.

Samuel Danielson
1

You can define an array in the way you like, but note that this syntax is not adequate:

const char *s = {3, 'f', 'o', 'o'};

You need an array instead of a pointer:

const char s[] = {3, 'f', 'o', 'o'};

Note that a char will only store numbers up to 255 (provided it is unsigned; declare the array as unsigned char to be sure), and this will be your maximum string length.

Don't expect this to work where other strings would, however. A C string is expected to terminate with a null character not only by the compiler, but by everything else.
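To illustrate both points, here is a small sketch of helpers that treat the first byte as the length instead of looking for a terminator. The names `pstr_length`, `pstr_equal`, `ps_foo`, and `ps_embedded` are invented for this example and do not come from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* unsigned char keeps the length byte in the range 0..255. */
const unsigned char ps_foo[]      = {3, 'f', 'o', 'o'};
const unsigned char ps_embedded[] = {3, 'a', 0, 'b'};   /* NUL inside the text */

/* Length comes from the first byte, not from scanning for a terminator. */
size_t pstr_length(const unsigned char *p)
{
    assert(p != NULL);
    return p[0];
}

/* Compare two Pascal strings; memcmp does not stop at embedded NUL bytes. */
int pstr_equal(const unsigned char *a, const unsigned char *b)
{
    assert(a != NULL && b != NULL);
    return a[0] == b[0] && memcmp(a + 1, b + 1, a[0]) == 0;
}
```

Passing `ps_embedded + 1` to `strlen` would report 1 rather than 3, which is the kind of mismatch the warning above is about. Passing `ps_foo + 1` to it would read past the end of the array, since there is no terminator at all.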

sidyll
  • Thanks, updated the declaration in the question. I am aware that these arrays aren't valid C strings and won't work with std library C functions – Joby Taffey Oct 04 '11 at 14:02