36

Is it possible to implement strlen() in the C preprocessor?

Given:

#define MYSTRING "bob"

Is there some preprocessor macro, X, which would let me say:

#define MYSTRING_LEN X(MYSTRING)
Brian Tompsett - 汤莱恩
Joby Taffey
  • Why would you need to do this? Is it not Okay to have the macro expand to strlen("bob")? – Ken Wayne VanderLinde Feb 16 '11 at 21:06
  • What's wrong with `#define MYSTRING_LEN 3`? I can see that it would be useful though, as you would only have to change `MYSTRING` and not worry about `MYSTRING_LEN`... – James Feb 16 '11 at 21:07
  • 11
    I'm working on a tiny embedded system and am critically short on code space. The strlen() function isn't otherwise needed. I'm trying to avoid hardcoding length constants for all of the strings. – Joby Taffey Feb 16 '11 at 21:08
  • Possible duplicate of [Determine #defined string length at compile time](http://stackoverflow.com/questions/4003408). – jww Dec 26 '15 at 02:15
  • Is there another question that talks about how to do this for a constant array of pointers to constant strings like `static const char* strings[] = { "foo", "hello" }`? The handy macro below seems to break-down in this case, e.g. `enum string_names { S_FOO = 0, S_HELLO }; static const size_t const string_lengths[] = { STRLEN(strings[S_FOO]), STRLEN(strings[S_HELLO]) }` Here all `string_lengths` members are 8 (size of the pointer) instead of the length of the string. Maybe I need to just give up and use `strlen` at runtime :/ – jacobq Jun 25 '19 at 12:36
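For the table-of-strings case raised in the last comment, one possible workaround is an X-macro list, so that `sizeof` is applied to each literal rather than to a pointer. This is only a sketch: `STRING_TABLE`, `AS_ENUM`, `AS_STRING`, `AS_LENGTH` and `S_COUNT` are hypothetical names, not something from the question or the answers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical X-macro list: one entry per string. */
#define STRING_TABLE(X) \
    X(S_FOO, "foo") \
    X(S_HELLO, "hello")

#define AS_ENUM(name, text)   name,
#define AS_STRING(name, text) text,
/* sizeof sees the literal itself here, so it yields the array size. */
#define AS_LENGTH(name, text) sizeof(text) - 1,

enum string_names { STRING_TABLE(AS_ENUM) S_COUNT };
static const char *const strings[] = { STRING_TABLE(AS_STRING) };
static const size_t string_lengths[] = { STRING_TABLE(AS_LENGTH) };
```

Each length is computed from the literal before it decays to a pointer, which is why the entries are no longer all equal to the pointer size.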

4 Answers

47

It doesn't use the preprocessor, but sizeof is resolved at compile time. If your string is in an array, you can use that to determine its length at compile time:

static const char string[] = "bob";
#define STRLEN(s) (sizeof(s)/sizeof(s[0]))

Keep in mind that STRLEN above counts the null terminator, unlike strlen().
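A minimal sketch of that difference, assuming the string really is an array. `STRLEN_NO_NUL`, `bob`, `multi` and `strlen_macro_demo` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Element count of an array, terminator included. */
#define STRLEN(s) (sizeof(s) / sizeof((s)[0]))
/* Hypothetical variant that matches strlen() for a plain string array. */
#define STRLEN_NO_NUL(s) (STRLEN(s) - 1)

static const char bob[] = "bob";         /* 4 bytes: 'b' 'o' 'b' '\0' */
static const char multi[] = "bob\0and";  /* 8 bytes, embedded NUL included */

/* Returns 1 when the macros behave as the comments above describe.
   Note: applied to a `const char *`, STRLEN would report the pointer
   size instead, so it must only be used on true arrays. */
static int strlen_macro_demo(void)
{
    return STRLEN(bob) == 4
        && STRLEN_NO_NUL(bob) == strlen(bob)
        && STRLEN(multi) == 8
        && strlen(multi) == 3;
}
```

The `multi` case shows the array size and the C string length diverging as soon as the data contains an embedded NUL.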

nmichaels
  • 7
    Right, so to make it behave as `strlen`, could you not just do `#define STRLEN(s) ( (sizeof(s)/sizeof(s[0])) - sizeof(s[0]) )`? – James Feb 16 '11 at 21:11
  • @James: I think you meant `- 1`. In that case, mostly. – nmichaels Feb 16 '11 at 21:15
  • 1
    @nmichaels: Yes, yes I did. I actually typed `-1` and then went back and (in)corrected it. – James Feb 16 '11 at 21:16
  • I'd be using it with `memcpy()`, so it makes sense to keep it as `strlen()+1`. Also, for the -1 case, using it on an empty string would be bad. – Joby Taffey Feb 16 '11 at 21:17
  • 9
    One issue is that you are really asking for the size of the array, not the C style string. For example `static const char string[] = "bob\0and\0mary"` would report a length greater than `"bob"`. – Edwin Buck Feb 16 '11 at 21:20
  • @Edwin Yeah, I noticed that. Still, that's not such a terrible bug. It also may cause you to copy uninitialized memory in the case where a fixed size array is allocated but has a small string in it. Gotchas, but not deal breakers if your goal is to avoid the `#include `. – nmichaels Feb 16 '11 at 21:24
  • @nmichaels, If you are attempting to avoid the `#include ` then a much better approach is to write your own strnlen in C and then `#include "myString.h"` When programming in C, you gain the ability to count in ways that don't require recursive stack based descent. This solution looks good, but it's a bug waiting to happen for all but the most simple of cases. Basically, you'd be better off with `#define BOB "bob"` and `#define BOB_LEN 3` – Edwin Buck Feb 16 '11 at 21:41
  • 2
    @Edwin: I think `#define BOB_LEN 3` is more likely to end in tears than what I described, but your point is valid. Writing code in C frequently means tempting the bug fates. The decision really comes down to what trade-offs you're willing to make. – nmichaels Feb 16 '11 at 21:46
  • You don't need to put the string into an explicitly named array - `sizeof "bob"` will work fine. – caf Feb 16 '11 at 22:02
  • @caf: Good point. I was just trying to distinguish from `char *string = "bob";`. – nmichaels Feb 16 '11 at 22:03
  • 2
    A common definition in utility header files is `#define NELS(array) (sizeof(array)/sizeof(array[0]))` for "number of elements". This works for any array and is much less misleading than STRLEN. – Jim Balter Feb 17 '11 at 00:27
  • 1
    @nmichaels: There is no such thing as a partially-initialized array in C. If there is any initializer at all, the remainder is zero-filled. This is a fundamental part of the language. – R.. GitHub STOP HELPING ICE Feb 17 '11 at 01:04
10

You can do:

#define MYSTRING sizeof("bob")

That evaluates to 4, because of the terminating null character at the end of the literal.

Of course this only works for a string constant.


Using MSVC 16 (cl.exe -Wall /TC file.c) this:

#include <stdio.h>
#define LEN_CONST(x) sizeof(x)

int main(void)
{
    /* sizeof yields a size_t; cast it to match the %d conversion */
    printf("Size: %d\n", (int)LEN_CONST("Hej mannen"));

    return 0;
}

outputs:

Size: 11

The size of the string plus the NUL character.
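Because `sizeof` applied to a literal is an integer constant expression, it can be used where a run-time call could not. The sketch below assumes the question's `MYSTRING`; `buffer`, `MYSTRING_SIZE` and `copy_mystring` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MYSTRING "bob"
/* Length without the terminator; only valid for literals and arrays. */
#define MYSTRING_LEN (sizeof(MYSTRING) - 1)

/* sizeof is a constant expression, so it can size a file-scope array
   or define an enum constant; a call to strlen() cannot. */
static char buffer[sizeof(MYSTRING)];
enum { MYSTRING_SIZE = sizeof(MYSTRING) };

/* Copies MYSTRING, terminator included, and returns the destination. */
static const char *copy_mystring(void)
{
    memcpy(buffer, MYSTRING, sizeof(MYSTRING));
    return buffer;
}
```

Keeping the terminator in the count is convenient for `memcpy()`, since the copy then includes the NUL without a separate `+ 1`.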

Skurmedel
  • 3
    This isn't an MS extension, it's standard C. A string literal (when not used as an initialiser for an array) is an array of `char`, and `sizeof` works on arrays as with any other type. – caf Feb 16 '11 at 22:07
  • @caf: Good to know, I figured but wasn't sure. – Skurmedel Feb 16 '11 at 22:09
5

Yes: #define MYSTRING_LEN(s) strlen(s)

In most compilers, this will produce a compile-time constant for a constant argument ... and you can't do better than that.

In other words: you don't need a macro, just use strlen; the compiler is smart enough to do the work for you.
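A small sketch of what this approach does and does not give you. Whether the call is folded to a constant depends on the compiler and optimization level, so treat that part as an assumption; `length_via_pointer` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MYSTRING "bob"
#define MYSTRING_LEN(s) strlen(s)

/* Works for any NUL-terminated string, including one reached through
   a pointer, because the length comes from the contents, not the type.
   The result is not an integer constant expression, though, so it
   cannot size a file-scope array or initialize an enum constant. */
static size_t length_via_pointer(const char *p)
{
    return MYSTRING_LEN(p);
}
```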

Jim Balter
  • 8
    "Most compilers"? Gcc, yes, MSVC 10, no. – Joseph Quinsey Feb 17 '11 at 06:16
  • 2
    Also the `int f() {char XYZ[strlen(s)]; ...}` is not a valid C89 program while `int f() {char XYZ[sizeof(s)]; ...}` would be. – Maciej Piechotka Oct 02 '13 at 21:50
  • 1
    @MaciejPiechotka The question was about strlen, not sizeof (which has a different value). – Jim Balter Oct 03 '13 at 19:46
  • 2
    @JimBalter: "There are two hard problems in computer science: cache validation, variable substitution, and off-by-one errors." - yes, I should add `-1` or `+1` to one of them, but my point still stands - even though the compiler is likely to optimize it out, that does not mean it is applicable in all situations. – Maciej Piechotka Oct 03 '13 at 22:35
  • @MaciejPiechotka Your point is not in dispute but is irrelevant here for the reason noted. And when I said that sizeof has a different value I didn't just mean +1 ... sizeof(p) is not strlen(p)+1 when p is a pointer. – Jim Balter Oct 04 '13 at 07:31
  • 3
    The `sizeof` operator can be used in a **constant expression** (variable length arrays excepted). `strlen` **can't be**. It can be assumed that the use of the preprocessor implies a need for a constant expression, otherwise it wouldn't be done using the preprocessor. So yes, you can do better than `strlen` and use `sizeof("foo")` that yields a constant expression. It can be used in more contexts than an expression the compiler has optimized to a compile-time constant. Such optimizations don't change the meaning of the program, and don't turn the expression into a constant expression. – Kuba hasn't forgotten Monica Sep 21 '17 at 23:35
  • ^ someone who reads neither the question nor the previous comments, and makes completely unsound claims: "It can be assumed that the use of the preprocessor implies a need for a constant expression" is not even remotely true. – Jim Balter Sep 22 '17 at 03:44
1

Generally the C pre-processor doesn't actually transform any data, it only replaces it. This means that you might be able to perform such an operation provided that you pollute your C pre-processor namespace with data implementing (functional) persistent data structures.

That said, you really don't want to do this as the entire "added" functionality will fail spectacularly once you pass in something other than a string. The C pre-processor has no concept of data type, nor does it have the concept of memory de-referencing (useful if you wanted the length of a string stored in a variable). Basically, it would be a fun "see how far you could take it" exercise, but in the end, you would have a MYSTRING_LEN which would only take you a short distance to the goal.

In addition, the C pre-processor's lack of name spaces means that such a macro expansion system would not be containable. One would have to take care to keep the generated names from interfering with other useful macros. In the end, you would probably run out of memory in the pre-processor for any significant use, as the pre-processor isn't really built to hold a name for each character being converted into the "unit" token, and a name for each "unit" token being compressed into its final decimal notation.
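To illustrate the point that the preprocessor only substitutes tokens, here is a sketch using the stringizing operator. `STRINGIZE`, `STRINGIZE_` and `mystring_tokens` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MYSTRING "bob"
#define STRINGIZE_(x) #x
#define STRINGIZE(x) STRINGIZE_(x)

/* The preprocessor sees MYSTRING as one opaque string-literal token.
   Stringizing it yields the token's spelling, quotes included; it has
   no way to look inside the literal and count characters. */
static const char mystring_tokens[] = STRINGIZE(MYSTRING); /* "\"bob\"" */
```

The array above holds five characters plus the terminator, because the quote marks are part of the token the preprocessor handled.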

nmichaels
Edwin Buck
  • Re: "the C pre-processor doesn't actually transform any data, it only replaces it": The C preprocessor is able to evaluate constant expressions in conditional inclusion. Example: in `#if 1+2` the `1+2` is evaluated according to the rules of constant expressions. – pmor Apr 06 '22 at 15:30
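A short sketch of that distinction: `#if` does evaluate integer arithmetic, but `sizeof` is left to the compiler proper. `PP_CAN_ADD` and `MYSTRING_SIZE` are hypothetical names.

```c
#include <assert.h>

#define MYSTRING "bob"

/* The preprocessor evaluates integer constant expressions in #if. */
#if 1 + 2 == 3
#define PP_CAN_ADD 1
#else
#define PP_CAN_ADD 0
#endif

/* It does not evaluate sizeof, so a directive such as
   "#if sizeof(MYSTRING) == 4" would not compile.
   The compiler proper has to do that part. */
enum { MYSTRING_SIZE = sizeof(MYSTRING) };
```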