Can a compilation error be forced if a string argument is not a string literal?

Question

Let's say I have these two overloads:

void Log(const wchar_t* message)
{
    // Do something
}

void Log(const std::wstring& message)
{
    // Do something
}

Can I then add some compile-time verification to the first function that the passed argument is a string literal?

EDIT: A clarification on why this would be good in my case: my current high-frequency logging uses only string literals and can hence be optimized a lot when there is a guarantee of no heap allocation. The second overload doesn't exist today; I might want to add it, but I would then keep the first one for extreme scenarios. :)

Given that you can always pass `wstr.c_str()` to invoke `Log(const wchar_t*)`, I *think* I understand. You're saying you *only* want L"data" allowed to go through `Log(const wchar_t*)` and *nothing* else? I'm not sure that's possible. Interesting question. — WhozCraig, Sep 01 '13 at 22:34
Unless you were very familiar with your target architecture and compiler, and could add a static assert that the pointer exists in the data section of the generated assembler, I don't think you can do this. Fundamentally there is no difference between a string literal and a variable of the right type, so there's nothing for the compiler to use to enforce that restriction. Out of curiosity, what benefit does that behavior have? — Chris Hayes, Sep 01 '13 at 22:38
What's the motivation behind such an assert? I mean, what kind of issues are you trying to catch? That might help in finding a solution. — Janick Bernet, Sep 01 '13 at 22:42
I don't think there's a situation that benefits from constraining the call to string literals. Still, it would be interesting if someone finds a standard way to achieve this — a.lasram, Sep 01 '13 at 22:51
GCC and Clang (and probably other compilers too) warn about `printf` format strings not being string literals; you could take advantage of that behaviour and extend it possibly. — Carl Norum, Sep 02 '13 at 00:04
@CarlNorum: gcc and clang do this only if you specify `-Wformat-nonliteral`. — Keith Thompson, Sep 02 '13 at 01:04
I didn't have to specify any flags to make it work (see example below). Must be on by default. — Carl Norum, Sep 02 '13 at 01:21
@a.lasram: For logging, to make sure no dynamic memory is used for strings, so that there are no concurrency, lifetime or perf issues related to it. — Johann Gerell, Jan 28 '15 at 11:14

score 14 · Accepted Answer · edited Jan 18 '21 at 12:34

So this grew out of Keith Thompson's answer... As far as I know, you can't restrict an ordinary function to accept only string literals, but you can do it with a function-like macro (through a trick).

#include <iostream>
#define LOG(arg) Log(L"" arg)

void Log(const wchar_t *message) {
    std::wcout << "Log: " << message << "\n";
}

int main() {
    const wchar_t *s = L"Not this message";
    LOG(L"hello world");  // works
    LOG(s);               // terrible looking compiler error
}

Basically, a compiler will convert "abc" "def" to look exactly like "abcdef". And likewise, it will convert "" "abc" to "abc". You can use this to your benefit in this case.
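To make the concatenation rule concrete, here is a minimal sketch. The names `WIDE_LITERAL_ONLY`, `Concatenated` and `Guarded` are made up for illustration and are not part of the answer above.

```cpp
#include <string>

// Hypothetical helper macro: puts an empty wide literal in front of the
// argument. This only compiles when the argument is itself a string literal,
// because only adjacent string literals are concatenated.
#define WIDE_LITERAL_ONLY(arg) (L"" arg)

// Adjacent literals are merged into one literal during translation.
inline std::wstring Concatenated() { return L"abc" L"def"; }

// An empty literal followed by a literal yields the second literal unchanged.
inline std::wstring Guarded() { return WIDE_LITERAL_ONLY(L"hello"); }
```

With a variable `s`, `WIDE_LITERAL_ONLY(s)` would expand to `(L"" s)`, which is a syntax error, so the call is rejected at compile time.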

I also saw this comment on the C++ Lounge, and that gave me another idea of how to do this, which gives a cleaner error message:

#define LOG(arg) do { static_assert(true, arg); Log(arg); } while (false)

Here, we use the fact that static_assert requires a string literal as its second argument. The error that we get if we pass a variable instead is quite nice as well:

foo.cc:12:9: error: expected string literal
    LOG(s);
        ^
foo.cc:3:43: note: expanded from macro 'LOG'
#define LOG(arg) do { static_assert(true, arg); Log(arg); } while (false)

@Bill - is there a similar mechanism for C++ 03? `static_assert` appears to be a C++ 11 feature. — jww, Apr 13 '15 at 00:11

score 8 · Answer 2 · answered Sep 01 '13 at 22:52

I believe the answer to your question is no -- but here's a way to do something similar.

Define a macro, and use the # "stringification" operator to guarantee that only a string literal will be passed to the function (unless somebody bypasses the macro and calls the function directly). For example:

#include <iostream>

#define LOG(arg) Log(#arg)

void Log(const char *message) {
    std::cout << "Log: " << message << "\n";
}

int main() {
    const char *s = "Not this message";
    LOG("hello world");
    LOG(hello world);
    LOG(s);
}

The output is:

Log: "hello world"
Log: hello world
Log: s

The attempt to pass s to LOG() did not trigger a compile-time diagnostic, but it didn't pass that pointer to the Log function.

There are at least two disadvantages to this approach.

One is that it's easily bypassed; you may be able to avoid that by searching the source code for references to the actual function name.

The other is that stringifying a string literal doesn't just give you the same string literal; the stringified version of "hello, world" is "\"hello, world\"". I suppose your Log function could strip out any " characters in the passed string. You may also want to handle backslash escapes; for example, "\n" (a 1-character string containing a newline) is stringified as "\\n" (a 2-character string containing a backslash and the letter n).
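The quoting behaviour described above can be checked with a small sketch. `STRINGIFY` and the three helper functions are hypothetical names used only for this illustration.

```cpp
#include <string>

// Hypothetical macro that applies the # operator to its argument.
#define STRINGIFY(arg) #arg

// Stringifying a string literal keeps the surrounding quote characters.
inline std::string StringifiedLiteral() { return STRINGIFY("hello, world"); }

// The escape sequence is not interpreted: the result contains a quote,
// a backslash, the letter n, and a closing quote.
inline std::string StringifiedNewline() { return STRINGIFY("\n"); }

// An identifier is turned into its own spelling, not into its value.
inline std::string StringifiedIdentifier() { return STRINGIFY(s); }
```

Note that `s` does not even need to be declared in the last function, since the preprocessor only ever sees its spelling.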

But I think a better approach is not to rely on the compiler to diagnose calls with arguments other than string literals. Just use some other tool to scan the source code for calls to your Log function and report any calls where the first argument isn't a string literal. If you can enforce a particular layout for the calls (for example, the tokens Log, (, and a string literal on the same line), that shouldn't be too difficult.

You could also do: `#define LOG(arg) Log("" arg)`, which would force `arg` to be a quoted string and just fail otherwise (with a terrible error message). — Bill Lynch, Sep 01 '13 at 23:16
@sharth I'm not sure you realize how wonderfully intuitive that comment is. That is an *outstanding* idea. Casting only a raindrop on that parade, `L""` should be the constant string (the OP wants `wchar_t*` restriction). Great concept. — WhozCraig, Sep 02 '13 at 00:25
@KeithThompson: I just did. I do want to say that this only came to my mind because of your answer. Thanks! — Bill Lynch, Sep 02 '13 at 03:52
*"But I think a better approach is not to rely on the compiler to diagnose calls with arguments other than string literals."* - I don't think that's always practical. For example, if you rely on the user to RTFM, then there will probably be a user who does not do it and violates the assumption/precondition. (In my case, a value is passing from a makefile into the sources via a preprocessor macro). — jww, Apr 13 '15 at 00:18

score 6 · Answer 3 · answered Sep 01 '13 at 22:48

You can't detect string literals directly, but you can detect whether the argument is an array of characters, which is pretty close. However, you can't do it from inside the function; you need to do it from the outside:

template <std::size_t Size>
void Log(wchar_t const (&message)[Size]) {
    // the message is probably a string literal
    Log(static_cast<wchar_t const*>(message));
}

The above function will take care of wide string literals and arrays of wide characters:

Log(L"literal as demanded");
wchar_t non_literal[] = { L"this is not a literal" };
Log(non_literal); // will still call the array version

Note that the information about the string being a literal isn't as useful as one might hope for. I frequently think that the information could be used to avoid computing the string length but, unfortunately, string literals can still embed null characters which messes up static deduction of the string length.
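The embedded-null problem can be sketched as follows. `DeducedLength` and `RuntimeLength` are hypothetical helpers, and narrow characters are used only to keep the example short.

```cpp
#include <cstddef>
#include <cstring>

// Length taken from the array type: every character in the literal counts,
// minus the terminating null that the compiler appends.
template <std::size_t Size>
constexpr std::size_t DeducedLength(const char (&)[Size]) {
    return Size - 1;
}

// Length as seen by C string functions: stops at the first null character.
inline std::size_t RuntimeLength(const char* text) {
    return std::strlen(text);
}
```

For a literal such as `"hello\0junk"` the two disagree: the deduced length covers the junk after the embedded null, while `strlen` stops before it.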

answered Sep 01 '13 at 22:48

Dietmar Kühl

Why would embedding null mess up static detection of the length? The compiler's foreknowledge of array size is always exactly accurate. The real question is why on earth would anyone want to distinguish between a string literal and a non-literal constant-sized array. – Puppy Sep 01 '13 at 22:51
@DeadMG: But it's hard to distinguish `"abc"` from `{ 'a', 'b', 'c', 0 }`. The former should arguably have length 3 and the latter length 4, but they look the same. – Kerrek SB Sep 01 '13 at 22:52
@DeadMG: I would love to avoid `strlen()`, e.g., when sending a string literal to a stream buffer. However, with `"hello, world\n\0plus some junk"` I can't really send the number of characters deduced by the template (minus 1) as it would also write the junk. – Dietmar Kühl Sep 01 '13 at 22:55
@DietmarKühl: _"Note that the information about the string being a literal isn't as useful as one might hope for"_ - Useful in some logging scenarios, to make sure no dynamic memory is used for a log string, so that there are no concurrency, lifetime or perf issues related to it. – Johann Gerell Jan 28 '15 at 11:23

jxh · Answer 4 · 2013-09-02T19:57:21.023

If you define Log as a macro instead, and call separate methods for literal versus std::wstring handling, some variation of the following should work:

#define Log(x) ((0[#x] == 'L' && 1[#x] == '"') ? LogLiteral(x) : LogString(x))

void
LogLiteral (const wchar_t *s) {
    //...do something
}

void
LogString (const std::wstring& s) {
    //...do something
}

The trick is that LogLiteral() also needs an overload taking std::wstring so that both branches of the conditional compile for every argument type; that overload should never actually be called.

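#include <stdexcept>  // std::invalid_argument

// Exists only so the unused branch compiles; reaching it is a logic error.
inline void LogLiteral (const std::wstring &s) {
    throw std::invalid_argument(__func__);
}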

This code gives you the behavior of an overloaded Log() method, in that you can pass either a string literal or a non-string literal to the Log() macro, and it will end up calling either LogLiteral() or LogString(). This gives compile time verification in that the compiler will not pass anything except what the code recognizes as a string literal to the call to LogLiteral(). At sufficient optimizations, the conditional branch can be removed, since every instance of the check is static (on GCC, it is removed).
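As a sketch of why the check is static, the same test on the stringified argument can be written as a C++11 constexpr function and verified with static_assert. `StartsLikeWideLiteral` and `LOOKS_LIKE_WIDE_LITERAL` are hypothetical names, not part of the macro above.

```cpp
// True when the stringified argument begins with L" and therefore looks
// like a wide string literal. Short-circuit evaluation keeps the second
// index in bounds for one-character or empty spellings.
constexpr bool StartsLikeWideLiteral(const char* text) {
    return text[0] == 'L' && text[1] == '"';
}

// Hypothetical macro: inspects the spelling of the argument, not its value.
#define LOOKS_LIKE_WIDE_LITERAL(x) StartsLikeWideLiteral(#x)

static_assert(LOOKS_LIKE_WIDE_LITERAL(L"abc"), "wide literal is recognized");
static_assert(!LOOKS_LIKE_WIDE_LITERAL(message), "identifier is rejected");
static_assert(!LOOKS_LIKE_WIDE_LITERAL("narrow"), "narrow literal is rejected");
```

Because only the first two characters of the spelling are inspected, an expression that merely starts with a wide literal would also pass, so this is a heuristic rather than a proof that the argument is a literal.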

Lrdx · Answer 5 · 2013-09-02T11:05:11.733

I don't think you can force callers to pass only a string literal to a function, but literals are character arrays, and that is something you can enforce:

#include <iostream>

template<typename T>
void log(T) = delete; //Disable everything

template <std::size_t Size>
void log(const wchar_t (&message)[Size]) //... but const wchar_t arrays
{
    std::cout << "yay" << std::endl;
}

const wchar_t * get_str() { return L"meow"; }

int main() {
    log(L"foo"); //OK

    wchar_t arr[] = { L'b', L'a', L'r', L'\0' };
    log(arr); //Meh..

//    log(get_str()); //compile error
}

The downside is that a character array filled at runtime is accepted as well; only the usual runtime C-style strings (plain pointers) are rejected.

But, if you can work with a slightly different syntax, then the answer is YES:

#include <cstddef>
#include <iostream>

void operator"" _log ( const wchar_t* str, size_t size ) {
  std::cout << "yay" << std::endl;
}

int main() {
  L"Message"_log;
}

Of course, both solutions need a C++11-compatible compiler (example tested with G++ 4.7.3).
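One possible way to combine the literal operator with an ordinary function call is a wrapper type that only the operator can construct. This is a sketch under assumptions: `LiteralMessage`, `_msg` and `Describe` are made-up names, and the `operator""_msg` spelling without a space needs a newer compiler than the G++ 4.7.3 mentioned above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical wrapper: its constructor is private, so only the literal
// operator declared as a friend can create one.
class LiteralMessage {
public:
    const wchar_t* data() const { return text_; }
    std::size_t size() const { return size_; }

private:
    LiteralMessage(const wchar_t* text, std::size_t size)
        : text_(text), size_(size) {}

    friend LiteralMessage operator""_msg(const wchar_t* text, std::size_t size);

    const wchar_t* text_;
    std::size_t size_;
};

// Invoked by the compiler for tokens such as L"Message"_msg.
LiteralMessage operator""_msg(const wchar_t* text, std::size_t size) {
    return LiteralMessage(text, size);
}

// Takes only LiteralMessage, so a const wchar_t* or std::wstring argument
// does not compile. Copies into a wstring here purely for demonstration.
std::wstring Describe(LiteralMessage message) {
    return std::wstring(message.data(), message.size());
}
```

A call then looks like `Describe(L"Message"_msg)`. The guarantee is not airtight, since someone can still call `operator""_msg(pointer, length)` explicitly, but that is hard to do by accident.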

Only the #include was missing, but also make sure you try with **C++ 11**, not with **C++ 4.8.1** on Ideone: [link](http://ideone.com/NgMQ8s) — Lrdx, Sep 02 '13 at 11:10

Carl Norum · Answer 6 · 2013-09-02T00:31:44.060

Here's a quick example I just whipped up using the printf hack I suggested in the comments above:

#include <cstdio>

#define LOG_MACRO(x) do { if (0) printf(x); Log(x); } while (0)

void Log(const char *message)
{
    // do something
}

void function(void)
{
    const char *s = "foo";
    LOG_MACRO(s);
    LOG_MACRO("bar");
}

Output from compiling this one with Clang appears to be exactly what you're looking for:

$ clang++ -c -o example.o example.cpp
example.cpp:13:15: warning: format string is not a string literal
      (potentially insecure) [-Wformat-security]
    LOG_MACRO(s);
              ^
example.cpp:3:41: note: expanded from macro 'LOG_MACRO'
#define LOG_MACRO(x) do { if (0) printf(x); Log(x); } while (0)
                                        ^
1 warning generated.

I did have to switch to printf rather than wprintf, since the latter appears not to generate the warning - I guess that's probably a Clang bug, though.

GCC's output is similar:

$ g++ -c -o example.o example.cpp
example.cpp: In function ‘void function()’:
example.cpp:13: warning: format not a string literal and no format arguments
example.cpp:13: warning: format not a string literal and no format arguments

Edit: You can see the Clang bug here. I just added a comment about -Wformat-security.
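The same format-string check can be attached to your own function with the GCC/Clang `format` attribute, which avoids the dummy `printf` call. The following is only a sketch: `LogF`, `LOG_PRINTF_LIKE` and `g_last_log` are hypothetical names, and the attribute is a compiler extension, so it expands to nothing elsewhere.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// GCC and Clang extension: tells the compiler that parameter 1 is a
// printf-style format string and that the variadic arguments start at 2.
#if defined(__GNUC__)
#define LOG_PRINTF_LIKE __attribute__((format(printf, 1, 2)))
#else
#define LOG_PRINTF_LIKE
#endif

// Hypothetical sink so the effect of a call can be observed.
std::string g_last_log;

// The attribute goes on a declaration, not on the definition.
void LogF(const char* format, ...) LOG_PRINTF_LIKE;

void LogF(const char* format, ...) {
    char buffer[256];
    va_list args;
    va_start(args, format);
    std::vsnprintf(buffer, sizeof buffer, format, args);
    va_end(args);
    g_last_log = buffer;
}
```

With the attribute in place, a call such as `LogF(s)` with a non-literal `s` should produce the same "format not a string literal" diagnostic when `-Wformat -Wformat-security` is active, and `-Werror=format-security` turns it into a hard error.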

score 0 · Answer 7 · edited May 23 '17 at 12:06

Adding this alternative for future reference. It comes from the SO question Is it possible to overload a function that can tell a fixed array from a pointer?

#include <iostream>
#include <type_traits>

template<typename T>
std::enable_if_t<std::is_pointer<T>::value>
foo(T)
{
    std::cout << "pointer\n";
}

template<typename T, unsigned sz>
void foo(T(&)[sz])
{
    std::cout << "array\n";
}

int main()
{
  char const* c = nullptr;
  char d[] = "qwerty";
  foo(c);
  foo(d);
  foo("hello");
}

The above snippet compiles and runs fine on http://webcompiler.cloudapp.net/
