
I want to write some function that takes a string literal - and only a string literal:

template <size_t N>
void foo(const char (&str)[N]);

Unfortunately, that is too permissive: it will match any array of char, whether or not it's a true string literal. It's impossible to tell the two apart at compile time without requiring the caller to wrap the literal/array, but at run time the two arrays will be in entirely different places in memory:

foo("Hello"); // at 0x400f81

const char msg[] = {'1', '2', '3'};
foo(msg); // at 0x7fff3552767f

Is there a way to know where in memory the string data could live so that I could at least assert that the function takes a string literal only? (Using gcc 4.7.3, but really a solution for any compiler would be great).

Barry
  • Even if it were possible (which I strongly doubt) I would question the validity of the purpose for which you need to distinguish between these two cases. The requirement sounds rather unusual. – Sergey Kalinichenko Feb 10 '15 at 17:10
  • It's not unusual at all. String literals have a guaranteed lifetime equal to the duration of the program. That's a very useful trait to be able to detect. – Benjamin Lindley Feb 10 '15 at 17:15
  • well, you can do something like if((int)&msg > memLocation)... where memLocation is an integer representing 0x7fff3552767f or something where you know it's between where string literals are located and the rest of the strings – iedoc Feb 10 '15 at 17:15
  • While I agree with @dasblinkenlight about motives, you can look at the string's address versus the address of a known string literal, the address of something on your stack, and the address of something on your heap. String literals are **usually** stored in a separate memory location closer to the executable code pages instead of on the stack or heap. However, this is compiler dependent. – iwolf Feb 10 '15 at 17:19
  • http://stackoverflow.com/questions/5691232/can-i-determine-if-an-argument-is-string-literal – iedoc Feb 10 '15 at 17:19
  • Not a duplicate of that question at all. I'm not interested in enforcing restrictions on the caller - `foo` itself has to be able to determine this. – Barry Feb 10 '15 at 17:54
  • @dasblinkenlight I need to distinguish between arrays that I can safely store pointers to (string literals) and arrays that I need to copy (non-literals). What's wrong with unusual? This is C++. – Barry Feb 10 '15 at 21:59
  • @Barry I would make an API to make this decision explicitly - say, accept a boolean flag that says "don't make a copy" (or "do make a copy, it does not matter") and default that flag to something. Let programmers decide what to do, rather than attempting to guess their intentions programmatically. – Sergey Kalinichenko Feb 10 '15 at 22:35
  • @BenjaminLindley any code that tries to detect the lifetime of an object and behave differently is asking for trouble... the lifetime of all objects should either be definitely known, or be irrelevant. – M.M Feb 13 '15 at 06:08
  • @MattMcNabb: I really don't get what you're trying to say. The lifetime of a string literal is definitely known, and that fact is definitely not irrelevant. And it would definitely be useful to be able to have, for example, an immutable string class which didn't allocate any memory dynamically, and didn't do any copying of the string data, and could be passed around freely with no fear of becoming invalid. You could also take sub-strings of such an object, and they would have the same guarantees. – Benjamin Lindley Feb 13 '15 at 08:36
  • @BenjaminLindley Such a class doesn't need to do any auto-detection or anything. What you're describing is similar to the proposed std::string_view (not sure if that can be used with string literals or not) – M.M Feb 13 '15 at 08:40
  • @MattMcNabb: string_view can be used with string literals. But it cannot make the guarantees I described, because it can also be used with things which aren't string literals. – Benjamin Lindley Feb 13 '15 at 08:42
  • @BenjaminLindley can you give a concrete example? i'm having trouble seeing the problem you are trying to describe – M.M Feb 13 '15 at 08:49
  • @MattMcNabb: Any function which takes a string as a parameter and needs to store it away (and be sure it doesn't change) could have a more optimized version which takes a string literal. More optimized in that it doesn't allocate any memory and doesn't copy any of the characters of the string. It may be a micro-optimization which is not needed in many cases, but I don't see why the compiler should just throw away information needlessly like that. Zero-overhead principle and all that. – Benjamin Lindley Feb 13 '15 at 09:01

3 Answers


You seem to assume that a necessary trait of a "true string literal" is that the compiler bakes it into the static storage of the executable.

This is not actually true. The C and C++ standards guarantee us that a string literal shall have static storage duration, so it must exist for the life of the program, but if a compiler can arrange this without placing the literal in static storage, it is free to do so, and some compilers sometimes do.

However, it's clear that the property you want to test, for a given string literal, is whether it is in fact in static storage. And since it need not be in static storage, as far as the language standards guarantee, there can't be any solution of your problem founded solely on portable C/C++.

Whether a given string literal is in fact in static storage is the question of whether the address of the string literal lies within one of the address ranges that get assigned to linkage sections that qualify as static storage, in the nomenclature of your particular toolchain, when your program is built by that toolchain.

So the solution I suggest is that you enable your program to know the address ranges of those of its own linkage sections that qualify as static storage, and then it can test whether a given string literal is in static storage by obvious code.

Here is an illustration of this solution for a toy C++ project, prog, built with the GNU/Linux x86_64 toolchain (C++98 or better will do, and the approach is only slightly more fiddly for C). In this setting, we link in ELF format, and the linkage sections we will deem static storage are .bss (0-initialized static data), .rodata (read-only static data) and .data (read/write static data).

Here are our source files:

section_bounds.h

#ifndef SECTION_BOUNDS_H
#define SECTION_BOUNDS_H
// Export delimiting values for our `.bss`, `.rodata` and `.data` sections
extern unsigned long const section_bss_start;
extern unsigned long const section_bss_size;
extern unsigned long const section_bss_end;
extern unsigned long const section_rodata_start;
extern unsigned long const section_rodata_size;
extern unsigned long const section_rodata_end;
extern unsigned long const section_data_start;
extern unsigned long const section_data_size;
extern unsigned long const section_data_end;
#endif

section_bounds.cpp

// Assign either placeholder or pre-defined values to 
// the section delimiting globals.
#ifndef BSS_START
#define BSS_START 0x0
#endif
#ifndef BSS_SIZE
#define BSS_SIZE 0xffff
#endif
#ifndef RODATA_START
#define RODATA_START 0x0
#endif
#ifndef RODATA_SIZE
#define RODATA_SIZE 0xffff
#endif
#ifndef DATA_START
#define DATA_START 0x0
#endif
#ifndef DATA_SIZE
#define DATA_SIZE 0xffff
#endif
extern unsigned long const 
    section_bss_start = BSS_START;
extern unsigned long const section_bss_size = BSS_SIZE;
extern unsigned long const 
    section_bss_end = section_bss_start + section_bss_size;
extern unsigned long const 
    section_rodata_start = RODATA_START;
extern unsigned long const 
    section_rodata_size = RODATA_SIZE;
extern unsigned long const 
    section_rodata_end = section_rodata_start + section_rodata_size;
extern unsigned long const 
    section_data_start = DATA_START;
extern unsigned long const 
    section_data_size = DATA_SIZE;
extern unsigned long const 
    section_data_end = section_data_start + section_data_size;

cstr_storage_triage.h

#ifndef CSTR_STORAGE_TRIAGE_H
#define CSTR_STORAGE_TRIAGE_H

// Classify the storage type addressed by `s` and print it on `cout`
extern void cstr_storage_triage(const char *s);

#endif

cstr_storage_triage.cpp

#include "cstr_storage_triage.h"
#include "section_bounds.h"
#include <iostream>

using namespace std;

void cstr_storage_triage(const char *s)
{
    unsigned long addr = (unsigned long)s;
    cout << "When s = " << (void*)s << " -> \"" << s << '\"' << endl;
    if (addr >= section_bss_start && addr < section_bss_end) {
        cout << "then s is in static 0-initialized data\n";
    } else if (addr >= section_rodata_start && addr < section_rodata_end) {
        cout << "then s is in static read-only data\n";     
    } else if (addr >= section_data_start && addr < section_data_end){
        cout << "then s is in static read/write data\n";
    } else {
        cout << "then s is on the stack/heap\n";
    }       
}

main.cpp

// Demonstrate storage classification of various arrays of char 

#include "cstr_storage_triage.h"

static char in_bss[1];
static char const * in_rodata = "In static read-only data";
static char in_rwdata[] = "In static read/write data";  

int main()
{
    char on_stack[] = "On stack";
    cstr_storage_triage(in_bss);
    cstr_storage_triage(in_rodata);
    cstr_storage_triage(in_rwdata);
    cstr_storage_triage(on_stack);
    cstr_storage_triage("Where am I?");
    return 0;
}

Here is our makefile:

.PHONY: all clean

SRCS = main.cpp cstr_storage_triage.cpp section_bounds.cpp 
OBJS = $(SRCS:.cpp=.o)
TARG = prog
MAP_FILE = $(TARG).map

ifdef AGAIN
BSS_BOUNDS := $(shell grep -m 1 '^\.bss ' $(MAP_FILE))
BSS_START := $(word 2,$(BSS_BOUNDS))
BSS_SIZE := $(word 3,$(BSS_BOUNDS))
RODATA_BOUNDS := $(shell grep -m 1 '^\.rodata ' $(MAP_FILE))
RODATA_START := $(word 2,$(RODATA_BOUNDS))
RODATA_SIZE := $(word 3,$(RODATA_BOUNDS))
DATA_BOUNDS := $(shell grep -m 1 '^\.data ' $(MAP_FILE))
DATA_START := $(word 2,$(DATA_BOUNDS))
DATA_SIZE := $(word 3,$(DATA_BOUNDS))
CPPFLAGS += \
    -DBSS_START=$(BSS_START) \
    -DBSS_SIZE=$(BSS_SIZE) \
    -DRODATA_START=$(RODATA_START) \
    -DRODATA_SIZE=$(RODATA_SIZE) \
    -DDATA_START=$(DATA_START) \
    -DDATA_SIZE=$(DATA_SIZE)
endif

all: $(TARG)

clean:
    rm -f $(OBJS) $(MAP_FILE) $(TARG)

ifndef AGAIN
$(MAP_FILE): $(OBJS)
    g++ -o $(TARG) $(CXXFLAGS) -Wl,-Map=$@ $(OBJS) $(LDLIBS)
    touch section_bounds.cpp

$(TARG): $(MAP_FILE)
    $(MAKE) AGAIN=1
else
$(TARG): $(OBJS)
    g++ -o $@ $(CXXFLAGS) $(OBJS) $(LDLIBS)
endif

Here is what make looks like:

$ make
g++    -c -o main.o main.cpp
g++    -c -o cstr_storage_triage.o cstr_storage_triage.cpp
g++    -c -o section_bounds.o section_bounds.cpp
g++ -o prog  -Wl,-Map=prog.map main.o cstr_storage_triage.o section_bounds.o 
touch section_bounds.cpp
make AGAIN=1
make[1]: Entering directory `/home/imk/develop/SO/string_lit_only'
g++  -DBSS_START=0x00000000006020c0 -DBSS_SIZE=0x118 -DRODATA_START=0x0000000000400bf0
 -DRODATA_SIZE=0x120 -DDATA_START=0x0000000000602070 -DDATA_SIZE=0x3a
  -c -o section_bounds.o section_bounds.cpp
g++ -o prog  main.o cstr_storage_triage.o section_bounds.o

And lastly, what prog does:

$ ./prog
When s = 0x6021d1 -> ""
then s is in static 0-initialized data
When s = 0x400bf4 -> "In static read-only data"
then s is in static read-only data
When s = 0x602090 -> "In static read/write data"
then s is in static read/write data
When s = 0x7fffa1b053a0 -> "On stack"
then s is on the stack/heap
When s = 0x400c0d -> "Where am I?"
then s is in static read-only data

If it's obvious how this works, you need read no further.

The program will compile and link even before we know the addresses and sizes of its static storage sections. It would need to, wouldn't it!? In that case, the global section_* variables that ought to hold these values all get built with place-holder values.

When make is run, the recipes:

$(TARG): $(MAP_FILE)
    $(MAKE) AGAIN=1

and

$(MAP_FILE): $(OBJS)
    g++ -o $(TARG) $(CXXFLAGS) -Wl,-Map=$@ $(OBJS) $(LDLIBS)
    touch section_bounds.cpp

are operative, because AGAIN is undefined. They tell make that in order to build prog it must first build the linker map file of prog, as per the second recipe, and then re-timestamp section_bounds.cpp. After that, make is to call itself again, with AGAIN defined = 1.

Executing the makefile again, with AGAIN defined, make now finds that it must compute all the variables:

BSS_BOUNDS
BSS_START
BSS_SIZE
RODATA_BOUNDS
RODATA_START
RODATA_SIZE
DATA_BOUNDS
DATA_START
DATA_SIZE

For each static storage section S, it computes S_BOUNDS by grepping the linker map file for the line that reports the address and size of S. From that line, it assigns the 2nd word ( = the section address) to S_START, and the 3rd word ( = the size of the section) to S_SIZE. All the section-delimiting values are then appended, via -D options, to the CPPFLAGS that will automatically be passed to compilations.
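To make the extraction step concrete, here is a minimal C++ sketch of what the grep and $(word ...) calls do to one map-file line. It is only an illustration: `SectionBounds`, `parse_hex` and `parse_section_line` are hypothetical names that do not appear in the project above, and the line format is assumed to be the "name, address, size" layout seen in the make output.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical holder for one section's bounds (not part of the project above).
struct SectionBounds {
    unsigned long start;
    unsigned long size;
};

// Convert a hexadecimal word such as "0x120" to a number.
// Returns false if the word is empty or has trailing non-hex characters.
static bool parse_hex(const std::string& word, unsigned long& value)
{
    char* stop = 0;
    value = std::strtoul(word.c_str(), &stop, 16);  // base 16 accepts a "0x" prefix
    return !word.empty() && stop != word.c_str() && *stop == '\0';
}

// Mimic the makefile: word 1 must be the section name, word 2 is the
// section address and word 3 is the section size.
bool parse_section_line(const std::string& line, const std::string& name,
                        SectionBounds& out)
{
    std::istringstream in(line);
    std::string first, second, third;
    if (!(in >> first >> second >> third) || first != name) {
        return false;
    }
    SectionBounds parsed = {0, 0};
    if (!parse_hex(second, parsed.start) || !parse_hex(third, parsed.size)) {
        return false;
    }
    out = parsed;
    return true;
}
```

Fed the line `.rodata 0x0000000000400bf0 0x120`, this should yield the same start and size that the second make pass hands to the compiler as -DRODATA_START and -DRODATA_SIZE.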

Because AGAIN is defined, the operative recipe for $(TARG) is now the customary:

$(TARG): $(OBJS)
    g++ -o $@ $(CXXFLAGS) $(OBJS) $(LDLIBS)

But we touched section_bounds.cpp in the parent make; so it has to be recompiled, and therefore prog has to be relinked. This time, when section_bounds.cpp is compiled, all the section-delimiting macros:

BSS_START
BSS_SIZE
RODATA_START
RODATA_SIZE
DATA_START
DATA_SIZE

will have pre-defined values and will not assume their place-holder values.

And those predefined values will be correct because the second linkage adds no symbols to the linkage and removes none, and does not alter the size or storage class of any symbol. It just assigns different values to symbols that were present in the first linkage. Consequently, the addresses and sizes of the static storage sections will be unaltered and are now known to your program.
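A coarser variation that avoids the two-pass build is sketched below. It is a different technique from the map-file approach above, and it rests on an assumption: that the program is an ELF executable linked with GNU ld or a compatible linker, which defines the symbols `__executable_start` and `end` at the two ends of the program image. The helper name `in_program_image` is hypothetical. It cannot tell .rodata from .data or .bss, so like the test above it only separates static storage from stack and heap.

```cpp
#include <stdint.h>

// Assumption: ELF target, GNU ld or a compatible linker. The linker defines
// these symbols at the start and end of the program image. Only their
// addresses are meaningful; their contents must never be read.
extern "C" char __executable_start;
extern "C" char end;

// Hypothetical helper: true if `s` points into the executable's own image
// (code, read-only data, .data or .bss), false for stack or heap memory.
bool in_program_image(const char* s)
{
    uintptr_t addr  = reinterpret_cast<uintptr_t>(s);
    uintptr_t first = reinterpret_cast<uintptr_t>(&__executable_start);
    uintptr_t last  = reinterpret_cast<uintptr_t>(&end);
    return addr >= first && addr < last;
}
```

String literals that live in a shared library fall outside this range, so the check is only meaningful for literals compiled into the executable itself.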

Mike Kinghan

Depending on what exactly you want, this may or may not work for you:

#include <cstdlib>

template <size_t N>
void foo(const char (&str)[N]) {}

template <char> struct check_literal {};

#define foo(arg) foo((check_literal<arg[0]>(),arg))    

int main()
{

    // This compiles
    foo("abc");

    // This does not
    static const char abc[] = "abc";
    foo(abc);
}

This works with g++ and clang++ in -std=c++11 mode only.

n. m. could be an AI

You can use user-defined literals, which by definition can only be applied to literals:

#include <cstddef>   // std::size_t
#include <iostream>

struct literal_wrapper
{
    const char* const ptr;
private:
    constexpr literal_wrapper(const char* p) : ptr(p) {}
    friend constexpr literal_wrapper operator "" _lw(const char* p, std::size_t);
};
constexpr literal_wrapper operator "" _lw(const char* p, std::size_t){ return literal_wrapper(p); }

literal_wrapper f()
{
    std::cout << "f()" << std::endl;
    return "test"_lw;
}

void foo(const literal_wrapper& lw)
{
    std::cout << "foo:" << lw.ptr << " " << static_cast<const void*>(lw.ptr) << std::endl;
}

int main()
{
    auto x1 = f(), x2 = f(), x3 = f();
    const void* p1 = x1.ptr;
    const void* p2 = x2.ptr;
    const void* p3 = x3.ptr;
    std::cout << x1.ptr << " " << p1 << " " << p2 << " " << p3 << std::endl;

    foo(x1);
    foo(x2);
    foo("test"_lw);
    foo("test2"_lw);
}
Loghorn