
Our company's static analysis tool reports duplicated strings (text literals). The duplicates are spread across many translation units (source files).

For example, the string "NULL console pointer" appears once in module_a.c, 5 times in module_b.c, and once in module_f.c.

We also have coding guidelines that state there are to be no global variables.

We are also to avoid placing variables in header files.

Our platform is an embedded system, so consolidation of constant text will provide room for other purposes (and make the program load faster). In other words, there should only be one instance of the text literal.

So, what is an efficient design or architecture for consolidating constant text literals across multiple translation units?

Is there a length below which duplicates are not worth consolidating (such as the string "\r\n")?

We would prefer solutions that are performance efficient, such as preferring direct access over calling a getter function.

(Note: at this time, the text does not need to be translated into multiple languages.)

Languages: C and C++ (The code base is more C language than C++).
Processor: ARM Cortex A8
Platform: Embedded system, safety and quality and performance critical (medical device).
Compilers: IAR Embedded Workbench (for ARM processor).

Edit 1: Linker Not Consolidating
I scanned the BIN file and it does contain multiple instances of "NULL console pointer".

The linker has the option "Merge duplicate section" and I checked that. The binary still contains duplicates.
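A scan like the one described above could be sketched as follows. This is only an illustration, not the tool that was actually used: the function name `count_occurrences` and its parameters are hypothetical, and the image is assumed to have been read into memory already.

```c
#include <stddef.h>
#include <string.h>

/* Counts occurrences of `needle` in a raw image buffer.
   The image is treated as bytes, not as a C string, because a BIN file
   contains embedded zero bytes. `count_occurrences` is a hypothetical name. */
size_t count_occurrences(const unsigned char *image, size_t image_len,
                         const char *needle)
{
    size_t needle_len = strlen(needle);
    size_t count = 0;

    /* An empty or oversized pattern cannot match. */
    if (needle_len == 0 || needle_len > image_len) {
        return 0;
    }
    for (size_t i = 0; i + needle_len <= image_len; ++i) {
        if (memcmp(image + i, needle, needle_len) == 0) {
            ++count;
        }
    }
    return count;
}
```

A result greater than 1 for "NULL console pointer" would confirm that the linked image still holds more than one copy.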

Thomas Matthews
  • Most modern linkers optimize this correctly anyway; I would simply ignore the SCA tool's warnings in this case. – πάντα ῥεῖ Jun 22 '16 at 16:20
  • The answer to this question explains that compilers can do such deduplication automatically: http://stackoverflow.com/questions/11399682/c-optimisation-of-string-literals – Leon Jun 22 '16 at 16:21
  • @Leon: Not a duplicate. The text literals are not in a single file. The issue also contains scoping. – Thomas Matthews Jun 22 '16 at 16:22
  • @ThomasMatthews I didn't mean that your question was a duplicate. Just wanted to provide some info. – Leon Jun 22 '16 at 16:26
  • The compiler will do constant folding and get rid of most (possibly all) the duplicates - across compilation units as well if you use LTO. – Jesper Juhl Jun 22 '16 at 16:27
  • @ThomasMatthews gcc can perform string literal merging across translation units (during the linking stage): http://stackoverflow.com/a/26279717/6394138 – Leon Jun 22 '16 at 16:30
  • @Leon: That would not be gcc alone, but requires support by the linker. (gcc is not a linker!) – too honest for this site Jun 22 '16 at 16:32
  • Did you inspect the binary? Does it contain duplicates? – 2501 Jun 22 '16 at 16:36
  • @2501: The library contains duplicate text literals. – Thomas Matthews Jun 22 '16 at 16:49
  • What about putting all the strings in one global extern array, and using an enum type to give meaningful names to the indices? (though it's not quite "no global variables") – Dmitri Jun 22 '16 at 17:07
  • Can you use [message catalogs](https://www.gnu.org/software/libc/manual/html_node/Message-Translation.html)? – Mark Plotnick Jun 22 '16 at 17:36
  • @MarkPlotnick: The IAR compiler does not support message catalogs, but the concept seems to be the same as a *getter* function. – Thomas Matthews Jun 22 '16 at 18:17
  • Are you using both C and C++ *compilation* or rather C++ compilation of code that is also valid C? If you use C++ compilation, then the code *is* C++ - (even if it is also valid C). – Clifford Jun 23 '16 at 10:45
  • There is no "more C language than C++". They are different languages. Either you compile a compilation unit as C or as C++. If you mean "C coding-style", it still would be C++ due to different semantics of identical syntax/grammar. – too honest for this site Jun 23 '16 at 22:11
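
The enum-indexed table suggested in the comments could look roughly like the sketch below. The identifiers `message_id`, `MSG_NULL_CONSOLE_POINTER`, `MSG_NEWLINE`, `MSG_COUNT`, and `message_table` are hypothetical names chosen for illustration. In a real project the table would presumably be declared `extern` in a header and defined in exactly one source file; here both are shown together.

```c
/* Hypothetical message identifiers; the names are illustrative only. */
typedef enum {
    MSG_NULL_CONSOLE_POINTER,
    MSG_NEWLINE,
    MSG_COUNT /* number of entries, not a message */
} message_id;

/* Header (assumed):  extern const char *const message_table[MSG_COUNT];
   The definition below would live in exactly one translation unit. */
const char *const message_table[MSG_COUNT] = {
    [MSG_NULL_CONSOLE_POINTER] = "NULL console pointer",
    [MSG_NEWLINE]              = "\r\n"
};
```

Callers would read `message_table[MSG_NULL_CONSOLE_POINTER]` directly, with no getter call. As the comment already notes, the table is still a global (read-only) object, so it may conflict with the "no global variables" guideline. Each entry also costs one pointer, which is 4 bytes on a 32-bit ARM target, so a 3-byte literal such as "\r\n" gains nothing from being routed through the table.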

2 Answers


If the concern is only about string literals, you shouldn't need to change your code.

What you need to do is to investigate how to enable string pooling on your compiler. This is an optimizer option which goes looking for identical string literals throughout your program. If it finds that the very same literals are used twice, it will allocate both of them at the very same memory address. This should work across several translation units.

(In GCC this is called -fmerge-constants. I don't know what it is called in IAR.)
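
One way to see whether pooling took effect is to compare the addresses of two identical literals. The sketch below is an assumption-laden illustration: the function names are hypothetical, and both literals sit in the same translation unit. To test merging across translation units, the two helper functions would have to be moved into different source files (and made non-`static`).

```c
#include <stdbool.h>

/* Two separate uses of the same literal text (hypothetical helpers). */
static const char *first_use(void)  { return "NULL console pointer"; }
static const char *second_use(void) { return "NULL console pointer"; }

/* Returns true when both literals were placed at the same address.
   The C standard leaves it unspecified whether identical literals are
   distinct, so the result depends on the compiler and linker settings. */
bool literals_share_storage(void)
{
    return first_use() == second_use();
}
```

If this returns false with the optimizer enabled, the toolchain is not pooling the literals under the current settings.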

Lundin

I think if the linker really does have a problem with de-duplication, I'd use static functions to deliver literal strings:

// header file

struct text
{
  static const char* hello_world();
};


// one source file
// #include "text.hpp"
const char* text::hello_world() {
  // Defined in exactly one place, so the literal exists once in the program.
  static const char message[] = "Hello, World";
  return message;
}



// use case

// #include "text.hpp"
#include <iostream>

int main()
{
  std::cout << text::hello_world() << std::endl;
}
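
Since the question says the code base is mostly C, a C analogue of the same idea might look like the sketch below. The file names `text.h` and `text.c` and the function name `text_null_console_pointer` are hypothetical. The header carries only a function declaration, so no variable appears in a header or at global scope.

```c
/* text.h (hypothetical header): declaration only, no variables. */
const char *text_null_console_pointer(void);

/* text.c (hypothetical source file): the only copy of the literal. */
const char *text_null_console_pointer(void)
{
    /* Function-local static: not a global variable, and defined once. */
    static const char message[] = "NULL console pointer";
    return message;
}
```

The trade-off is the one the question wanted to avoid: every use pays for a function call instead of a direct access, unless the toolchain can inline across translation units.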
Richard Hodges