I have an auto-generated file which looks something like this...

static void do_SomeFunc1(void* parameter)
{
    // Do stuff.
}

// Continues on for another 4000 functions...

void dispatch(int id, void* parameter)
{
    switch(id)
    {
        case ::SomeClass1::id: return do_SomeFunc1(parameter);
        case ::SomeClass2::id: return do_SomeFunc2(parameter);
        // This continues for the next 4000 cases...
    }
}

When I build it like this, the build time is enormous. If I inline all the functions automagically into their respective cases using my script, the build time is cut in half. GCC 4.5.0 says ~50% of the build time is being taken up by "variable tracking" when I use -ftime-report. What does this mean and how can I speed compilation while still maintaining the superior cache locality of pulling out the functions from the switch?

EDIT: Interestingly enough, the build time has exploded only on debug builds, as per the following profiling information of the whole project (which isn't just the file in question, but still a good metric; the file in question takes the most time to build):

  • Debug: 8 minutes 50 seconds
  • Release: 4 minutes, 25 seconds

If you're curious, here are a few sample do_func's, context removed. As you can see, I simplified the problem definition a bit to only show the relevant parts. In case you're wondering, all the self->func calls are calls to boost::signal's.

static void do_Match_Login(Registry* self, const uint8_t* parameters, uint16_t length)
{
    const uint8_t* paramPtr = parameters;

    std::string p0 = extract_string(parameters, &paramPtr, length);
    std::string p1 = extract_string(parameters, &paramPtr, length);
    int32_t p2 = extract_int32(parameters, &paramPtr, length);
    uint32_t p3 = extract_uint32(parameters, &paramPtr, length);
    tuple<Buffer, size_t, size_t> p4 = extract_blob(parameters, &paramPtr, length);

    return self->Match_Login(p0, p1, p2, p3, p4);
}

static void do_Match_ResponseLogin(Registry* self, const uint8_t* parameters, uint16_t length)
{
    const uint8_t* paramPtr = parameters;

    int32_t p0 = extract_int32(parameters, &paramPtr, length);
    std::string p1 = extract_string(parameters, &paramPtr, length);
    array<uint16_t, 3> p2 = extract_vector(parameters, &paramPtr, length);
    std::string p3 = extract_string(parameters, &paramPtr, length);
    uint8_t p4 = extract_uint8(parameters, &paramPtr, length);
    uint8_t p5 = extract_uint8(parameters, &paramPtr, length);
    uint64_t p6 = extract_MUID(parameters, &paramPtr, length);
    bool p7 = extract_bool(parameters, &paramPtr, length);
    tuple<Buffer, size_t, size_t> p8 = extract_blob(parameters, &paramPtr, length);

    return self->Match_ResponseLogin(p0, p1, p2, p3, p4, p5, p6, p7, p8);
}
Clark Gaebel
  • Looks like a case for Replace Conditional with Polymorphism... – Billy ONeal Jun 02 '10 at 01:40
  • Each do_SomeFuncN is literally 1 < x < 6 lines of code. It's not worth it, especially when this file is auto-generated. – Clark Gaebel Jun 02 '10 at 01:41
  • Since when is a short function an excuse for a difficult to maintain design? It might not have to be auto-generated if you didn't need such an insane switch. – Billy ONeal Jun 02 '10 at 01:42
  • It's not difficult to maintain. The problem isn't as simple as I posed it here. The script parses a networking protocol definition, and the dispatch function pipes the extracted parameters into the correct boost::signal. Of course, there are multiple parameter types that need to be handled and verified, so it has to do checking there and make sure it doesn't segfault. – Clark Gaebel Jun 02 '10 at 01:43
  • @wowus: Then move it into its own file so that it doesn't have to be compiled all the time. – Billy ONeal Jun 02 '10 at 01:44
  • Dunno if it'll help, but consider an `id->func` lookup table. – Stephen Jun 02 '10 at 01:44
  • @ONeal: I did. That file has a 10 minute build time right now. – Clark Gaebel Jun 02 '10 at 01:45
  • @Stephen: Won't that have a big-ish impact on stack usage? – Clark Gaebel Jun 02 '10 at 01:45
  • I don't see how. Don't create the table on the stack, define it once (static or heap) and use `id` to lookup and get `func` then invoke it. Same number of stack frames as now, much less `switch` branching. – Stephen Jun 02 '10 at 01:49
  • Did I mention IDs aren't necessarily starting at zero and are sparse? No? Damn. Well, yeah. It might as well be random numbers. Good idea though, if I ever have a chance to make a new protocol/modify this existing one, I'll do that for sure. – Clark Gaebel Jun 02 '10 at 01:52
  • @wowus can you show us a couple of `do_someFunc()` s? – wilhelmtell Jun 02 '10 at 02:11
  • I don't see what id being non-consecutive has to do with not being able to put it into a lookup table. That's effectively what the compiler will be doing anyway: generating a jump table from the list of switch cases. – Dean Harding Jun 02 '10 at 02:15
  • The array will have too many holes to make it memory-conscious. According to my calculations, it will take up about 3MB of space - which would overflow the Windows stack. – Clark Gaebel Jun 02 '10 at 02:17
  • @wowus: It's not an array, it'll be a lookup table: `hash_map<>` or something. As I said, that's basically what the compiler will turn your enormous switch into anyway. – Dean Harding Jun 02 '10 at 02:21
  • @codeka: It won't - it turns it into a BST according to my disassembly. – Clark Gaebel Jun 02 '10 at 02:22
  • @wowus: then `std::map<>` is the equivalent. Anyway, I'm not even sure manually doing it yourself over letting the compiler do it would even help :) – Dean Harding Jun 02 '10 at 02:25
  • @codeka: It's definitely not just like an std::map. Just looking at it in IDA makes you understand the huge difference. For example, it's perfectly balanced at compile-time, immutable, and consists of only jumps. Actually, it's really damn cool. I suggest you check it out some time! – Clark Gaebel Jun 02 '10 at 02:28
  • Why are all your void functions returning values? – Michael Jun 02 '10 at 06:59
  • If you submitted this to http://gcc.gnu.org/bugzilla/, there is a good chance it will become faster in the next GCC version. – Laurynas Biveinis Jun 02 '10 at 08:13
  • Also see [GCC/Make Build Time Optimizations](https://stackoverflow.com/q/708807/608639). – jww May 28 '17 at 21:01
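
The lookup-table idea from the comments above can be sketched without a 3MB sparse array: keep one statically allocated table sorted by id and binary-search it. This is only an illustration, not the asker's generated code. The ids (17 and 90210), the simplified `Handler` signature, and the two handler bodies are made up, and whether this builds faster than the generated switch has not been measured here.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Hypothetical handler signature, simplified from the question.
using Handler = void (*)(void* parameter);

struct Entry {
    int id;
    Handler handler;
};

// Hypothetical stand-ins for the generated do_SomeFuncN functions.
static void do_SomeFunc1(void* parameter) { *static_cast<int*>(parameter) = 1; }
static void do_SomeFunc2(void* parameter) { *static_cast<int*>(parameter) = 2; }

// Static storage (not on the stack), sorted by id.
// The ids are invented and deliberately sparse.
static const Entry kTable[] = {
    {17, do_SomeFunc1},
    {90210, do_SomeFunc2},
};

// Returns true if a handler for `id` was found and invoked.
bool dispatch(int id, void* parameter)
{
    const Entry* first = std::begin(kTable);
    const Entry* last = std::end(kTable);

    // Debug-only sanity check: binary search requires a sorted table.
    assert(std::is_sorted(first, last,
        [](const Entry& a, const Entry& b) { return a.id < b.id; }));

    // O(log n) search for the first entry whose id is not less than `id`.
    const Entry* it = std::lower_bound(first, last, id,
        [](const Entry& e, int key) { return e.id < key; });

    if (it == last || it->id != id)
        return false; // unknown id

    it->handler(parameter);
    return true;
}
```

The table costs one id and one pointer per entry regardless of how sparse the ids are, and a generator script could emit it already sorted.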

3 Answers

You can turn off variable tracking. Variable tracking is used to make the debug information a bit more valuable, but if this code is auto-generated and you're not really going to be debugging it much then it's not really useful. You can just turn it off for that file only.

gcc -fno-var-tracking ...

Should do the trick. As I said, I think you can just do it for that file.

Dean Harding
  • Bleh, going to have to dive into CMake docs to figure out how - but that's exactly what I'm looking for; thanks! – Clark Gaebel Jun 02 '10 at 02:21
  • SET_SOURCE_FILE_PROPERTIES(fileName.cpp COMPILE_FLAGS -fno-var-tracking) – Martin York Jun 02 '10 at 05:41
  • I think it should be SET_SOURCE_FILES_PROPERTIES. At least the extra "S" was needed for me to get CMake to understand. – Jordfräs Jan 18 '18 at 11:57
  • You also need to add PROPERTIES, to get something like this: `SET_SOURCE_FILES_PROPERTIES(fileName.cpp PROPERTIES COMPILE_FLAGS -fno-var-tracking)` – Jordfräs Jan 18 '18 at 12:20

In GNU Make, you can turn off variable tracking for a single target, provided your compile command includes a flags variable such as CXXFLAGS in its arguments:

fileName.o: CXXFLAGS += -fno-var-tracking
NerdMachine
  • Does this work stand alone (as shown above), or do you need the full recipe, too (i.e., `$CXX $CXXFLAGS ... -c $<`)? – jww May 28 '17 at 20:57
  • See Target-specific Variable Values in the Gnu Make Manual. https://www.gnu.org/software/make/manual/make.html#Target_002dspecific – NerdMachine May 30 '17 at 16:12

Besides the answers telling how to turn off -fvar-tracking at the CMake level and at the g++-command-line level, you can also turn it off per file, by placing this line at the top of the source file:

#pragma GCC optimize("no-var-tracking")

Then, to suppress the bogus warning that Clang gives on that line, you might want to surround it with #pragma GCC diagnostic ignored, like this:

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunknown-pragmas"
#pragma GCC optimize("no-var-tracking") // to speed up compilation
#pragma GCC diagnostic pop
Quuxplusone