0

My latest recreational project is to write a brainfuck interpreter in C++. It was straightforward enough but today I decided to add to it by an compilation step. The eventual goal is to be able to create executables but right now all it does is some basic optimization. For instance +++++ is turned for 5 add 1 commands into a single add 5 and so on.

While this works well, I've noticed that the size of my executable after being stripped has gone from 9k to 12k. After some research I've determined the function below is to blame, specifically the map. I do not understand why.

void Brainfuck::compile(const string& input) {
    map<char, pair<Opcode, int>> instructions {
        { '<', make_pair(Opcode::MOVL,  1) },
        { '>', make_pair(Opcode::MOVR,  1) },
        { '+', make_pair(Opcode::INCR,  1) },
        { '-', make_pair(Opcode::DECR,  1) },
        { '[', make_pair(Opcode::WHILE, 0) },
        { ']', make_pair(Opcode::WEND,  0) },
        { ',', make_pair(Opcode::INP,   0) },
        { '.', make_pair(Opcode::OUTP,  0) },
    };

    string::const_iterator c = begin(input);
    while (c != end(input)) {
        if (instructions.find(*c) != end(instructions)) {
            auto instruction = instructions[*c];
            makeOp(c, instruction.first, instruction.second);
        } else {
            c++;
        }
    }
}

The key in the map is one of the 8 valid Brainfuck operations. The function goes through the input string and looks for these valid characters. An invalid character is just skipped as per the Brainfuck spec. If it finds one it passes the map value for that key to a function called makeop that does optimization, turns it into an op struct and adds it to the vector of instructions that my interpreter will actually execute.

The op struct has two members. An Opcode, which is a scoped enum based on a uint8_t representing one of the 8 operations, and one int containing the operand. (one operand for now. A future more sophisticated version might have instructions with multiple operands.) So in the +++++ example above the struct would contain Opcode::INCR and 5.

So the value of each map entry is a std::pair consisting of the Opcode, and the number of operands. I realize that some overhead is inevitable with a generic data structure but given the simplicity of the description above, don't you think 3k is a bit excessive?

Or perhaps this is not the right approach to achieve my aim efficiently? But if not a std::map then which data structure should I use?

Update:

Thanks to all those who responded. First, a few words about my motives. I don't use C++ in my day job that often. So I'm doing some recreational projects to keep my knowledge sharp. Having the absolutely smallest code size is not as important as e.g. clarity but it is interesting to me to learn how bloat and such things happen.

Following the advice given, my function now looks like this:

static const int MAXOPS = 8;

void Brainfuck::compile(const string& input) {
    array<tuple<char, Opcode, int>, MAXOPS> instructions {
        make_tuple('<', Opcode::MOVL,  1),
        make_tuple('>', Opcode::MOVR,  1),
        make_tuple('+', Opcode::INCR,  1),
        make_tuple('-', Opcode::DECR,  1),
        make_tuple('[', Opcode::WHILE, 0),
        make_tuple(']', Opcode::WEND,  0),
        make_tuple(',', Opcode::INP,   0),
        make_tuple('.', Opcode::OUTP,  0),
    };

    string::const_iterator c = begin(input);
    while (c != end(input)) {
        auto instruction = find_if(begin(instructions), end(instructions),
        [&instructions, &c](auto i) {
            return *c == get<0>(i);
        });

        if (instruction != end(instructions)) {
            makeOp(c, get<1>(*instruction), get<2>(*instruction));
        } else {
            c++;
        }
    }
}

I recompiled and...nothing happened. I remembered that I had another method which contained a map and a (now deleted?) response suggested that merely having an instantiation of map would be enough to add code. So I changed that map to an array to and recompiled again. This time the stripped size of my executable went from 12280 to 9152.

Community
  • 1
  • 1
Jaldhar
  • 279
  • 1
  • 10
  • A map uses dynamic allocation for each entry separately. Perhaps a normal lookup table would be better if code-size is the goal. – M.M Oct 27 '16 at 02:59
  • 1
    That 3k has all the code needed to build, search through, and destroy a red-black tree. Doesn't sound too outlandish to me. – T.C. Oct 27 '16 at 03:05
  • Fwiw, 3kb is virtually nothing, unless you're doing this on some embedded platform with heavy storage / memory constraints. – Jason C Oct 27 '16 at 03:24
  • If your map is completely known at compile time, store it in an array, with the keys sorted, then do a binary search in it. – 1201ProgramAlarm Oct 27 '16 at 04:05
  • Better yet, use a switch statement and let the compiler do its job. Most compilers will either convert it to a binary search or a lookup table. – Andrey Turkin Oct 27 '16 at 07:23
  • @AndreyTurkin A switch statement was what I had before I tried the map. I find that with my array implementation, the code is the same size as with the switch implementation. It seems that the compiler is smart enough to optimize this construct. – Jaldhar Oct 27 '16 at 13:53

1 Answers1

3

map internally uses a balanced tree for storing the elements. Balanced trees are fast but require some code overhead for re-balancing the tree on insertions or deletes. So 3k for this code seams reasonably.

Meixner
  • 615
  • 5
  • 8