I'm trying to make a very basic tokenizer/lexer.

To do this, I'm making a main struct called Token that all types of tokens will inherit from, such as IntToken and PlusToken.

Every new type of token will include a type variable as a string and a to_string function, which returns a representation like Token(PLUS) or Token(INT, 5) (where 5 is replaced by the token's integer value).

I've looked at many questions on SO, and it looks like I need a vector of std::shared_ptr<BaseClass> (in my case, BaseClass would be Token): https://stackoverflow.com/a/20127962/12101554

I first tried writing this the way I thought it should work. When that failed, I searched SO and found the answer linked above, but that doesn't seem to work either.

Am I following the answer wrong, did I make some other error, or is this not possible to do in C++ without a lot of other code?

(I have also tried converting all the structs to classes and adding public:, but that makes no difference.)

#include <cctype>    // std::isdigit
#include <iostream>
#include <memory>    // std::shared_ptr, std::make_shared
#include <string>
#include <vector>

struct Token {
    std::string type = "Uninitialized";
    virtual std::string to_string() { return "Not implemented"; };
};

struct IntToken : public Token {
    IntToken(int value) {
        this->value = value;
    }
    std::string type = "INT";
    int value;
    std::string to_string() {
        return "Token(INT, " + std::to_string(value) + ")";
    }
};

struct PlusToken : public Token {
    std::string type = "PLUS";
};

std::vector<std::shared_ptr<Token>> tokenize(std::string input) {
    std::vector<std::shared_ptr<Token>>  tokens;
    for (int i = 0; i < input.length(); i++) {
        char c = input[i];
        if (std::isdigit(c)) {
            std::cout << "Digit" << std::endl;
            IntToken t = IntToken(c - 48);
            std::cout << t.value << std::endl;
            tokens.push_back(std::make_shared<IntToken>(t));
        }
        else if (c == '+') {
            std::cout << "Plus" << std::endl;
            PlusToken p = PlusToken();
            tokens.push_back(std::make_shared<PlusToken>(p));
        }
    }
    return tokens;
}

int main()
{
    std::string input = "5+55";
    std::vector<std::shared_ptr<Token>> tokens = tokenize(input);
    for (int i = 0; i < tokens.size(); i++) {
        //std::cout << tokens[i].to_string() << std::endl;
        std::cout << tokens[i]->type << std::endl;
    }
}

Current Output:

Digit
5
Plus
Digit
5
Digit
5
Uninitialized
Uninitialized
Uninitialized
Uninitialized

Expected Output: (with current code)

Digit
5
Plus
Digit
5
Digit
5
Token(INT, 5)
Token(PLUS)
Token(INT, 5)
Token(INT, 5)

Note: Yes, I know that the proper tokenization would be (5) (+) (55), but I'm still creating the basic part.

Samathingamajig
  • Make sure your code is compilable... add `#include` too (although this is C++, it might compile on some other compiler) – user202729 Feb 01 '21 at 06:52
  • Besides, try to make a minimal reproducible example -- while the code is complete, it's certainly far from minimal. You don't need any of the parsing code; just creating a shared pointer and reading its type is sufficient. – user202729 Feb 01 '21 at 06:55
  • I've added the `#include `, and it doesn't throw any errors nor does it work. I'll get started on making it a MCRE. – Samathingamajig Feb 01 '21 at 06:56
  • The code doesn't seem to be correct (I mean that it's not correct for the results you are claiming; it has errors as well). But the issue seems to be that you are expecting access to the `type` member variable to be polymorphic, but in C++ only methods can be polymorphic. – john Feb 01 '21 at 06:58
  • Does this answer your question? [Override initial value of a member variable of base class](https://stackoverflow.com/questions/34592361/override-initial-value-of-a-member-variable-of-base-class) – user202729 Feb 01 '21 at 07:00
  • See also: [1](https://stackoverflow.com/questions/501899/in-inheritance-can-i-override-base-class-data-members) [2](https://stackoverflow.com/questions/19290796/override-member-field-in-derived-classes) – user202729 Feb 01 '21 at 07:00
  • @user202729 unless I'm misreading those links you sent, none of them seemed to be like the issue I was facing – Samathingamajig Feb 01 '21 at 07:24
  • @Samathingamajig The main duplicate link is exactly the same question (in this case the overridden member is `type`) (you should be able to figure it out, no? You already got an answer) – user202729 Feb 01 '21 at 07:35
  • @user202729 reading from the bottom, I can see how you think they're related, but converting my code to be like that would decrease readability. – Samathingamajig Feb 01 '21 at 07:55
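
To illustrate john's comment about only methods being polymorphic, here is a minimal sketch. The names `Base`, `Derived`, `type_via_base`, and `name_via_base` are hypothetical and not from the question. Redeclaring `type` in the derived struct creates a second member that hides the base one, so reading `type` through a base reference still sees the base value, while the virtual function call is dispatched to the derived override.

```cpp
#include <string>

struct Base {
    std::string type = "Uninitialized";
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    // A second, separate member: it hides Base::type, it does not override it.
    std::string type = "DERIVED";
    std::string name() const override { return "Derived"; }
};

// Both helpers only see the object through a Base reference,
// just as the loop over std::vector<std::shared_ptr<Token>> does.
std::string type_via_base(const Base& b) { return b.type; }    // data member: resolved statically
std::string name_via_base(const Base& b) { return b.name(); }  // virtual call: resolved at run time
```

Given a `Derived d;`, `type_via_base(d)` yields "Uninitialized" and `name_via_base(d)` yields "Derived", which matches the four "Uninitialized" lines in the question's output.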

1 Answer

You are giving your derived classes their own type member variables, which hide the base-class member instead of replacing it. Instead, you should set the type that belongs to the base class inside the derived-class constructors.
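
A minimal sketch of that change, reusing the struct names from the question. Some parts are assumptions that go beyond the answer: the virtual destructor, the `const`/`override` qualifiers, the base `to_string` that builds "Token(" + type + ")", and the hypothetical helper `sample_tokens`.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Token {
    std::string type = "Uninitialized";
    virtual ~Token() = default;  // assumption: lets derived tokens be destroyed safely via Token*
    // Assumption: a default representation built from the single shared `type` member.
    virtual std::string to_string() const { return "Token(" + type + ")"; }
};

struct IntToken : Token {
    int value;
    // Assign to the inherited member; do not redeclare `type` here.
    explicit IntToken(int v) : value(v) { type = "INT"; }
    std::string to_string() const override {
        return "Token(INT, " + std::to_string(value) + ")";
    }
};

struct PlusToken : Token {
    PlusToken() { type = "PLUS"; }
};

// Hypothetical helper standing in for the question's tokenize(): builds "5+".
std::vector<std::shared_ptr<Token>> sample_tokens() {
    std::vector<std::shared_ptr<Token>> tokens;
    tokens.push_back(std::make_shared<IntToken>(5));
    tokens.push_back(std::make_shared<PlusToken>());
    return tokens;
}
```

With this layout, `tokens[i]->type` reads the one `type` member that every token shares, and `tokens[i]->to_string()` reaches the derived override through the base pointer.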

acraig5075
  • Yeah this fixed it for my specific case with `type` which would exist for every type of token. I'm also able to get the unique values with [C++'s equivalent of instanceof](https://stackoverflow.com/a/500495/12101554) and using `.get()` to get the raw pointer from the `shared_ptr` – Samathingamajig Feb 01 '21 at 07:21
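
The `dynamic_cast` plus `.get()` approach from the comment above might look roughly like this. It is a sketch, not the commenter's actual code: the helpers `int_value_or` and `as_int` are hypothetical, and the virtual destructor is an added assumption (`dynamic_cast` needs the base class to have at least one virtual function).

```cpp
#include <memory>
#include <string>

struct Token {
    std::string type = "Uninitialized";
    virtual ~Token() = default;  // makes Token polymorphic, which dynamic_cast requires
};

struct IntToken : Token {
    int value;
    explicit IntToken(int v) : value(v) { type = "INT"; }
};

struct PlusToken : Token {
    PlusToken() { type = "PLUS"; }
};

// Hypothetical helper: the token's integer value if it is an IntToken, else `fallback`.
int int_value_or(const std::shared_ptr<Token>& tok, int fallback) {
    // dynamic_cast on the raw pointer yields nullptr when the dynamic type does not match.
    if (const auto* it = dynamic_cast<const IntToken*>(tok.get())) {
        return it->value;
    }
    return fallback;
}

// Alternative that keeps shared ownership instead of exposing a raw pointer.
std::shared_ptr<IntToken> as_int(const std::shared_ptr<Token>& tok) {
    return std::dynamic_pointer_cast<IntToken>(tok);
}
```

`std::dynamic_pointer_cast` performs the same run-time check but returns a `shared_ptr`, which avoids holding a raw pointer beyond the lifetime of the vector.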