Below is my grammar file.
grammar My;
tokens {
    DELIMITER
}
string: SINGLE_QUOTED_TEXT;
SINGLE_QUOTED_TEXT: ('\'' (.)*? '\'')+;
I'm trying to use this to accept all strings (it's actually part of MySQL's g4 grammar). Then I use this code to test it:
#include "MyLexer.h"
#include "MyParser.h"
#include <string>
using namespace My;
int main()
{
    std::string s = "'中'";
    antlr4::ANTLRInputStream input(s);
    MyLexer lexer(&input);
    antlr4::CommonTokenStream tokens(&lexer);
    MyParser parser(&tokens);
    parser.string();
    return 0;
}
The UTF-8 encoding of the Chinese character 中 is 3 bytes: \xe4 \xb8 \xad
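To confirm which bytes actually reach the input stream, a minimal sketch along these lines can dump the string as hex. The helper name `utf8_hex` is hypothetical (it is not part of ANTLR), and the character is written with explicit byte escapes so the result does not depend on how the compiler reads the source file.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not an ANTLR API): renders every byte of `s`
// as two lowercase hex digits, separated by single spaces.
std::string utf8_hex(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        if (!out.empty()) {
            out += ' ';
        }
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}
```

For the three-byte sequence above, `utf8_hex("\xe4\xb8\xad")` returns `e4 b8 ad`. Running the same helper on `s` from the test program shows whether the literal `"'中'"` was compiled as UTF-8.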
Both the grammar file and the code file are encoded in UTF-8. What can I do to make this work?