I am writing a compiler, and part of my lexer checks whether a run of characters forms an INTEGER or a REAL number. REAL is defined by this EBNF: REAL: DIGIT {DIGIT} . DIGIT {DIGIT} [ (e|E) [(+ | -)] <DIGIT> {DIGIT} ]


Here is a snippet of the code. While it has not reached EOF and no token has been matched, it keeps categorising tokens:

while (!isEOF() && !tokenMatch)
            {

                //Checking for INTEGERS and REAL
                //INTEGERS: DIGIT {DIGIT}
                if (isDigit(ch))
                {
                    strBuffer += ch;

                    do
                    {
                        ch = nextChar();
                        strBuffer += ch;
                    }
                    while (isDigit(ch));

                    //REAL or ERROR
                    //REAL: DIGIT {DIGIT} . DIGIT {DIGIT} [ (e|E) [(+ | -)] <DIGIT> {DIGIT}

                    if (ch == '.')
                    {
                        do
                        {
                            ch = nextChar();
                            strBuffer += ch;
                        }
                        while (isDigit(ch));

                        //EXPONENT
                        if (ch == 'E' || ch == 'e')
                        {
                            char peek = input -> peek();
                            //CHECK FOR + | -

                            if(peek == '+' || peek == '-')
                            {
                                ch = nextChar();
                                strBuffer += ch;

                                ch = nextChar();
                                if (isDigit(ch))
                                {
                                    do
                                    {
                                        strBuffer +=ch;
                                        ch = nextChar();
                                        cout << strBuffer << endl;
                                    }
                                    while (isDigit(ch));
                                }

The problem appears when I load the text file and read characters from it. If, for example, I write 123.12 followed by a space, the lexer stops at the whitespace. If there is no whitespace before EOF, the last do-while loop repeats forever.


Here is the implementation of nextChar. *input is an ifstream declared as:

ifstream* input  = new ifstream("test.txt");

char nextChar()
        {
            *input >> noskipws >> ch;

            //check for new line. If true, increment Row and start column from 1
            if(ch == '\n')
            {
                row ++;
                col = 1;
            }
            else if (ch == '\t')
            {
                col +=4;
            }
            else
            {
                col++;

            }

            return ch;
        }

Any idea how I can fix this?

thanks

DodoSombrero

1 Answer

I would change nextChar to:

int nextChar()
{
   int ch = input->get();   // get() returns an int, so EOF is distinguishable from any char

   if ( ch == EOF )
   {
      return ch;
   }
   //check for new line. If true, increment Row and start column from 1
   else if(ch == '\n')
   {
      row ++;
      col = 1;
   }
   else if (ch == '\t')
   {
      col +=4;
   }
   else
   {
      col++;
   }

   return ch;
}

and make sure that wherever nextChar is called, the result is stored in a variable of type int and compared with EOF before proceeding.
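As a rough illustration of that advice, here is a minimal sketch of a number scanner that keeps every character in an int and checks for EOF before testing for a digit, so it stops cleanly when the input ends right after the last digit. It is not the asker's lexer: `NumberKind`, `NumberToken`, `readDigits` and `scanNumber` are hypothetical names, and it reads from any `std::istream` rather than the `input` member, without tracking rows or columns.

```cpp
#include <cctype>
#include <cstdio>    // EOF
#include <istream>
#include <sstream>   // only needed for the usage example
#include <string>

enum class NumberKind { Integer, Real, Error };

struct NumberToken
{
    NumberKind kind;
    std::string text;
};

// Hypothetical helper: appends DIGIT {DIGIT} from `in` to `out`.
// Returns false when no digit was available (including at end of input).
bool readDigits(std::istream& in, std::string& out)
{
    bool any = false;
    // peek() returns an int, so EOF cannot be confused with a real character.
    for (int c = in.peek(); c != EOF && std::isdigit(c); c = in.peek())
    {
        out += static_cast<char>(in.get());
        any = true;
    }
    return any;
}

// Hypothetical scanner for INTEGER and REAL as defined in the question.
NumberToken scanNumber(std::istream& in)
{
    NumberToken tok{NumberKind::Error, ""};
    if (!readDigits(in, tok.text)) return tok;     // not a number at all
    tok.kind = NumberKind::Integer;
    if (in.peek() != '.') return tok;              // also taken at EOF

    tok.text += static_cast<char>(in.get());       // consume '.'
    if (!readDigits(in, tok.text)) return {NumberKind::Error, tok.text};
    tok.kind = NumberKind::Real;

    int c = in.peek();
    if (c == 'e' || c == 'E')
    {
        tok.text += static_cast<char>(in.get());   // consume exponent marker
        c = in.peek();
        if (c == '+' || c == '-') tok.text += static_cast<char>(in.get());
        if (!readDigits(in, tok.text)) tok.kind = NumberKind::Error;
    }
    return tok;
}
```

For example, with `std::istringstream in("123.12");`, calling `scanNumber(in)` returns a REAL token with text `123.12` even though nothing follows the last digit. Peeking before consuming also leaves the character after the number in the stream for the next token.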

R Sahu
  • What do you mean, compare the value of nextChar() with EOF? Do I have to compare with -1 all the time? – DodoSombrero May 13 '14 at 23:11
  • The tab processing is still wrong. A tab is not equal to 4 spaces and does not move 4 columns. A tab inserts zero or more spaces to the next tab column. If you are on column 2 and the next tab column is 4, there will be 2 spaces added, not 4. – Thomas Matthews May 13 '14 at 23:12
  • At the moment, columns are not being used, I'm more concerned about making the loop stop when it reaches EOF – DodoSombrero May 13 '14 at 23:14
  • @DodoSerebro, if you don't compare the returned value of `nextChar` with `EOF`, you will not know when the end of file has been reached and that you need to stop. My suggestion is to compare with the symbol `EOF`, not `-1`. – R Sahu May 13 '14 at 23:14
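
The tab-stop rule raised in the comments can be sketched as follows. This is only an illustration under stated assumptions: columns are 1-based, tab stops sit every `tabWidth` columns starting at column 1 (so 1, 5, 9, ... for a width of 4), and `columnAfterTab` is a hypothetical helper, not part of the original lexer.

```cpp
// Hypothetical helper: returns the column the cursor lands on after a tab,
// assuming 1-based columns and tab stops at 1, 1 + tabWidth, 1 + 2 * tabWidth, ...
int columnAfterTab(int col, int tabWidth = 4)
{
    // Index of the tab stop at or before `col`, then advance to the next stop.
    return ((col - 1) / tabWidth + 1) * tabWidth + 1;
}
```

Under these assumptions, the `'\t'` branch of `nextChar` could assign `col = columnAfterTab(col);` instead of adding a fixed 4.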