I have a lexer that consumes a file character by character, looking for tokens. I tried two methods for `NextChar()`: the first reads directly from an `ifstream` via `ifstream::get(ch)`, and the second loads the whole file into a `std::stringstream` up front to avoid disk I/O overhead.
The `get()` method:

```cpp
inline void Scanner::NextChar()
{
    inputStream.get(unscannedChar);
    currentCol++;
    while (unscannedChar == ' ')
    {
        inputStream.get(unscannedChar);
        currentCol++;
    }
    if (inputStream.eof())
    {
        unscannedChar = std::char_traits<char>::eof();
    }
}
```
The `stringstream` method (loading the file into the `stringstream` takes no measurable time; it is the indexing that is extremely slow):
```cpp
inline void Scanner::NextChar()
{
    unscannedChar = buffer.str()[counter++];
    currentCol++;
    while (unscannedChar == ' ')
    {
        unscannedChar = buffer.str()[counter++];
        currentCol++;
    }
    if (counter > buffer.str().size())
    {
        unscannedChar = std::char_traits<char>::eof();
    }
}
```
I expected the second method to be much faster, since it iterates over characters in memory rather than on disk, but I was wrong. Here are some of my tests:
| tokens | ifstream::get() | stringstream::str()[] |
|--------|-----------------|-----------------------|
| 5      | 0.001 s         | 0.001 s               |
| 800    | 0.002 s         | 0.295 s               |
| 21000  | 0.044 s         | 693.403 s             |
`NextChar()` is extremely important for my project and I need it to be as fast as possible. Can someone explain why I am getting these results?