I would like to peek at the next characters of a QTextStream that reads from a QFile, in order to write an efficient tokenizer.

However, I haven't found a satisfactory way to do so.

QFile f("test.txt");
f.open(QIODevice::WriteOnly);
f.write("Hello world\nHello universe\n");
f.close();

f.open(QIODevice::ReadOnly);
QTextStream s(&f);
int i = 0;
while (!s.atEnd()) {
  ++i;
  qDebug() << "Peek" << i << s.device()->peek(3);
  QString v;
  s >> v;
  qDebug() << "Word" << i << v;
}

This gives the following output:

Peek 1 "Hel" # it works only the first time
Word 1 "Hello" 
Peek 2 "" 
Word 2 "world" 
Peek 3 "" 
Word 3 "Hello" 
Peek 4 "" 
Word 4 "universe" 
Peek 5 "" 
Word 5 ""

I tried several other approaches, including QTextStream::pos() and QTextStream::seek(). That works better, but pos() seems buggy (it returns -1 when the file is too big).

Does anyone have a solution to this recurring problem? Thank you in advance.

FabienRohrer
  • Add `s.device()->pos()` and `s.device()->bytesAvailable()` to the logs to check how far the device has been read. This may help to locate the problem. – Marek R Jan 28 '14 at 13:38
  • Before the first QTextStream::operator>>: (pos = 0, bytesAvailable = 27). Just after it and until the end: (pos = 27, bytesAvailable = 0). So the buggy behavior does indeed seem related to this. – FabienRohrer Jan 28 '14 at 14:56
  • I've checked the [QTextStream code](https://qt.gitorious.org/qt/qtbase/source/ba8342071d05336c7b7f5ff8182a2ba9000c9b53:src/corelib/io/qtextstream.cpp). It looks like it always caches as much data as possible, and there is no way to disable this behavior. I was expecting it to use peek on the device, but it only reads greedily. Bottom line: you can't use `QTextStream` in this case. – Marek R Jan 28 '14 at 15:18
  • Yes. I also tried making the QTextStream and the QFile unbuffered (from the constructor), but that doesn't help either. – FabienRohrer Jan 28 '14 at 15:35
  • My opinion is that a function like QTextStream::peek(int size) is missing. – FabienRohrer Jan 28 '14 at 15:37

2 Answers

You peek from the QIODevice but then read from the QTextStream; that's why peek works only once. Try this:

while (!s.atEnd()) {
  ++i;
  qDebug() << "Peek" << i << s.device()->peek(3);
  QByteArray v = s.device()->readLine();
  qDebug() << "Word" << i << v;
}

Unfortunately, QIODevice does not support reading single words, so you would have to do it yourself with a combination of peek and read.

Anton Poznyakovskiy
  • Thank you for your interest. The example I wrote is extremely simplified. In my actual code, I want to make heavy use of QTextStream::operator>>. – FabienRohrer Jan 28 '14 at 11:01
  • The issue is indeed that QTextStream::device() is not synchronized with QTextStream, and that there is no peeking function on QTextStream itself. – FabienRohrer Jan 28 '14 at 11:02
  • That's true, but since there is no method for reading a word in QIODevice, and pos() is buggy, my advice would be to implement word-by-word reading for QIODevice yourself and use it instead of QTextStream::operator>>. Or is there a reason why it must be this operator? – Anton Poznyakovskiy Jan 28 '14 at 11:20
  • Yes, you're completely right. In fact, during the discussion, I chose this solution and implemented it :-) QTextStream is very convenient for writing simple, good-looking code. It's a pity this class has no peek function, which rules out writing a real tokenizer with it (peek is required to parse an ambiguous language). – FabienRohrer Jan 28 '14 at 15:33
  • @FabienRohrer IIRC, QTextStream either succeeds at reading or leaves the input undisturbed. Thus it should be fine to attempt to, say, read a number, and if that fails, read a word instead. – Kuba hasn't forgotten Monica Jan 28 '14 at 19:32

Try disabling QTextStream::autoDetectUnicode. The detection may read ahead on the device and cause your problem.

Also set a codec, just in case.

Add s.device()->pos() and s.device()->bytesAvailable() to the logs to verify that.


I've checked the QTextStream code. It looks like it always caches as much data as possible, and there is no way to disable this behavior. I was expecting it to use peek on the device, but it only reads greedily. Bottom line: you can't use QTextStream and peek at the device at the same time.
Marek R
  • I tried all the combinations of QTextStream::setAutoDetectUnicode(bool) with or without QTextStream::setCodec("UTF-8") and the problem remains – FabienRohrer Jan 28 '14 at 15:02