QString::split(const QRegularExpression) issue

Question

My app downloads the HTML source of a web page and then tries to extract the table rows (tr). My code:

QStringList linesPage1 = page1.split(QRegularExpression("<tr.*>"));

But when I do this:

qDebug() << linesPage1;

I got this:

("<table width=\"1085\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">", "")

When I try this code, it finds 31 occurrences:

qDebug() << page1.count(QRegularExpression("<tr.*>"));

I don't understand why it counts 31 occurrences but, on the other hand, doesn't split the string.

Please note that the part you are splitting on will get **removed** from the string! Could you post how the string looks before splitting it? — Felix, Nov 19 '15 at 17:47
The string is too big to be pasted here. But it is a classic HTML table. — ceriums, Nov 19 '15 at 18:22

score 1 · Accepted Answer · answered Nov 19 '15 at 18:35

The problem is your regular expression. It matches a string that starts with <tr and ends with >, and because .* is greedy, it looks for the longest such string. In your case, it will start at the first <tr and run to the last > it can reach: the end of the document if the page has no line breaks (because HTML ends with a >), or otherwise the end of that line, since . does not match newlines by default.

To avoid this, use: <tr[^>]*>. This way it will only match the <tr ...> tag itself, because any character except > is allowed in between.
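
To see the difference between the two patterns side by side, here is a minimal sketch. It uses std::regex instead of QRegularExpression so that it compiles without Qt; both treat .* as greedy and, by default, do not let . match a newline. The helper name splitOn is made up for this illustration, and it only approximates what QString::split does with its default behavior of keeping empty parts.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not a Qt API): split `text` on every match of
// `pattern`, keeping empty pieces. The matched delimiters are dropped.
std::vector<std::string> splitOn(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::vector<std::string> parts;
    std::size_t pos = 0;

    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        const std::size_t start = static_cast<std::size_t>(it->position());
        // Everything between the previous match and this one is one piece.
        parts.push_back(text.substr(pos, start - pos));
        pos = start + static_cast<std::size_t>(it->length());
    }

    // The text after the last match (possibly empty) is the final piece.
    parts.push_back(text.substr(pos));
    return parts;
}
```

With a made-up single-line input such as `<table><tr class="a"><td>1</td></tr><tr><td>2</td></tr></table>`, splitting on `<tr.*>` should give two pieces, `<table>` and an empty string, because the single match runs to the final >. Splitting on `<tr[^>]*>` should give three pieces, one per row boundary. In the Qt code from the question, the equivalent change would be `page1.split(QRegularExpression("<tr[^>]*>"))`.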

Try websites like https://regex101.com/#pcre to validate and test your regular expressions!

[Don't parse HTML with regex!](https://stackoverflow.com/a/1732454/399908) Simple counter-example: `hello world` -> first part would be `0) alert(i);">hello world` — Martin Hennings, May 30 '18 at 12:55
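
The point of that comment can be illustrated with a small sketch. The input below is invented for this illustration (it is not the commenter's original example), std::regex again stands in for QRegularExpression, and textAfterFirstMatch is a hypothetical helper. It shows that `<tr[^>]*>` stops at the first >, even when that > sits inside an attribute value.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text that follows the first match of
// `pattern` in `text`, or an empty string when nothing matches.
std::string textAfterFirstMatch(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re)) {
        return std::string();
    }
    // suffix() is the part of the input after the end of the match.
    return m.suffix().str();
}
```

For the made-up row `<tr title="a > b"><td>x</td></tr>`, the match ends at the > inside the title attribute, so the remainder starts with ` b">` instead of `<td>`. A regex has no notion of quoted attribute values, which is why an HTML parser is the more robust choice when the markup is not under your control.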
