Trying to parse html, preferably within Qt4

Question

Some Google-fu led me to this answer; however, after mucking around with it and reading the docs, I can't figure out how to actually build a QWebFrame to parse with.

I will need to do something a fair bit more elaborate than this later, but right now all I'm trying to do is post some data (login username and password) to a website and parse the title tag on the response page to determine whether the login was a success or a failure. I feel like it might be quicker to do that with a regex rather than building a whole DOM, but I don't know regex and this seems easier at the moment.

So, what I've got going on now is I post the data and the reply gets turned over to a method of a subclassed QDialog when the request emits the finished() signal. So I've got a QNetworkReply which I'm trying to parse and don't know where to go from there. If you need to see my code please ask, but I figured it was unnecessary. Thanks guys.

score 0 · Answer 1 · edited May 23 '17 at 11:47

Don't parse HTML with regex!

answered Feb 21 '11 at 18:16

karlphillip

I know you normally shouldn't. My thought, however, is that the best way to do this would be to start reading the data before it has finished downloading: all I need is the title, which could be pulled out with a one-pass regex, and once it's extracted I could just abort the request. That has to be a lot better than downloading the whole page, parsing it into a DOM tree, and then extracting the title text. Still, the performance difference is hardly noticeable, and I don't know regex, so I'm going with the latter route anyway, especially since I _will_ need it later on. – kryptobs2000 Feb 21 '11 at 18:20
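The "read until the title shows up, then stop" idea from the comment above can be sketched without a regex at all. This is only an illustration in plain C++: TitleScanner, feed() and title() are hypothetical names, not part of Qt, and the scanner treats the response as raw text, so it is not a real HTML parser.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not a Qt class): accumulates response chunks and
// reports when a complete <title>...</title> pair has been seen.
class TitleScanner {
public:
    // Feed one chunk of the response; returns true once the title is captured.
    bool feed(const std::string& chunk) {
        if (done_) return true;
        buffer_ += chunk;

        // Search a lowercase copy so <TITLE> and <title> are treated alike.
        std::string lower(buffer_);
        std::transform(lower.begin(), lower.end(), lower.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

        const std::string::size_type open = lower.find("<title");
        if (open == std::string::npos) return false;
        const std::string::size_type start = lower.find('>', open);
        if (start == std::string::npos) return false;
        const std::string::size_type close = lower.find("</title>", start + 1);
        if (close == std::string::npos) return false;

        // Copy from the original buffer so the title keeps its casing.
        title_ = buffer_.substr(start + 1, close - start - 1);
        done_ = true;
        return true;
    }

    const std::string& title() const { return title_; }

private:
    std::string buffer_;
    std::string title_;
    bool done_ = false;
};
```

In a Qt program, one plausible wiring (an assumption, not something shown in this thread) would be a slot connected to the reply's readyRead() signal that passes the bytes from readAll() to feed() and calls abort() on the reply once feed() returns true. Because it is a plain text search, it can be fooled by a title tag that appears inside a comment or script.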
@kryptobs2000 if you search for <title>.*</title> that could work. But imagine if someone put something like <!-- <title>test</title> --> in a comment: you're screwed. It could also be in a script. So it'll work, until it won't. – anno Feb 22 '11 at 00:05
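To make the "it'll work, until it won't" point concrete, here is a minimal sketch using std::regex from the C++ standard library rather than Qt. The helper name firstTitle and the exact pattern are assumptions chosen for illustration; they are not taken from the question or the answer.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first <title>...</title>
// match in the input, or an empty string when nothing matches.
std::string firstTitle(const std::string& html) {
    // [^<]* keeps the match from running past the closing tag.
    static const std::regex re("<title[^>]*>([^<]*)</title>", std::regex::icase);
    std::smatch m;
    if (std::regex_search(html, m, re)) {
        return m[1].str();
    }
    return std::string();
}
```

On a simple page such as "<html><head><title>Login OK</title></head></html>" this returns "Login OK". If the page contains a commented-out title before the real one, for example "<!-- <title>test</title> --><title>Login failed</title>", it returns "test", because the regex has no notion of comments or scripts.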
