
I have created a simple web crawler for a specific task using Qt in a GUI environment. Now I would like to automate it (using cron), so it needs to run in a pure non-GUI environment. I've tried to port the code to a non-GUI application without success.

I have some questions:

  1. Is it possible to use QWebPage in a pure non-GUI environment (a Linux terminal)? I've read some similar questions and I think it is not possible, but I still have some doubts.

  2. If it isn't possible, how can I use Qt to write the web crawler as a non-GUI application? I'm familiar with Qt (not an expert, of course) and, if possible, I want to use it.

  3. If that is still not possible, what libraries do you recommend to fetch and parse HTML pages? (Multi-platform and C++; Python is also an option, but it means I would have to redo a lot of work.)

Edit:

According to this answer, I can run my web crawler from the terminal, but I have to use a fake X server. This is not a perfect solution, but it allows me to schedule the task with cron. In the future, I will explore Python's capabilities for this task.

Community
Manuel

1 Answer


Of course it is possible. QWebPage inherits only from QObject, not from QWidget. The Qt documentation page even includes a short tutorial on using it without a GUI. Since you are only building a URL crawler, I guess you don't even need the rendering part.
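Following the point that a URL crawler does not need the rendering part, here is a minimal sketch of the crawl logic in plain standard C++, with no GUI class involved at all. It deliberately swaps QWebPage for a naive regular expression over `href` attributes, which is an assumption that only holds for simple pages; a real HTML parser is more robust. The names `extractLinks`, `crawl`, and `FetchFn` are hypothetical. The `fetch` callback is left abstract: in a Qt program it could be backed by `QNetworkAccessManager` (QtNetwork), which runs under a `QCoreApplication` and does not need an X server.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Naive link extractor: returns the values of href="..." and href='...'
// attributes found in raw HTML, in document order. Regular expressions
// cannot handle all valid HTML (comments, scripts, unquoted attributes,
// entities), so this is only an illustration.
std::vector<std::string> extractLinks(const std::string &html)
{
    // Case-insensitive "href", optional whitespace around '=', then a
    // double-quoted value (group 1) or a single-quoted value (group 2).
    static const std::regex hrefPattern(
        R"re(href\s*=\s*(?:"([^"]*)"|'([^']*)'))re",
        std::regex::ECMAScript | std::regex::icase);

    std::vector<std::string> links;
    for (std::sregex_iterator it(html.begin(), html.end(), hrefPattern), end;
         it != end; ++it) {
        const std::smatch &m = *it;
        links.push_back(m[1].matched ? m[1].str() : m[2].str());
    }
    return links;
}

// Hypothetical fetch callback: returns the raw HTML for a URL, or an
// empty string on failure.
using FetchFn = std::function<std::string(const std::string &)>;

// Breadth-first crawl from startUrl, visiting at most maxPages pages.
// Each URL is queued at most once; links are used verbatim (no
// relative-URL resolution). Returns the URLs in the order visited.
std::vector<std::string> crawl(const std::string &startUrl,
                               const FetchFn &fetch,
                               std::size_t maxPages)
{
    std::vector<std::string> visited;
    std::set<std::string> seen{startUrl};
    std::deque<std::string> queue{startUrl};

    while (!queue.empty() && visited.size() < maxPages) {
        const std::string url = queue.front();
        queue.pop_front();
        visited.push_back(url);

        const std::string html = fetch(url);
        for (const std::string &link : extractLinks(html)) {
            if (seen.insert(link).second) // true only for unseen links
                queue.push_back(link);
        }
    }
    return visited;
}
```

For example, `crawl(startUrl, myFetch, 50)` would visit at most 50 pages breadth-first, where `myFetch` is your own download function. Relative links are followed verbatim; resolving them against the page URL (for instance with `QUrl::resolved`) is left out to keep the sketch short.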

Pavel Zdenek
  • That's the way I'm using QWebPage now, but QWebPage needs QWidget, as pointed out in this comment: http://stackoverflow.com/a/9213335/670873, so you cannot launch it from a terminal (I'm not talking about XTerm or Konsole, just a terminal without an X server). – Manuel Sep 21 '12 at 06:40
  • Very interesting. I think I will investigate it myself out of pure curiosity. The QWidgets that "QWebCore is creating itself" cannot have a parent QWidget set (because you cannot set one on QWebPage), which goes completely against the Qt philosophy of the ownership tree. In the worst case, you'll have to resort to a proper headless crawler library like [arachne](http://arachne.sourceforge.net/). – Pavel Zdenek Sep 21 '12 at 07:53