
I want to create an application that will be able to extract data (for separate processing & formatting) from a specific website whilst I'm browsing, and possibly to automatically retrieve some specific pages in the background.

The website itself doesn't matter here, but what is pertinent is that it is behind a login (so it requires session/cookie management) and it uses a lot of JavaScript (jQuery), notably a lot of Ajax for live content. One saving grace, though, is that it doesn't use Flash.

I only need this application to run in a Windows environment, but need to support XP as well as 7. And I don't care which browser it is, as long as it is fully functional!

So, I'm trying to find the easiest way to embed a fully functional browser into my own C++ application. My IDE of choice would be Visual Studio 2008/2010, although Borland C++ would also be an option. Ideally I'd use wxWidgets (if I go the VS path) to manage the rest of the GUI, and I don't want to program in C#/.NET or Qt.

  • I found 'wxWebConnect', which did a lot of what I needed, but it doesn't support cookies/sessions, so that wasn't usable.
  • I tried the Microsoft WebBrowser COM component, but that has rendering issues with complex sites.
  • The WebKit path for WX is abandoned.
  • The Borland WebBrowser component is also dated and doesn't render complex pages correctly.
  • And many of the search results I've found here and elsewhere are either out of date, don't deal with fully functional browsers, or use languages/platforms that aren't an option for me.

I'm quite sure there must be a simple solution to this, as I'm sure many others have already done something similar. I just can't find it at the moment! I've looked briefly at the Firefox/Chromium paths, which are probably the right area, but as yet I haven't been able to find a simple means of integrating them into my own WX project.

Dave
  • I think you should try to contact the website in question, and see if they have a simpler API (REST or otherwise) that can be used, so you don't have to actually get a whole complete page with javascript that needs to be executed. Then you can simply use e.g. [libcurl](http://curl.haxx.se/libcurl/) and a JSON/XML parser (depending on what you will receive) to extract the data you need. – Some programmer dude Nov 08 '13 at 08:27
  • This may be relevant: http://stackoverflow.com/questions/18119125/options-for-embedding-chromium-instead-of-ie-webbrowser-control-with-wpf-c. Also, MS `WebBrowser` rendering issues can often be easily resolved: http://stackoverflow.com/a/18802626/1768303 – noseratio Nov 08 '13 at 08:28
  • @JoachimPileborg I already contacted them, and they don't have any API or alternate interface I can use, so I'm having to do this 'the hard way' – Dave Nov 08 '13 at 08:43

1 Answer


In wxWidgets 3.0 you can use wxWebView, which is a wrapper around the native HTML rendering engine with full support for CSS, JavaScript and so on. On Windows it uses the system IE engine by default, but you can also use it with Chromium.
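
As a rough illustration, here is a minimal sketch of a frame hosting a wxWebView with the default backend. It assumes wxWidgets 3.0 built with the webview library; the class names `BrowserFrame` and `BrowserApp` and the URL are placeholders of mine, not anything from the question. It only shows how to create the control and read the page source once a document has loaded; whether that source reflects content added later by Ajax is something you would need to check against the backend you end up using.

```cpp
// Minimal sketch: a wxWidgets 3.0 frame hosting a wxWebView.
// Needs the wxWidgets "webview" library in addition to core/base.
#include <wx/wx.h>
#include <wx/webview.h>

class BrowserFrame : public wxFrame
{
public:
    BrowserFrame()
        : wxFrame(NULL, wxID_ANY, "Embedded browser",
                  wxDefaultPosition, wxSize(1024, 768))
    {
        CreateStatusBar();

        // wxWebView::New() creates the control with the default backend
        // for the platform. The URL below is a placeholder.
        m_browser = wxWebView::New(this, wxID_ANY, "https://example.com/");

        // Sent when a document has finished loading.
        m_browser->Bind(wxEVT_WEBVIEW_LOADED, &BrowserFrame::OnLoaded, this);
    }

private:
    void OnLoaded(wxWebViewEvent& event)
    {
        // Fetch the page's HTML as a string; this is where separate
        // processing/formatting of the data would start.
        const wxString html = m_browser->GetPageSource();

        SetStatusText(wxString::Format("Loaded %s (%lu characters)",
                                       event.GetURL(),
                                       static_cast<unsigned long>(html.length())));
    }

    wxWebView* m_browser;
};

class BrowserApp : public wxApp
{
public:
    virtual bool OnInit()
    {
        (new BrowserFrame())->Show();
        return true;
    }
};

wxIMPLEMENT_APP(BrowserApp);
```

Retrieving specific pages in the background could then be tried with a second, hidden wxWebView and `LoadURL()`, though I haven't verified how that interacts with the site's login session.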

VZ.