
Looking at com.ui4j.api.browser.BrowserFactory, it appears that the getBrowser method can only ever return one instance, as is also documented.

This is quite problematic for anyone trying to write a multithreaded crawler, as only one browser will exist at any time. There appears to be no way to open a new tab in the browser, so you can only navigate one page at a time, with navigation requests likely queued up in a list.

Do I really have to copy, paste, and rewrite the entire BrowserFactory class to get another instance, or is there a way to navigate multiple pages and parse their content concurrently?

Or is this a complete miss? As it stands, the library is probably only suitable for testing. Support for multiple threads is a necessity in any production system.

Of course I can copy and paste the code, but is there another way?

Naftali
mjs

1 Answer


BrowserFactory creates a singleton instance of BrowserEngine, but a BrowserEngine can create more than one page/tab. If you are trying to crawl a site, you should review this example. The example creates a thread pool with a pool size of 2, which means the browser can run 2 pages at the same time.
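The thread pool idea described above can be sketched with the standard java.util.concurrent classes alone. This is a minimal illustration, not the linked example: the class name ConcurrentCrawlSketch, the crawl helper, and the fetchPage function are all hypothetical, and fetchPage stands in for whatever call loads a page on the shared BrowserEngine. No ui4j method names are assumed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ConcurrentCrawlSketch {

    // Submits one task per URL to a fixed-size pool and returns the results
    // in the same order as the input list. fetchPage is a hypothetical
    // stand-in for loading a page on the single shared browser engine.
    static List<String> crawl(List<String> urls, int poolSize,
                              Function<String, String> fetchPage) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> fetchPage.apply(url);
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                // Blocks until this task is done; other tasks keep running.
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a page", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A page task failed", e.getCause());
        } finally {
            // Let already submitted tasks finish, then release the threads.
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://example.com/a", "http://example.com/b", "http://example.com/c");
        // Placeholder fetch: records which pool thread handled each URL.
        List<String> pages = crawl(urls, 2,
                url -> Thread.currentThread().getName() + " loaded " + url);
        pages.forEach(System.out::println);
    }
}
```

With a pool size of 2, at most two fetchPage calls run at the same time; a third URL waits until a worker thread is free. Whether the pages actually load in parallel still depends on the engine behind fetchPage, which this sketch does not model.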

  • Hmm. That example only shows how to create an executor to run them in separate threads. In my tests they are executed in order: when one finishes, the next starts. Wait and I will provide an example for you to see. – mjs Jun 17 '15 at 16:00
  • @momo Please submit your code snippet on GitHub: https://github.com/ui4j/ui4j. Do not hesitate to create an issue. –  Jun 17 '15 at 16:07
  • http://pastebin.com/z0d0Gu5u However, I must say that when I tried to comment out the page.waitUntilDocReady() then it did work. So that method doesn't work well concurrently. – mjs Jun 17 '15 at 16:07
  • Or maybe it does, but on Google the method seems to fail. – mjs Jun 17 '15 at 16:13
  • Do you have any tip on: http://stackoverflow.com/questions/30896488/ui4j-get-the-entire-document-innerhtml ? – mjs Jun 17 '15 at 16:14
  • page.waitUntilDocReady() has been removed from the API. Version 2.0 does not require such a method. You had better check out and use the master branch from GitHub. https://github.com/ui4j/ui4j/issues/27 –  Jun 17 '15 at 16:15