
Hi, I would like to parse a shop, http://www.mercateo.com. Up to now I've used Selenium. It works very well, but it's slow, so I am looking for a faster solution. I found HtmlUnit and Jsoup, but I think I will have trouble clicking on links and going to the next page.

I wrote a simple example with HtmlUnit:

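import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

WebClient web = new WebClient();
HtmlPage page = web.getPage("http://news.yahoo.com/"); // downloads and parses the page
web.closeAllWindows(); // closes the windows opened by this client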

but I got a lot of warnings and errors:

WARNING: CSS warning: 'http://l.yimg.com/zz/combo?d/lib/yui/3.4.1/build/cssreset/cssreset-min.css&d/lib/yui/3.4.1/build/cssfonts/cssfonts-min.css&os/mit/media/p/presentation/grids/master-min-464195.css&os/mit/media/p/presentation/grids/desktop-min-841473.css&os/mit/media/p/presentation/base/master-min-470440.css&os/mit/media/p/presentation/base/desktop-min-341885.css&kx/ucs/uh/css/291/yunivhead-min.css&kx/ucs/uh/css/221/logo-min.css&kx/ucs/homepage/css/155/homepage-ie-min.css&kx/ucs/notif_v2/css/145/notifications_v2-min.css&kx/ucs/mailcount/css/37/mail_preview-min.css&kx/ucs/search/css/190/search_all-min.css&kx/ucs/search/css/190/search_buttons-min.css&kx/ucs/breakingnews/css/12/breaking_news-min.css&os/mit/media/m/header/header-desktop-min-630857.css&os/mit/media/m/navigation/navigation-desktop-min-603998.css&os/mit/media/m/linkbox/linkbox-min-248956.css&os/mit/media/m/ads/ads-min-892923.css&os/mit/media/m/heading/heading-min-214964.css&os/gm/m/footer/footer_sponsor-min-188629.css&os/gm/m/footer/footer_links-min-188629.css&os/mit/media/m/trending/trending-min-150139.css&os/gm/m/footer/footer_info-min-323669.css&os/gm/m/footer/footer_info-desktop-min-944911.css' [20:3604] Ignoring the following declarations in this rule.
sty 29, 2013 11:54:03 AM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://l.yimg.com/zz/combo?d/lib/yui/3.4.1/build/cssreset/cssreset-min.css&d/lib/yui/3.4.1/build/cssfonts/cssfonts-min.css&os/mit/media/p/presentation/grids/master-min-464195.css&os/mit/media/p/presentation/grids/desktop-min-841473.css&os/mit/media/p/presentation/base/master-min-470440.css&os/mit/media/p/presentation/base/desktop-min-341885.css&kx/ucs/uh/css/291/yunivhead-min.css&kx/ucs/uh/css/221/logo-min.css&kx/ucs/homepage/css/155/homepage-ie-min.css&kx/ucs/notif_v2/css/145/notifications_v2-min.css&kx/ucs/mailcount/css/37/mail_preview-min.css&kx/ucs/search/css/190/search_all-min.css&kx/ucs/search/css/190/search_buttons-min.css&kx/ucs/breakingnews/css/12/breaking_news-min.css&os/mit/media/m/header/header-desktop-min-630857.css&os/mit/media/m/navigation/navigation-desktop-min-603998.css&os/mit/media/m/linkbox/linkbox-min-248956.css&os/mit/media/m/ads/ads-min-892923.css&os/mit/media/m/heading/heading-min-214964.css&os/gm/m/footer/footer_sponsor-min-188629.css&os/gm/m/footer/footer_links-min-188629.css&os/mit/media/m/trending/trending-min-150139.css&os/gm/m/footer/footer_info-min-323669.css&os/gm/m/footer/footer_info-desktop-min-944911.css' [20:3996] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
sty 29, 2013 11:54:03 AM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://l.yimg.com/zz/combo?d/lib/yui/3.4.1/build/cssreset/cssreset-min.css&d/lib/yui/3.4.1/build/cssfonts/cssfonts-min.css&os/mit/media/p/presentation/grids/master-min-464195.css&os/mit/media/p/presentation/grids/desktop-min-841473.css&os/mit/media/p/presentation/base/master-min-470440.css&os/mit/media/p/presentation/base/desktop-min-341885.css&kx/ucs/uh/css/291/yunivhead-min.css&kx/ucs/uh/css/221/logo-min.css&kx/ucs/homepage/css/155/homepage-ie-min.css&kx/ucs/notif_v2/css/145/notifications_v2-min.css&kx/ucs/mailcount/css/37/mail_preview-min.css&kx/ucs/search/css/190/search_all-min.css&kx/ucs/search/css/190/search_buttons-min.css&kx/ucs/breakingnews/css/12/breaking_news-min.css&os/mit/media/m/header/header-desktop-min-630857.css&os/mit/media/m/navigation/navigation-desktop-min-603998.css&os/mit/media/m/linkbox/linkbox-min-248956.css&os/mit/media/m/ads/ads-min-892923.css&os/mit/media/m/heading/heading-min-214964.css&os/gm/m/footer/footer_sponsor-min-188629.css&os/gm/m/footer/footer_links-min-188629.css&os/mit/media/m/trending/trending-min-150139.css&os/gm/m/footer/footer_info-min-323669.css&os/gm/m/footer/footer_info-desktop-min-944911.css' [20:3996] Ignoring the following declarations in this rule.
sty 29, 2013 11:54:03 AM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error

And I can't find a method that lets me click on a link (selected by XPath). Jsoup is good for parsing pages, but it isn't good for navigating dynamically between pages.

I need your help :) I don't know whether I can get the same result with a parser other than Selenium.


1 Answer


Visiting a link on a website is not a problem with Jsoup:

Example:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("http://first.com/").get(); // Connect to 'root' link
Elements links = doc.select("a[href]"); // Select all links from the website

// As an example, connect to the first link of the website and parse its HTML:
doc = Jsoup.connect(links.first().absUrl("href")).get();

// Continue with the new website
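
The absUrl("href") call above matters because many links on a page are relative, and a relative href cannot be passed to Jsoup.connect directly. As a rough illustration of what resolving a link against its page URL means, here is a sketch that uses only java.net.URI. It is not Jsoup's implementation; the class name LinkResolver and the example URLs are made up for this sketch.

```java
import java.net.URI;

// Hypothetical helper, not part of Jsoup: resolves an href against the URL
// of the page it was found on, using only the standard library.
final class LinkResolver {

    // baseUrl is the address of the page, href is the raw attribute value.
    static String resolve(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://first.com/shop/index.html"; // assumed example URL
        System.out.println(resolve(base, "page2.html"));            // relative to the current directory
        System.out.println(resolve(base, "/cart"));                 // relative to the site root
        System.out.println(resolve(base, "http://second.com/x"));   // already absolute, kept as-is
    }
}
```

In real code absUrl does this for you, so the sketch is only meant to show why the absolute form is the one to follow.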

See also: Using Jsoup, how can I fetch each and every information resides in each link?
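
Regarding the HtmlUnit output quoted in the question: those lines are CSS parser warnings about the site's stylesheets, not failures of your code. The format of the log lines suggests they are written through java.util.logging, so, assuming that is the logging backend in your setup, one possible way to hide them is to turn off the logger for the HtmlUnit package before creating the WebClient. The class name QuietHtmlUnitLogs is made up for this sketch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: assumes HtmlUnit's messages end up in java.util.logging,
// as the format of the log lines in the question suggests.
final class QuietHtmlUnitLogs {

    // java.util.logging keeps loggers weakly referenced, so hold a strong
    // reference to make sure the configured level is not lost.
    private static final Logger HTMLUNIT_LOGGER =
            Logger.getLogger("com.gargoylesoftware.htmlunit");

    // Loggers below this package without a level of their own inherit OFF.
    static void silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        silence();
        System.out.println(HTMLUNIT_LOGGER.getLevel());
    }
}
```

Call QuietHtmlUnitLogs.silence() once at startup. If your project routes logging through another framework such as log4j, the level has to be set in that framework's configuration instead.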

ollo
  • Yes, I know, but I can't go to the next page by clicking a button or link on the page. – Sebastian J. Jan 29 '13 at 13:08
  • I see. Is it possible to add the button parameter to the query string (see: http://stackoverflow.com/questions/7508813/can-jsoup-simulate-a-button-press)? Can you give me an example of the data you need from the page? – ollo Jan 29 '13 at 13:13