I want to extract the links from a website in a C# console application using HTML Agility Pack, but the li and a tags contain JavaScript code instead of real links. I don't know why JavaScript changes the markup on click. How can I get the actual HTML?

<li onmouseover="activate_menu('top-menu-61', 61); void(0);" onmouseout="deactivate_menu('top-menu-61', 61);"><a href="javascript:void();

This is all I can see in my li and a tags. How can I resolve this and get the actual HTML so that I can extract the links from it?

1 Answer

Try using a browser automation tool such as Selenium WebDriver to render the page fully in a real browser before passing it to HtmlAgilityPack for parsing. Using Selenium is fairly easy, as the example below shows. You only need to make sure that the required tools (the Selenium library and the browser driver of your choice) are installed properly beforehand:

using HtmlAgilityPack;
using OpenQA.Selenium.Chrome;

// Initialize the Chrome driver (or any other supported browser)
using (var driver = new ChromeDriver())
{
    // Open the target page
    driver.Navigate().GoToUrl("the_target_page_url_here");

    // Maybe add Selenium waits here if needed,
    // to wait until a certain element appears in the page

    // Pass the rendered HTML to HAP's HtmlDocument
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(driver.PageSource);
}

Selenium also provides ways to locate elements within a page, so it is possible to replace HAP completely with Selenium, if you want.
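
As a rough sketch of that Selenium-only route, the following waits for anchors to appear and then reads each `href` from the live DOM. The CSS selector `li a`, the 10-second timeout, and the `LinkCollector` class with its `IsNavigableHref` helper are assumptions made for illustration, not something the question or the page confirms. `WebDriverWait` comes from the separate Selenium.Support package.

```csharp
using System;
using System.Collections.Generic;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

public static class LinkCollector
{
    // Hypothetical helper: true when an href holds a real target
    // rather than a "javascript:" pseudo-link or nothing at all.
    public static bool IsNavigableHref(string href)
    {
        return !string.IsNullOrWhiteSpace(href)
            && !href.TrimStart().StartsWith("javascript:", StringComparison.OrdinalIgnoreCase);
    }

    // Opens the page in Chrome and returns the hrefs of all anchors inside li elements.
    public static List<string> Collect(string url)
    {
        var links = new List<string>();
        using (var driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl(url);

            // Wait up to 10 seconds (assumed value) for at least one anchor inside an li.
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
            wait.Until(d => d.FindElements(By.CssSelector("li a")).Count > 0);

            foreach (IWebElement anchor in driver.FindElements(By.CssSelector("li a")))
            {
                string href = anchor.GetAttribute("href");
                if (IsNavigableHref(href))
                {
                    links.Add(href);
                }
            }
        }
        return links;
    }
}
```

Calling `LinkCollector.Collect("the_target_page_url_here")` would return only the anchors that carry a real target. Anchors whose `href` is `javascript:void();` are skipped, since they hold no link to extract.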

har07
  • I already tried this code: `string html; IWebDriver driver = new OpenQA.Selenium.PhantomJS.PhantomJSDriver(); driver.Navigate().GoToUrl(url); HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); doc.LoadHtml(driver.PageSource); html = driver.PageSource;` but the li and a tags still contain that JavaScript code rather than a proper href. – Muhammad Mateen May 05 '16 at 06:39
  • @MuhammadMateen I see, I think I misunderstood the question. `href` in this case can be considered *empty* in the sense that it doesn't carry any useful information. The actual action taken when the link is clicked is to execute some JavaScript, which can do anything (not necessarily redirect to a link). See: http://stackoverflow.com/questions/134845/href-attribute-for-javascript-links-or-javascriptvoid0 – har07 May 05 '16 at 07:30
  • Using Selenium (not HAP), you can locate the link using `FindElement()` and then perform a click on it to run the JavaScript... – har07 May 05 '16 at 07:31
  • @MuhammadMateen If I wasn't clear enough, I mean there is no such thing as 'actual HTML' rendered by JavaScript here. `href` contains JS which prevents the default link-click behavior, and the action is probably provided by an `onclick` event handler instead, as mentioned in the linked question above. – har07 May 05 '16 at 07:41
  • Yes, the onclick event is what changes that link or code. I have used Selenium but still cannot get the unchanged source, as if onclick changes it every time, or perhaps I don't know how to pass Selenium's output to HAP. – Muhammad Mateen May 07 '16 at 17:09
  • I mean that, as in my example, there should be a proper href containing the link, but the source I get using HAP has JavaScript in it. When I inspect the page with Firebug, the href has proper links, but in the source I get with HAP the a tags are changed. Please tell me what to do. – Muhammad Mateen May 12 '16 at 08:22
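
To illustrate the suggestion in the comments of clicking the link so that its script runs, here is a minimal sketch. It assumes the click triggers a full page navigation, which may not hold for this site (the script could just as well open a submenu). The `ClickProbe` class name and the selector `li a[href^='javascript:']` are guesses made for illustration, and the URL is the same placeholder used in the answer.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

public static class ClickProbe
{
    public static void Main()
    {
        using (var driver = new ChromeDriver())
        {
            // Placeholder: replace with the real page address.
            driver.Navigate().GoToUrl("the_target_page_url_here");
            string before = driver.Url;

            // First anchor inside an li whose href is a javascript: pseudo-link (assumed selector).
            IWebElement link = driver.FindElement(By.CssSelector("li a[href^='javascript:']"));
            link.Click();

            // If the handler navigated, driver.Url now holds the destination.
            // A slow navigation may need an explicit wait before this check.
            string after = driver.Url;
            Console.WriteLine(before == after
                ? "No navigation detected after the click."
                : "Navigated to: " + after);
        }
    }
}
```

If no navigation is detected, the handler probably modified the page in place, in which case re-reading `driver.PageSource` after the click is the way to see what changed.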