
I have a problem that has me confused. I want to scrape a value from a web page. Using the Chrome inspector, I found this XPath: "//*[@id='GlobalTab0Elm']/div[2]/div[1]/div[2]/table/tbody/tr[7]/td[2]/div"

When I use the above XPath in the Web Scraper extension for Chrome, it works fine. The problem is that it does not work in my simple program, shown here:

    Dim Handler As HtmlAgilityPack.HtmlWeb.PreRequestHandler =
        Function(request As HttpWebRequest)
            request.Headers(HttpRequestHeader.AcceptEncoding) = "gzip, deflate"
            request.AutomaticDecompression = DecompressionMethods.Deflate Or DecompressionMethods.GZip
            request.CookieContainer = New System.Net.CookieContainer()
            Return True
        End Function
    Dim webClient As HtmlWeb = New HtmlWeb()
    webClient.PreRequest = Handler

    webClient.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5"
    webClient.UseCookies = True

    Dim htmlDoc = webClient.Load("http://www.tsetmc.com/Loader.aspx?ParTree=15")
    htmlDoc.OptionReadEncoding = False
    Dim S As String
    S = "//*[@id='GlobalTab0Elm']/div[2]/div[1]/div[2]/table/tbody/tr[7]/td[2]/div"
    Dim node = htmlDoc.DocumentNode.SelectSingleNode(S)
    TextBox1.Text = (node.InnerText)

My question is: why does this XPath work in the other scraper but not in this code? What do I have to do? The error happens at this line:

TextBox1.Text = (node.InnerText)

and the error text is

Object reference not set to an instance of an object.

When I use this XPath

"//*[@id='company_text']/text()[2]"

it finds the correct value.

My problem is that it does not work with this XPath:

"//*[@id='GlobalTab0Elm']/div[2]/div[1]/div[2]/table/tbody/tr[7]/td[2]/div"
  • It means that that path doesn't exist. Are you sure you are reading that same source? – Hans Kesting Apr 25 '20 at 09:44
  • Thank you for the help. I used exactly this XPath in both my VB.NET program and the Web Scraper extension. In the extension it works fine, but in my VB.NET project it does not. – Keyhan Atyeh Mirzaei Apr 25 '20 at 10:42
  • You cannot load that page with WebClient, it's generated dynamically and constantly updated. Use the WebBrowser class (class, not Control) to load and render the page, then pass it to HAP, after the `DocumentCompleted` event is raised. Read the notes here: [How to get an HtmlElement value inside Frames/IFrames?](https://stackoverflow.com/a/53218064/7444103), you'll need it. – Jimi Apr 25 '20 at 10:57
  • Thanks, Jimi, I looked at that link. The problem is that I don't know C# :D – Keyhan Atyeh Mirzaei Apr 26 '20 at 05:08
  • Can anyone help me? Thanks – Keyhan Atyeh Mirzaei Apr 26 '20 at 12:05
  • What I posted is not specific to C#; the notes there apply to any .NET language. What's important is that you need a headless WebBrowser to load and render the Document, since it's generated dynamically (through scripting). That document features client-side rendering and also server-side (push) updates. Since you probably need a snapshot of the data, you only have to care about what happens client-side. For this, you need a WebBrowser to execute the scripts. Also, as described in the notes, keep in mind that you can have multiple Documents inside the main HTML page. – Jimi Apr 26 '20 at 22:35
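The approach described in the comments (render the page in a WebBrowser, wait for `DocumentCompleted`, then hand the markup to HtmlAgilityPack) could look roughly like the sketch below. It is an illustration under assumptions, not a tested solution for this site: `LoadRenderedHtml` and the module name are hypothetical, the project is assumed to reference System.Windows.Forms and HtmlAgilityPack, and content that scripts add after `DocumentCompleted` fires, or that lives inside frames, may still be missing.

```vb
Imports System
Imports System.Threading
Imports System.Windows.Forms
Imports HtmlAgilityPack

Module RenderedPageSketch
    ' Hypothetical helper: loads url in an off-screen WebBrowser on its own
    ' STA thread and returns the body markup once the document reports Complete.
    Function LoadRenderedHtml(url As String) As String
        Dim html As String = Nothing
        Dim worker As New Thread(
            Sub()
                Using browser As New WebBrowser()
                    browser.ScriptErrorsSuppressed = True
                    AddHandler browser.DocumentCompleted,
                        Sub(sender, e)
                            ' DocumentCompleted can fire once per frame;
                            ' wait until the whole document is complete.
                            If browser.ReadyState = WebBrowserReadyState.Complete Then
                                html = browser.Document.Body.OuterHtml
                                Application.ExitThread()
                            End If
                        End Sub
                    browser.Navigate(url)
                    ' Message loop required for the WebBrowser events to fire.
                    Application.Run()
                End Using
            End Sub)
        worker.SetApartmentState(ApartmentState.STA)
        worker.Start()
        worker.Join()
        Return html
    End Function

    Sub Main()
        Dim html = LoadRenderedHtml("http://www.tsetmc.com/Loader.aspx?ParTree=15")
        If html Is Nothing Then
            Console.WriteLine("Page did not load.")
            Return
        End If

        ' Pass the rendered markup to HtmlAgilityPack and query it as before.
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)
        Dim node = doc.DocumentNode.SelectSingleNode(
            "//*[@id='GlobalTab0Elm']/div[2]/div[1]/div[2]/table/tbody/tr[7]/td[2]/div")
        Console.WriteLine(If(node Is Nothing, "(no match)", node.InnerText.Trim()))
    End Sub
End Module
```

The WebBrowser runs on a dedicated STA thread with its own message loop because the control cannot raise its events without one; `Application.ExitThread()` ends that loop once the markup has been captured.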

1 Answer


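I found the solution:

    Dim ArzeshNode As HtmlNode = Nothing
    For Each li As HtmlNode In htmlDoc.DocumentNode.SelectNodes("//tbody")
        ' Note: an XPath starting with // searches the whole document, not just li
        ArzeshNode = li.SelectSingleNode("//*[contains(@class,'table1')]/tbody/tr[7]/td[2]/div")
    Next
    ' SplitValue is my own helper function (not shown here)
    MsgBox(SplitValue(ArzeshNode.InnerText))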
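Since `SelectSingleNode` returns `Nothing` when the XPath matches no node, reading `InnerText` directly is what raises "Object reference not set to an instance of an object". A minimal sketch of a guarded lookup follows; `TryGetInnerText`, the module name, and the sample markup are made up for illustration and are not taken from the real page.

```vb
Imports System
Imports HtmlAgilityPack

Module NullSafeXPathSketch
    ' Hypothetical helper: returns the trimmed text of the first node that
    ' matches xpath, or Nothing when no node matches.
    Function TryGetInnerText(doc As HtmlDocument, xpath As String) As String
        Dim node As HtmlNode = doc.DocumentNode.SelectSingleNode(xpath)
        If node Is Nothing Then Return Nothing
        Return node.InnerText.Trim()
    End Function

    Sub Main()
        Dim doc As New HtmlDocument()
        ' Made-up markup standing in for the downloaded page.
        doc.LoadHtml("<table class='table1'><tbody><tr><td><div> 42 </div></td></tr></tbody></table>")

        ' This path exists in the sample markup.
        Dim found = TryGetInnerText(doc, "//*[contains(@class,'table1')]/tbody/tr[1]/td[1]/div")
        ' This id does not exist in the sample markup, so the helper returns Nothing.
        Dim missing = TryGetInnerText(doc, "//*[@id='GlobalTab0Elm']//div")

        Console.WriteLine(If(found, "(no match)"))
        Console.WriteLine(If(missing, "(no match)"))
    End Sub
End Module
```

Checking for `Nothing` does not make a missing element appear, but it turns the crash into a condition the program can report or handle.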
  • This is not an answer. Please click the edit link under your question, add this to your question, and then delete this answer. – Mary Apr 26 '20 at 08:21