
I am using HttpWebRequest to download data from a given URL, but a few elements are missing from the response.

    Dim Request As HttpWebRequest = CType(WebRequest.Create("https://www.royalmail.com/track-your-item#/tracking-results/37005067200003B0F1FF2"), HttpWebRequest)
    Request.Timeout = 2 * 60 * 1000
    Request.Proxy = Nothing
    Request.AutomaticDecompression = DecompressionMethods.Deflate Or DecompressionMethods.GZip
    Request.Credentials = System.Net.CredentialCache.DefaultCredentials
    Dim HttpResp As HttpWebResponse
    HttpResp = (CType(Request.GetResponse(), HttpWebResponse))
    If HttpResp.StatusCode = HttpStatusCode.OK Then
        Dim receiveStream As Stream = HttpResp.GetResponseStream()
        Dim readStream As New StreamReader(receiveStream)
        Dim sData As String
        sData = readStream.ReadToEnd()
        readStream.Close()

    Else

    End If    

When I open the URL ( https://www.royalmail.com/track-your-item#/tracking-results/37005067200003B0F1FF2 ) in Chrome and use Inspect Element, I can see this element (search for 37005067200003B0F1FF2), but the response I get from the request does not contain it.

Code using the WebBrowser control:

Private Sub Button10_Click(sender As Object, e As EventArgs) Handles Button10.Click


    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
    Dim sURL As String = String.Format("https://www.royalmail.com/track-your-item#/tracking-results/37005067200003B0F1FF2")
    Dim webBrowserForPrinting As WebBrowser = New WebBrowser()
    webBrowserForPrinting.ScriptErrorsSuppressed = True
    AddHandler webBrowserForPrinting.DocumentCompleted, AddressOf PrintDocument
    webBrowserForPrinting.Url = New Uri(sURL)
    webBrowserForPrinting.Navigate(sURL)

End Sub
Private Sub PrintDocument(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
    Dim HTMD As HtmlDocument
    HTMD = CType(sender, WebBrowser).Document



    Dim HTC As HtmlElementCollection
    If HTMD IsNot Nothing Then
        HTC = HTMD.All
        For Each ele As HtmlElement In HTC
            MsgBox(ele.InnerHtml)

        Next
    End If

End Sub
  • Dynamic, scripted content. Use a WebBrowser class to access it (it doesn't need to be visible, just navigate to the page and use the `HtmlDocument` tools to retrieve the Elements' values you need). – Jimi Aug 21 '19 at 10:22
  • @jimi I already tried this approach but got the same result. – Flying Kites Aug 21 '19 at 12:52
  • It works pretty well. See here: [POC at Imgur](https://imgur.com/a/NiG4N8X). There's just one problem with that site: it implements the puzzle-captcha (show-you're-not-a-robot style). If you ping it too often, it'll nag you with that. Go figure. Clear the Cookies and try again, if that's the case. – Jimi Aug 21 '19 at 13:37
  • @jimi I have added the code that uses the WebBrowser control to get the element.
    I checked your link, but I don't understand how you did that. I also cleared my browser's cookies.
    – Flying Kites Aug 22 '19 at 08:44
  • The code sample you have posted (in `PrintDocument`) is not correct. This event is raised multiple times, until the Document is fully built. It's completed server-side and dynamically updated. Plus, you must enable your WebBrowser class/control's extended features. In case you haven't already, see here for what you need to do: [How can I get the WebBrowser control to show modern contents?](https://stackoverflow.com/a/38514446/7444103). If you want, I can post the code needed to perform this task. Let me know. – Jimi Aug 22 '19 at 10:00
  • Read these notes, too: [How to get an HtmlElement value inside Frames/IFrames?](https://stackoverflow.com/a/53218064/7444103). – Jimi Aug 22 '19 at 10:01
  • @jimi Thanks for your help, but I am not having any success following your advice. If you can help with some sample code, I would be really thankful. – Flying Kites Aug 22 '19 at 11:56
  • Royal Mail offer a free [Tracking API](https://www.royalmail.com/business/services/sending/business-integration-tools-apis/tracking-api) which would completely avoid the problem. – Andrew Morton Aug 22 '19 at 19:28
  • @AndrewMorton I know about this... my tracking is not trackable, it's Delivery Confirmation. – Flying Kites Aug 23 '19 at 06:23
  • @FlyingKites Is the listed feature "Receive proof of delivery confirmation (not including signature)" not sufficient for that? – Andrew Morton Aug 23 '19 at 06:50
  • @AndrewMorton It's a feature issue: we have an API subscription with Royal Mail, but this tracking is not trackable. – Flying Kites Aug 23 '19 at 07:54

1 Answer

You need to activate the WebBrowser advanced features for the parsing procedure to complete successfully. When these features are not enabled, the WebBrowser runs in standard IE7 emulation and can't complete the Document. The failure is caused by the high number of scripting errors.

I've added a class with static methods (WebBrowserAdvancedFetures) to add the required values to the Registry.
WebBrowserAdvancedFetures.ActivateWBAdvancedFeatures is called in the Form's constructor.
You can roll it back by calling WebBrowserAdvancedFetures.DeactivateWBAdvancedFeatures.

How this procedure works:

  1. Instantiate a WebBrowser class (Private browser As WebBrowser). We could also use a WebBrowser control (the visible control version that a Form container can host), it's the same thing.
  2. Subscribe to its DocumentCompleted event. It will be raised each time one of the HtmlDocuments inside the main WebBrowser.Document is completed. Read How to get an HtmlElement value inside Frames/IFrames? for some more details on HtmlDocuments nesting.
  3. In the DocumentCompleted handler, verify that at least one of the Documents is ready to be parsed, checking that WebBrowser.ReadyState = WebBrowserReadyState.Complete.
  4. When it is, search for the HtmlElements that contain the data we're looking for.
  5. When all data has been collected, raise an event to notify that the parsing is complete (this also allows subscribers from other classes to be notified, if needed, though that requires a custom EventArgs class) and disable further parsing of the HtmlDocument (here, this is accomplished by setting a Boolean field).
  6. Handle the new data (here, just a String and a DateTime object), then reset the fields/variables used in the parsing procedure.
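
Step 5 mentions a custom EventArgs class. A minimal sketch of what that could look like follows; the names `DocumentParsedEventArgs`, `TrackingParser` and `ReportParsed` are hypothetical and do not appear in the code of this answer, which uses the plain `EventArgs` and private fields instead.

```vbnet
Imports System

' Hypothetical EventArgs subclass (name and properties are assumptions):
' it carries the parsed values to subscribers instead of private fields.
Public Class DocumentParsedEventArgs
    Inherits EventArgs

    Public Sub New(trackingNumber As String, trackingDate As DateTime)
        Me.TrackingNumber = trackingNumber
        Me.TrackingDate = trackingDate
    End Sub

    Public ReadOnly Property TrackingNumber As String
    Public ReadOnly Property TrackingDate As DateTime
End Class

' Hypothetical publisher: stands in for the class that owns the WebBrowser.
Public Class TrackingParser
    Public Event DocumentParsingComplete As EventHandler(Of DocumentParsedEventArgs)

    ' Would be called from the DocumentCompleted handler once both values are found.
    Public Sub ReportParsed(trackingNumber As String, trackingDate As DateTime)
        RaiseEvent DocumentParsingComplete(Me, New DocumentParsedEventArgs(trackingNumber, trackingDate))
    End Sub
End Class

Module Demo
    Sub Main()
        Dim parser As New TrackingParser()
        ' A subscriber in another class reads the values from the event arguments.
        AddHandler parser.DocumentParsingComplete,
            Sub(sender, e) Console.WriteLine($"{e.TrackingNumber}: {e.TrackingDate:d}")
        parser.ReportParsed("[tracking number]", DateTime.Today)
    End Sub
End Module
```

With this shape, subscribers don't need access to the parser's fields, so resetting those fields after the event is raised can't affect values a subscriber has already received.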

Remember to remove the handlers, in the Form.FormClosed event or in a custom class Dispose() method:

RemoveHandler DocumentParsingComplete, AddressOf OnDocumentParsingComplete
RemoveHandler browser.DocumentCompleted, AddressOf browser_DocumentCompleted
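
For the "custom class Dispose() method" option, a rough sketch could look like the following. The wrapper class name `TrackingBrowser` is an assumption, the parsing body is left out, and the code needs a reference to System.Windows.Forms.

```vbnet
Imports System
Imports System.Windows.Forms

' Hypothetical wrapper (the class name is an assumption): it owns the
' WebBrowser and detaches the handler when it is disposed.
Public Class TrackingBrowser
    Implements IDisposable

    Private ReadOnly browser As WebBrowser
    Private disposed As Boolean = False

    Public Sub New()
        browser = New WebBrowser With {.ScriptErrorsSuppressed = True}
        AddHandler browser.DocumentCompleted, AddressOf browser_DocumentCompleted
    End Sub

    Private Sub browser_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
        ' The parsing logic would go here.
    End Sub

    Public Sub Dispose() Implements IDisposable.Dispose
        If disposed Then Return
        ' Detach the handler first, then release the underlying control.
        RemoveHandler browser.DocumentCompleted, AddressOf browser_DocumentCompleted
        browser.Dispose()
        disposed = True
    End Sub
End Class
```

The WebBrowser class wraps an ActiveX control, so an instance of this wrapper should be created on an STA thread (the UI thread of a WinForms application is one).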

Public Event DocumentParsingComplete As EventHandler(Of EventArgs)

Private browser As WebBrowser = Nothing
Private trackingNumberValue As String = String.Empty
Private trackingDateValue As DateTime
Private documentParsed As Boolean = False
Private userAgent As String = "User-Agent: Mozilla/5.0 (Windows NT 10; Win64; x64; rv:48.0) Gecko/20100101 Firefox/48.0"

Public Sub New()
    InitializeComponent()
    WebBrowserAdvancedFetures.ActivateWBAdvancedFeatures(Path.GetFileName(Application.ExecutablePath))
    browser = New WebBrowser With {.ScriptErrorsSuppressed = True}
    AddHandler DocumentParsingComplete, AddressOf OnDocumentParsingComplete
    AddHandler browser.DocumentCompleted, AddressOf browser_DocumentCompleted
End Sub

Private Sub btnNavigate_Click(sender As Object, e As EventArgs) Handles btnNavigate.Click
    browser.Navigate("")
    browser.Document.OpenNew(True)
    documentParsed = False
    browser.Navigate("[Some URL]", "_self", Nothing, userAgent)
End Sub

Private Sub OnDocumentParsingComplete(sender As Object, e As EventArgs)
    ' Do whatever you need with these
    Console.WriteLine(trackingNumberValue)
    Console.WriteLine(trackingDateValue)

    'Then reset for further use
    trackingNumberValue = String.Empty
    trackingDateValue = DateTime.MinValue
End Sub

Private Sub browser_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
    Dim wb As WebBrowser = DirectCast(sender, WebBrowser)
    If wb.ReadyState <> WebBrowserReadyState.Complete OrElse wb.Document.Forms.Count = 0 OrElse documentParsed Then Return

    Dim trackingNumberClass As String = "tracking-number-value"
    Dim trackingElement = wb.Document.GetElementsByTagName("SPAN").
        OfType(Of HtmlElement)().FirstOrDefault(Function(elm) elm.GetAttribute("className").Contains(trackingNumberClass))
    Me.trackingNumberValue = trackingElement?.InnerText

    Dim trackingDateClass As String = "ng-binding ng-scope"
    Dim trackingDateElement = wb.Document.GetElementsByTagName("SPAN").
        OfType(Of HtmlElement)().FirstOrDefault(Function(elm) elm.GetAttribute("className").Equals(trackingDateClass))

    If trackingDateElement IsNot Nothing Then
        Dim deliveryDate As String = trackingDateElement.InnerText.Split().Last().TrimEnd("."c)
        Me.trackingDateValue = Date.ParseExact(deliveryDate, "dd-MM-yyyy", Nothing)
        If Not String.IsNullOrEmpty(trackingNumberValue) Then
            documentParsed = True
            RaiseEvent DocumentParsingComplete(sender, EventArgs.Empty)
        End If
    End If
End Sub

Use this class to activate/deactivate the WebBrowser control's advanced features:

Imports Microsoft.Win32
Imports System.Security.AccessControl

Public Class WebBrowserAdvancedFetures
    Private Shared baseKeyName As String = "Software\Microsoft\Internet Explorer\Main\FeatureControl"
    Private Shared featuresKey As String = baseKeyName & "\FEATURE_BROWSER_EMULATION"
    Private Shared hardwareAccelKey As String = baseKeyName & "\FEATURE_GPU_RENDERING"

    Public Shared Sub ActivateWBAdvancedFeatures(executableName As String)
        Dim wbFeatureKey As RegistryKey = Nothing
        Dim wbAccelKey As RegistryKey = Nothing

        Try
            wbFeatureKey = Registry.CurrentUser.OpenSubKey(featuresKey, 
                RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
            If wbFeatureKey Is Nothing Then
                wbFeatureKey = Registry.CurrentUser.CreateSubKey(featuresKey, True)
            End If
            wbFeatureKey.SetValue(executableName, 11001, RegistryValueKind.DWord)

            wbAccelKey = Registry.CurrentUser.OpenSubKey(hardwareAccelKey, 
                RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
            If wbAccelKey Is Nothing Then
                wbAccelKey = Registry.CurrentUser.CreateSubKey(hardwareAccelKey, True)
            End If
            wbAccelKey.SetValue(executableName, 1, RegistryValueKind.DWord)
        Finally
            wbFeatureKey?.Dispose()
            wbAccelKey?.Dispose()
        End Try
    End Sub

    Public Shared Sub DeactivateWBAdvancedFeatures(executableName As String)
        Using wbFeatureKey = Registry.CurrentUser.OpenSubKey(
            featuresKey, RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
            ' OpenSubKey returns Nothing if the key does not exist
            wbFeatureKey?.DeleteValue(executableName, False)
        End Using

        Using wbAccelKey = Registry.CurrentUser.OpenSubKey(
            hardwareAccelKey, RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
            ' OpenSubKey returns Nothing if the key does not exist
            wbAccelKey?.DeleteValue(executableName, False)
        End Using
    End Sub
End Class
Jimi
  • Thanks for sharing your code. I am able to get the result, but there is a captcha issue: when I open the URL in a web browser like IE or Chrome it shows the data, but the Windows Forms WebBrowser control asks for a captcha to proceed. Can we handle this? Otherwise it's of no use. – Flying Kites Aug 23 '19 at 07:52
  • Have you used the code exactly the way it's shown here? This is a WebBrowser class, a head-less browser. How could you see a captcha, since there's no interface? While testing, the captcha never got in the way. – Jimi Aug 23 '19 at 10:00
  • I was making a mistake. Now I can get the values, but the result is not consistent: sometimes it gives a result and sometimes it doesn't (the element always comes back as Nothing). – Flying Kites Aug 23 '19 at 12:17
  • Did you use the code as it is here or not? How can you *make mistakes* if you just copy/paste it? This code is complete, there's nothing that needs to be modified. I tested it a number of times, it never got back empty results, which is also not possible, since the event is raised only when both values are parsed. – Jimi Aug 23 '19 at 12:52
  • Actually, I want it to be automated; I don't need the Navigate button's click event to start the process. So I started by making a DLL of sorts, and in that case there was some issue. But now I have a Windows Form, and in its Load event I automatically call the browser's Navigate method to get the result, because I have many tracking numbers to track. – Flying Kites Aug 24 '19 at 04:52
  • If you like, I can share my code (project) with you. The same tracking number gives a result once, but some time later there is no result (the element is Nothing). Please let me know where and how I can share and explain my code; I will also give you some sample tracking numbers. – Flying Kites Aug 24 '19 at 07:02
  • Do you know how I can clear website cookies from Mozilla using VB.NET code? – Flying Kites Aug 28 '19 at 11:01