At work, we log into a provider's website that serves as a repository of files. A list of the files appears. Each filename is a link. Click the link, and download the file. It's a very lightweight website.
I'm trying to log in and download the files without the tedious task of clicking each one (there's no "select all" checkbox). I'm using the WebBrowser control on a form with a Go button to begin. Here's the code. Please skip down to the row of asterisks.
Private Sub btnGo_Click(sender As Object, e As EventArgs) Handles btnGo.Click
Try
PageLoaded = False
browser.Navigate("https://[the website]/Account/Login.htm", False)
While Not PageLoaded
Application.DoEvents()
End While
Catch ex As Exception
MsgBox(ex.Message)
End Try
Try
browser.Document.GetElementById("username").InnerText = [username]
browser.Document.GetElementById("password").InnerText = [password]
PageLoaded = False
browser.Document.Forms("mainform").InvokeMember("submit")
While Not PageLoaded
Application.DoEvents()
End While
Catch ex As Exception
MsgBox(ex.Message)
End Try
' ************************************
Dim mycookies As String
mycookies = browser.Document.Cookie
' DEBUG: verified cookies are indeed present
Try
Dim cookieJar As New CookieContainer
Dim cookies As String() = browser.Document.Cookie.Split({"; "}, StringSplitOptions.RemoveEmptyEntries)
Dim cookievaluepairs() = cookies(0).Split("=")
Dim cky As New Cookie(cookievaluepairs(0), cookievaluepairs(1))
cky.Domain = browser.Document.Domain
cookieJar.Add(cky)
Dim cookievaluepairs1() = cookies(1).Split("=")
Dim cky1 As New Cookie(cookievaluepairs(0), cookievaluepairs(1))
cky1.Domain = browser.Document.Domain
cookieJar.Add(cky1)
' DEBUG: verified cookieJar contains expected cookies
Dim wwwclient As New CookieAwareWebClient(cookieJar)
' DEBUG: please see class code below
Dim x As Integer
Dim dlurl As String = ""
Dim inputs As HtmlElementCollection = browser.Document.Links
For Each elm As HtmlElement In inputs
If Microsoft.VisualBasic.Left(elm.OuterHtml, 10) = "<A href=""/" Then
dlurl = elm.GetAttribute("href")
' DEBUG: crappily named dlurl indeed has correct URI
wwwclient.DownloadFile(dlurl, "D:\Desktop\file" & x)
' DEBUG: overriden function GetWebRequest fires
' please see class code below
End If
Next
Catch ex As Exception
MsgBox(ex.Message)
' DEBUG: always lands here with 401 error
End Try
End Sub
Here's one of the many versions of CookieAwareWebClient found here on SO.
Public Class CookieAwareWebClient
Inherits WebClient
Private m_container As CookieContainer = New CookieContainer()
Public Sub New(cc As CookieContainer)
m_container = cc
' DEBUG: verified m_container now has cookieJar passed as cc
End Sub
Protected Overrides Function GetWebRequest(ByVal address As Uri) As WebRequest
Dim request As WebRequest = MyBase.GetWebRequest(address)
Dim webRequest As HttpWebRequest = TryCast(request, HttpWebRequest)
If webRequest IsNot Nothing Then
webRequest.CookieContainer = m_container
End If
Return webRequest
' DEBUG: verified webRequest.CookieContainer is correct
End Function
End Class
I single-step through the code all the way to the wwwclient.DownloadFile
statement, then through the code in the GetWebRequest function, and after a pause, I get a 401 Not Authorized. This has happened with the five or six variations of CookieAwareWebClient I've found.
The two cookies I retrieve from the WebBrowser control after the code successfully logs itself in look like this (different token every time obv).
"samlssologgedout=SSO%20Logged%20Out"
"token=A4AA416E-46C8-11e9-92CD-005056A005E4"
I've verified that those are the same cookies that go into 'webRequest.CookieContainer'. As well, in the WebBrowser control, after log in, you can click on the file's link to download it.
Does anybody see anything obviously wrong in the code?
Still googling while writing the question, I just came across Notes to Inheritors in the MS documentation for WebClient -- "Derived classes should call the base class implementation of WebClient to ensure the derived class works as expected."
That sounds like something you would do in the constructor? Or is this taken care of in the statement MyBase.GetWebRequest(address)
?