Does anyone know an easy way to make .pdf files trigger the ready state once they have loaded? I'm building a program that opens URLs, takes screenshots, and puts them in Excel.

The WebBrowser control loads HTML documents correctly, but the program gets stuck in the While Not pageready loop when loading .pdf files, even though the control renders the .pdf correctly.

Private Sub btngo_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btngo.Click
    Dim file As String
    Dim Obj As New Object
    Dim result As String
    Dim sheet As String = "sheet1"
    Dim xlApp As New Excel.Application

    If lblpath.Text <> "" Then
        file = lblpath.Text
        Dim xlWorkBook = xlApp.Workbooks.Open(file)
        Dim xlWorkSheet = xlWorkBook.Worksheets(sheet)
        Dim range = xlWorkSheet.UsedRange

        ProgressBar1.Value = 0

        For rCnt = 4 To range.Rows.Count
            'url cell
            Obj = CType(range.Cells(rCnt, 2), Excel.Range)
            ' Obj.value now contains the value in the cell.. 
            Try
                ' Creates an HttpWebRequest with the specified URL. 
                Dim myHttpWebRequest As HttpWebRequest = CType(WebRequest.Create(Obj.value), HttpWebRequest)
                ' Sends the request and waits for a response. 
                Dim myHttpWebResponse As HttpWebResponse = CType(myHttpWebRequest.GetResponse(), HttpWebResponse)
                If myHttpWebResponse.StatusCode = HttpStatusCode.OK Then
                    result = myHttpWebResponse.StatusCode
                    WebBrowser1.ScrollBarsEnabled = False
                    WebBrowser1.Navigate(myHttpWebRequest.RequestUri)

                    WaitForPageLoad()

                    CaptureWebBrowser(WebBrowser1)
                End If
                ' Release the resources of the response.
                myHttpWebResponse.Close()

            Catch ex As WebException
                result = (ex.Message)
            Catch ex As Exception
                result = (ex.Message)
            End Try


            RichTextBox1.AppendText(result & "    " & Obj.value & vbNewLine)

            If radpre.Checked = True Then
                range.Cells(rCnt, 3).value = result
            ElseIf radcob.Checked = True Then
                range.Cells(rCnt, 4).value = result
            ElseIf radpost.Checked = True Then
                range.Cells(rCnt, 5).value = result

            End If


            ProgressBar1.Value = rCnt / range.Rows.Count * 100
        Next

        With xlApp
            .DisplayAlerts = False
            xlWorkBook.SaveAs(lblpath.Text.ToString)
            .DisplayAlerts = True
        End With

        xlWorkBook.Close()
        xlApp.Quit()

        'reclaim memory
        Marshal.ReleaseComObject(xlApp)
        xlApp = Nothing
    End If
End Sub

Private Function CaptureWebBrowser(ByVal wb As WebBrowser) As Image
    Try
        Dim hBitmap As Bitmap = New Bitmap(wb.Width, wb.Height)
        wb.DrawToBitmap(hBitmap, wb.Bounds)
        Dim img As Image = hBitmap
        Return img
    Catch ex As Exception
        MessageBox.Show(ex.Message)
    End Try
    Return Nothing
End Function


Private Sub WaitForPageLoad()
    AddHandler WebBrowser1.DocumentCompleted, New WebBrowserDocumentCompletedEventHandler(AddressOf PageWaiter)
    While Not pageready
        Application.DoEvents()
    End While
    pageready = False
End Sub

Private Sub PageWaiter(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
    If WebBrowser1.ReadyState = WebBrowserReadyState.Complete Then
        pageready = True
        RemoveHandler WebBrowser1.DocumentCompleted, New WebBrowserDocumentCompletedEventHandler(AddressOf PageWaiter)
    End If
End Sub

Update: resolved


I'm very happy with the feedback, and I really like the answer Noseratio provided. I was not aware that this code pattern was not considered best practice. When opening a .pdf or any other non-web document, readyState never changes from 0. Seeing how this program is simply a way for me not to work at work, I'm satisfied with capturing only .html and .htm pages.
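As an alternative to matching URL suffixes, the decision to take a screenshot could be based on the Content-Type header that the server returns, which HttpWebResponse exposes through its ContentType property. The following is only a minimal sketch: the module and function names (ContentTypeCheck, IsHtmlContentType) are hypothetical and not part of the original program, and it assumes the servers involved report accurate content types.

```vbnet
Imports System

Module ContentTypeCheck
    ' Hypothetical helper (not in the original program): returns True when a
    ' Content-Type header value describes an HTML or XHTML document.
    Public Function IsHtmlContentType(ByVal contentType As String) As Boolean
        If String.IsNullOrEmpty(contentType) Then Return False

        ' Drop parameters such as "; charset=utf-8" and surrounding whitespace.
        Dim mediaType As String = contentType.Split(";"c)(0).Trim()

        Return mediaType.Equals("text/html", StringComparison.OrdinalIgnoreCase) OrElse
               mediaType.Equals("application/xhtml+xml", StringComparison.OrdinalIgnoreCase)
    End Function
End Module
```

In the loop, the suffix comparison could then be replaced with something like If IsHtmlContentType(myHttpWebResponse.ContentType) Then, so that pages without a file extension are still captured and PDFs are skipped regardless of how their URL ends.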

My requirements were

  1. open excel document
  2. parse links located in excel document
  3. determine response code
  4. write response code and if possible screenshot to excel

The program parses and retrieves feedback far faster than I would be able to do manually. Screenshots of .html and .htm pages give non-technical viewers of the Excel file proof of a successful migration from production to COB, and back to the production environment.

This code as stated by Noseratio does not follow best practices, nor is it high quality. This is a quick and dirty implementation.

Option Infer On
Imports Microsoft.Office.Interop
Imports System.Net
Imports System.Runtime.InteropServices

Public Class Form1


Public Property pageready As Boolean

Private Sub OpenToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles OpenToolStripMenuItem.Click
    OpenFileDialog1.ShowDialog()
End Sub

Private Sub OpenFileDialog1_FileOk(ByVal sender As System.Object, ByVal e As System.ComponentModel.CancelEventArgs) Handles OpenFileDialog1.FileOk
    lblpath.Text = OpenFileDialog1.FileName.ToString
End Sub

Private Sub btngo_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btngo.Click
    Dim file As String
    Dim Obj As New Object
    Dim result As String
    Dim sheet As String = "sheet1"
    Dim xlApp As New Excel.Application
    Dim img As Bitmap
    Dim path As String = "C:\Documents and Settings\user\My Documents\Visual Studio 2010\Projects\COB-HTML-Tool\COB-HTML-Tool\bin\Debug\tmp.bmp"
    If lblpath.Text <> "" Then
        file = lblpath.Text
        Dim xlWorkBook = xlApp.Workbooks.Open(file)
        Dim xlWorkSheet = xlWorkBook.Worksheets(sheet)
        Dim range = xlWorkSheet.UsedRange

        ProgressBar1.Value = 0

        For rCnt = 4 To range.Rows.Count
            'url cell
            Obj = CType(range.Cells(rCnt, 2), Excel.Range)
            ' Obj.value now contains the value in the cell.. 
            Try
                ' Creates an HttpWebRequest with the specified URL. 
                Dim myHttpWebRequest As HttpWebRequest = CType(WebRequest.Create(Obj.value), HttpWebRequest)
                ' Sends the request and waits for a response. 
                Dim myHttpWebResponse As HttpWebResponse = CType(myHttpWebRequest.GetResponse(), HttpWebResponse)
                If myHttpWebResponse.StatusCode = HttpStatusCode.OK Then
                    result = myHttpWebResponse.StatusCode


                    Dim len As Integer = myHttpWebRequest.RequestUri.ToString.Length - 4
                    If myHttpWebRequest.RequestUri.ToString.Substring(len) = ".htm" Or
                        myHttpWebRequest.RequestUri.ToString.Substring(len - 1) = ".html" Or
                        myHttpWebRequest.RequestUri.ToString.Substring(len) = ".asp" Then
                        WebBrowser1.Navigate(myHttpWebRequest.RequestUri)
                        WaitForPageLoad()

                        img = CaptureWebBrowser(WebBrowser1)
                        img.Save(path)
                    End If

                End If
                ' Release the resources of the response.
                myHttpWebResponse.Close()

            Catch ex As WebException
                result = (ex.Message)
            Catch ex As Exception
                result = (ex.Message)
            End Try


            RichTextBox1.AppendText(result & "    " & Obj.value & vbNewLine)

            If radpre.Checked = True Then
                range.Cells(rCnt, 3).value = result

                If img Is Nothing Then
                Else
                    If Dir(path) <> "" Then
                        range.Cells(rCnt, 4).Select()
                        Dim opicture As Object
                        opicture = xlApp.ActiveSheet.Pictures.Insert(path)
                        opicture.ShapeRange.LockAspectRatio = True
                        opicture.ShapeRange.width = 170
                        opicture.ShapeRange.height = 170
                        My.Computer.FileSystem.DeleteFile(path)

                    End If
                End If
            ElseIf radcob.Checked = True Then
                range.Cells(rCnt, 5).value = result
                If img Is Nothing Then
                Else
                    If Dir(path) <> "" Then
                        range.Cells(rCnt, 6).Select()
                        Dim opicture As Object
                        opicture = xlApp.ActiveSheet.Pictures.Insert(path)
                        opicture.ShapeRange.LockAspectRatio = True
                        opicture.ShapeRange.width = 170
                        opicture.ShapeRange.height = 170
                        My.Computer.FileSystem.DeleteFile(path)
                    End If
                End If
            ElseIf radpost.Checked = True Then
                range.Cells(rCnt, 7).value = result
                If img Is Nothing Then
                Else
                    If Dir(path) <> "" Then
                        range.Cells(rCnt, 8).Select()
                        Dim opicture As Object
                        opicture = xlApp.ActiveSheet.Pictures.Insert(path)
                        opicture.ShapeRange.LockAspectRatio = True
                        opicture.ShapeRange.width = 170
                        opicture.ShapeRange.height = 170
                        My.Computer.FileSystem.DeleteFile(path)
                    End If
                End If
            End If


            ProgressBar1.Value = rCnt / range.Rows.Count * 100
        Next

        With xlApp
            .DisplayAlerts = False
            xlWorkBook.SaveAs(lblpath.Text.ToString)
            .DisplayAlerts = True
        End With

        xlWorkBook.Close()
        xlApp.Quit()

        'reclaim memory
        Marshal.ReleaseComObject(xlApp)
        xlApp = Nothing
    End If
End Sub
Private Function CaptureWebBrowser(ByVal wb As WebBrowser) As Image

    Try
        wb.ScrollBarsEnabled = False
        Dim hBitmap As Bitmap = New Bitmap(wb.Width, wb.Height)
        wb.DrawToBitmap(hBitmap, wb.Bounds)
        Dim img As Image = hBitmap
        Return img
    Catch ex As Exception
        MessageBox.Show(ex.Message)
    End Try
    Return Nothing
End Function


Private Sub WaitForPageLoad()
    AddHandler WebBrowser1.DocumentCompleted, New WebBrowserDocumentCompletedEventHandler(AddressOf PageWaiter)
    While Not pageready
        Application.DoEvents()
        System.Threading.Thread.Sleep(200)
    End While
    pageready = False
End Sub

Private Sub PageWaiter(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
    If WebBrowser1.ReadyState = WebBrowserReadyState.Complete Then
        pageready = True
        RemoveHandler WebBrowser1.DocumentCompleted, New WebBrowserDocumentCompletedEventHandler(AddressOf PageWaiter)
    End If
End Sub


End Class
atrueresistance
  • From what I understand, the browser isn't rendering the .pdf, Adobe (or whatever pdf viewer you have) is. Is it possible to check the URL of the web browser control when document complete fires? – Jason Bayldon Sep 10 '13 at 23:48
  • I'll have to check debug tomorrow. It does render the correct number of pages. From what I know the browser control uses the installed IE, which in turn calls the installed Adobe. Is there an easier way to call a url, and screenshot? – atrueresistance Sep 11 '13 at 01:05
  • Are you trying to print the pdf as a screenshot or just bypass it and move on via handling readystate complete? – Jason Bayldon Sep 11 '13 at 11:49

1 Answer

Unfortunately, you won't be able to use webBrowser.DrawToBitmap to get a snapshot of the PDF view. At the time of writing, the Adobe Acrobat Reader ActiveX control doesn't support rendering to a custom device context, so this method won't work, and neither will sending WM_PRINT or calling IViewObject::Draw, either directly on the Reader ActiveX object or via WebBrowser (I tried that, and I'm not alone). The proper solution would be to use a third-party PDF rendering component.

On a side note, you should avoid using a code pattern like this:

While Not pageready
    Application.DoEvents()
End While

It's a busy-waiting tight loop that consumes CPU cycles in vain. At the very least, put a Thread.Sleep(200) inside the loop, but overall you should avoid using Application.DoEvents too.
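One way to wait without a DoEvents loop is to wrap the DocumentCompleted event in a task and await it with a timeout. The sketch below is only an illustration under stated assumptions: it needs Async/Await support (.NET Framework 4.5 and Visual Basic 11 or later, which is newer than the Visual Studio 2010 project in the question), and the names NavigationHelpers, NavigateAsync, and timeoutMs are hypothetical.

```vbnet
Imports System
Imports System.Threading.Tasks
Imports System.Windows.Forms

Module NavigationHelpers
    ' Hypothetical helper: starts a navigation and returns True once the
    ' control reports ReadyState = Complete, or False if timeoutMs elapses first.
    ' Must be called on the UI thread that owns the WebBrowser control.
    Public Async Function NavigateAsync(ByVal wb As WebBrowser,
                                        ByVal url As Uri,
                                        ByVal timeoutMs As Integer) As Task(Of Boolean)
        Dim tcs As New TaskCompletionSource(Of Boolean)()

        ' DocumentCompleted can fire once per frame, so only complete the task
        ' when the whole document is ready.
        Dim handler As WebBrowserDocumentCompletedEventHandler =
            Sub(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
                If wb.ReadyState = WebBrowserReadyState.Complete Then
                    tcs.TrySetResult(True)
                End If
            End Sub

        AddHandler wb.DocumentCompleted, handler
        Try
            wb.Navigate(url)

            ' Yield to the message loop until the page completes or the timer fires.
            Dim navTask As Task = tcs.Task
            Dim finished As Task = Await Task.WhenAny(navTask, Task.Delay(timeoutMs))
            Return finished Is navTask
        Finally
            ' Always detach the handler, even on timeout or exception.
            RemoveHandler wb.DocumentCompleted, handler
        End Try
    End Function
End Module
```

From an Async Sub click handler this could be used as Dim loaded As Boolean = Await NavigateAsync(WebBrowser1, myHttpWebRequest.RequestUri, 15000), taking the screenshot only when loaded is True. If the ready state never reaches Complete for a .pdf, as described in the question, the call would return False after the timeout instead of hanging.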

noseratio
  • I always see this response, but have yet to see a good solution to the problem of waiting for a page to be ready without freezing the UI (just because the WBC says "ready" doesn't mean the page is truly "ready"; I often get this in the middle of loading DB data). – Jason Bayldon Sep 11 '13 at 12:07
  • @JasonBayldon, here's [my approach](http://stackoverflow.com/a/18573522/1768303) to handle "ready" without relying upon `readyState`. Can be done in VB.NET similarly. That however applies to navigating to an HTML page, rather than a PDF. Regarding PDF, I'd host Adobe Reader ActiveX [directly](http://stackoverflow.com/a/18647275/1768303), without `WebBrowser` layer. – noseratio Sep 11 '13 at 12:38
  • Thank you for the feedback. Since this is a quick and dirty throwaway program, I am leaving the code as I edited it. I'll keep the code pattern usage in mind for next time. – atrueresistance Sep 11 '13 at 17:58