0

I'm building a model that attempts to pull data from the web across different websites using Select All > Copy. Below is the code that I have, and it seems to work in break mode in certain areas, and in other areas it only works when I run the macro.

The portion that is puzzling me at the time is when it hits: "ActiveSheet.PasteSpecial Format:="Text", link:=False, DisplayAsIcon:=False" , it fails and gives me Error 1004 "PasteSpecial method of Worksheet class failed."

On hitting F8 after debugging, the code continues just fine (albeit after showing me "Can't Execute code in break mode 3 times). I've tried altering the code to show "Worksheets("GOOGLE")" and other methods of defining the worksheet directly. My hunch is that may not be the issue. If that's the case, I have no idea what's going on here! Can someone test this out?

FYI I also use a Userform (modeless) on top of this code as a "Waiting" message as it can be quite long to run. Not sure if this is interfering with the paste.

Dim IE As Object
Dim PauseTime, Start
PauseTime = 22 ' Set duration in seconds
Start = Timer ' Set start time.

Application.ScreenUpdating = False

Worksheets("GOOGLE").Activate
Worksheets("GOOGLE").Cells.Clear
Worksheets("GOOGLE").Range("A1").Copy
Application.CutCopyMode = False


    Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Navigate Range("GOOGLEURL").Value
        Do Until .ReadyState = 4: DoEvents: Loop
        End With

        Do While Timer < Start + PauseTime
        DoEvents
        Loop

        IE.ExecWB 17, 0 '// SelectAll
        IE.ExecWB 12, 2 '// Copy selection
        ActiveSheet.Range("A1").Select
        ActiveSheet.PasteSpecial Format:="Text", link:=False, DisplayAsIcon:=False
        IE.Quit


    On Error GoTo Ending
        IE.Quit 
        Application.CutCopyMode = False

Ending:
Application.CutCopyMode = False
Exit Sub
nsjkd
  • 43
  • 3
  • 9
  • Instead of copy/paste, why don't you just assign a string variable to the `IE.Document.body.innertext`? – David Zemens Sep 06 '13 at 01:35
  • Another alternative which doesn't use IE at all is to use the `QueryTables` method in Excel. I think this is actually a better method to use. I've updated my answer below, let me know if either of those approaches work for you! – David Zemens Sep 06 '13 at 03:03

4 Answers4

2

Try this method instead of copy/paste between applications. Like you, I tried that and found it unreliable and often didn't work.

You can grab the page's innerText in a string and just use that, or, you could split the innerText in to an array and put that on the sheet, as I do in my example. This preserves the line breaks and makes it a bit more readable than putting all the text in a single cell

I verify this on a simple example (http://google.com) that both methods return the exact same layout of cells in the worksheet.

NOTE: This method may not work when you have the ChromeFrameBHO Add-In installed in IE (see here).

Sub Test()
Dim IE As Object
Dim pageText As String
Dim page As Variant

Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Navigate "http://google.com"
        Do Until .ReadyState = 4: DoEvents: Loop
    End With

    pageText = IE.Document.body.innertext
    page = Split(pageText, vbCr)

    Range("A1").Resize(UBound(page)).Value = Application.Transpose(page)

    IE.Quit
    Set IE = Nothing

End Sub

Another method which doesn't rely on Internet Explorer is the QueryTables method. It may or may not be appropriate for your needs, but try something like this.

NOTE: This method appears to work (for me) whether the ChromeFrameBHO plugin is installed.

Sub TestQueryTables()

    Dim googleURL as String
    googleURL = Range("GOOGLEURL")

    With ActiveSheet.QueryTables.Add(Connection:= _
            "URL;" & googleURL _
            , Destination:=Range("A1"))
            .Name = googleURL
            .FieldNames = True
            .RowNumbers = False
            .FillAdjacentFormulas = False
            .PreserveFormatting = True
            .RefreshOnFileOpen = False
            .BackgroundQuery = True
            .RefreshStyle = xlInsertDeleteCells
            .SavePassword = True
            .SaveData = True
            .AdjustColumnWidth = True
            .RefreshPeriod = 0
            .WebSelectionType = xlEntirePage
            .WebFormatting = xlWebFormattingNone 'or use xlWebFormattingAll to preserve formats
            .WebPreFormattedTextToColumns = True
            .WebConsecutiveDelimitersAsOne = True
            .WebSingleBlockTextImport = False
            .WebDisableDateRecognition = False
            .WebDisableRedirections = False
            .Refresh BackgroundQuery:=False
        End With

End Sub
Community
  • 1
  • 1
David Zemens
  • 53,033
  • 11
  • 81
  • 130
  • I see a similar suggestion on this site: http://www.mrexcel.com/forum/excel-questions/476334-copying-browser-text-excel-document-body-innertext.html. This also gives me Error 438 at the point where you define pageText =IE.document.body.innerText similar to the poster in the aforementioned thread. Maybe it's the type of page? I'm running this also for Expedia upon checking flights; it doesn't seem to work on that. – nsjkd Sep 06 '13 at 13:39
  • Can you give me a sample URL that you are trying to access? I'll see if there's a quick workaround. It also might be good to know if you **really** need the entire page text, or if you're just trying to find particular *pieces* of information. – David Zemens Sep 06 '13 at 13:48
  • With QueryTables, I'm not getting the result that I wanted (as I was with my original method). I suppose it's the type of site since when I manually go to do this though Data > Get External Data > From Web, the only "Table" that I get on the Google Flights site is the top bar of Google tools (without any actual flight data). I would prefer the original method over trying to tackle this one since I feel that I'm pretty close to getting it to work if not for the failed Paste portion that requires F8. – nsjkd Sep 06 '13 at 13:53
  • Here is the sample URL. You'll see that it takes awhile to load hence my 22-second timer function: http://www.expedia.com/Flights-Search?trip=roundtrip&leg1=from:JFK,to:MIA,departure:09/13/2013TANYT&leg2=from:MIA,to:JFK,departure:10/18/2013TANYT&passengers=children:0,adults:1,seniors:0,infantinlap:Y&mode=search Also, loading this site in the Get External Data > From Web function in excel, the main tool doesn't seem to load which doesn't allow me get the flight data (it doesn't come up as a table that I can select): https://www.google.com/flights/#search;f=JFK;t=SFO;d=2013-09-13;r=2013-10-18 – nsjkd Sep 06 '13 at 13:57
  • OK. I do get that same error and checking my Locals window this is indeed a different *type* of document which doesn't support the `.body.innertext` properties. I'll see what I can figure out, but also want to know if there is specific info that you want to scrape from the page, or if you truly do need *all* of it. – David Zemens Sep 06 '13 at 14:01
  • I simply need the side box that says "Filter your results by..." to get a comparison of the Cheapest Flight and the Cheapest Non-Stop Flight (or 1-stop if non-stop is unavailable). I was planning on building a template to scrape this once the data was pulled in properly since it doesn't seem to exist as its own table. For the Google site, I need similar information, though it's presented as a single list of flights normally instead of a side-box (which is fine). – nsjkd Sep 06 '13 at 14:04
  • 1
    QueryTables method appears to be working for me, those prices are listed in row 158, 160, and 162 when I run it. The other method is not working, perhaps because of [this](http://stackoverflow.com/questions/9713936/chromeactivedocument-returned-from-internetexplorer-application-object)? – David Zemens Sep 06 '13 at 14:35
  • Rats, that's it. The Chrome plug-in allows me to view the Google Flights page but it also disrupts the macro flow when running it for some reason. For those reading, the Add-on is ChromeFrameBHO and needs to be disabled to get the paste function to work properly. I'm surprised, however, that it causes this error only on running the macro but works just fine in break mode. Is there another method of PasteSpecial that I can use so that I can keep this plug-in enabled? It's not a very effective query if I have to keep enabling/disabling the plug-in and stepping into the macro to proceed. – nsjkd Sep 06 '13 at 15:00
  • 1
    Not that I can find. I'd encourage you to try QueryTables again. That is working for me. You indicate it is not working for you, but you didn't specify what you think is wrong with that approach. – David Zemens Sep 06 '13 at 15:20
  • Thanks, I appreciate it. That method works much faster than the one I was using, however, it doesn't work properly for the Google site (although mine somewhat does in break mode). I'll use the QueryTables method for the sites I have that support it. I suppose I'll have to keep digging to find a workaround for that error in Google though. – nsjkd Sep 06 '13 at 15:26
1

I actually have been struggling with this exact same issue from copy and pasting a bunch of images. Excel 2010 apparently has issues with trying to paste before the copy command is complete. What you can do is a combination of the sleep event and error handling the specific 1004 error. Set up the error handler to catch the 1004 error, and just have it resume. What I did was set up a counter like this:

Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
On Error GoTo ErrorHandler:
Dim err_counter As Integer

ErrorHandler:

If Err.Number = 1004 Then
    err_counter = err_counter + 1

    If err_counter > 10 Then
        MsgBox ("The copy function is taking too long. Consider using smaller images.")
        Exit Sub
    End If

    DoEvents
    Sleep 500
    DoEvents
ElseIf Err.Number <> 0 Then
    MsgBox ("Unknown error.")
    On Error GoTo 0
    Resume
End If

You don't need to use an error counter, but I thoguht it would be a good idea to keep future users of my spreadsheet from somehow creating an infinite loop. I also would clear the clipboard after each image paste, and if you use an error counter, reset it to 0 after a paste is successful.

0

It looks like you're copying but you're clearing the clipboard before you paste so there's nothing for the code to paste.

Worksheets("GOOGLE").Range("A1").Copy
Application.CutCopyMode = False

Also, are you copying from Sheets("Google").Range("A1") to Sheets("Google").Range("A1")? I don't understand that

Eric J
  • 227
  • 5
  • 16
  • Yes exactly. This initially clears the clipboard so that it properly copies the contents of the IE page (which you see later in the code). The initial Copy > Clear is just a primer; I was getting errors before adding this which left remnants of my last Copy action in subsequent runs of this code (i.e. If I ran a code similar to this for BING following this code, it would have pasted the same Google page instead of the Bing data). My only problem now is getting past that one line of code that causes an Error. When I run using F8, the line runs perfectly and the result is what I expect. – nsjkd Sep 05 '13 at 20:47
  • Also, the actual copy code is {IE.ExecWB 17, 0 '// SelectAll IE.ExecWB 12, 2 '// Copy selection} – nsjkd Sep 05 '13 at 20:57
  • Are there any other workbooks open while this is running? Perhaps your workbook is not active, maybe use ThisWorkbook?...I'm just running through the list of things I know will produce this error as your syntax looks correct to me – Eric J Sep 05 '13 at 21:04
  • 1
    Should probably be a comment now that I've realized his copy of the webpage comes later...but you already knew that now didn't you? ;) – Eric J Sep 05 '13 at 21:08
  • This is the only workbook open. I've tried prefacing the paste with Worksheets("GOOGLE").Activate (which seems redundant considering the line ActiveSheet.Range("A1").select works just fine). This doesn't actually help though, so I don't think it's an Inactive workbook when the paste command comes around. – nsjkd Sep 05 '13 at 21:37
0

I am not in a position to verify my response but I had a similar issue about a year ago. The webpage in question had to use a copy/paste rather than using innertext. It seems you have done most of what I did including pausing waiting or the copy to complete. (Readystate was unhelpful for me.)

The last thing I remember doing, which allowed the code to work, was to place the paste in a finite loop. The paste was typically successful between the third and eighth attempt.

I'm sure there is a better way but was unable to find it. Since my application was for my own use the code was acceptable. As the webpage would change every few months, the code was abandoned.

  • Thanks for confirming that it's happening elsewhere as well. My final solution for the Google search was to create the IE object, make it visible, and use a 7-second delay to let the page load. On occasion this would not load properly (webpage delay, IE took too long to boot, etc.) so I used an IF statement to verify if the data was pulled in. If it failed, the operation would repeat. On the second pull, the webpage typically loaded within the 7-second time. I used SendKeys with pauses in between to do the paste special (though this is definitely a flaky method, I saw no way around it). – nsjkd Sep 16 '13 at 14:40