0

Writing a crawler in VBA in combination with selenium to parse the price of different products from a webpage, I get an issue upon execution. It breaks when it finds certain prices are none. Using "on error resume next" I can get the full results. However, I wish to execute my code without using "on error resume next". If it were not for selenium, I could have used Length property to get rid of that. But, selenium doesn't support that. Hope I can have any workaround here.

Sub Redmart_scraping()
Dim driver As New ChromeDriver
Dim posts As Object, post As Object

With driver
    .get "https://redmart.com/bakery"
    Set posts = .FindElementsByCss("li.productPreview")
End With

On Error Resume Next

For Each post In posts
    i = i + 1
    Cells(i, 1) = post.FindElementByCss("span[class^=ProductPrice__price]").Text
Next post
End Sub
SIM
  • 21,997
  • 5
  • 37
  • 109

5 Answers5

1

You can extract the prices as follows:

Sub Redmart_scraping()
Dim driver As New ChromeDriver
Dim posts As Object
Dim i As Long

With driver
    .get "https://redmart.com/bakery"
End With

Columns("A:A").NumberFormat = "[$$-409]#,##0.00"

For Each posts In driver.FindElementsByClass("productPreview")
    i = i + 1
    'Cells(i, 2) = posts.Text
    For Each Item In Split(posts.Text, vbLf)
        If InStr(1, Item, "$", vbTextCompare) > 0 Then
            If InStr(2, Item, "$", vbTextCompare) > 0 Then
                Cells(i, 1) = Mid(Item, 2, InStr(2, Item, "$", vbTextCompare) - 2)
            Else
                Cells(i, 1) = Right(Item, Len(Item) - 1)
            End If

       End If
    Next
Next
End Sub

Please note that posts.Text holds all the information you need for one item. So besides price, you can extract item name, discount, customer ratings, weight, price before discount, and guaranteed fresh dates.. Uncomment 'Cells(i, 2) = posts.Text and see for yourself.

I am leaving rest of the fun work to you. Good luck!

Tehscript
  • 2,556
  • 2
  • 13
  • 23
  • You are impossible, Tehscript. You just made me dumbfounded. I bet, you won't find a single thread where such a wonderful solution has been provided to get around "on error resume next" occurred by vba + selenium, at least I didn't. Someday, I will find a difficult problem for you to resolve:). Thanks a trillion. – SIM Jul 16 '17 at 19:06
  • Thank you SMth80! Always a please to answer your questions! Keep up the good work! – Tehscript Jul 16 '17 at 19:10
  • 1
    @SMth80 Yeah looks ok but there are 60 prices on the site and your code extracts only 31 of them. In this case you need to look for a different element in the code. – Tehscript Jul 16 '17 at 20:24
  • 1
    It's not right what i did but i tried and for that i dare to show you. Thanks. – SIM Jul 16 '17 at 20:30
1

As it seems, from looking at accepted answer, your script fails on the promo prices use 2 attribute selectors combined with OR to set your original posts variable. Then you have no nulls to handle. (Belated answer - came across this whilst hunting for a resource!).

Option Explicit
Public Sub Redmart_scraping()
    Dim driver As New ChromeDriver
    Dim posts As Object, post As Object, i As Long

    With driver
        .get "https://redmart.com/bakery"
        Do
        Set posts = .FindElementsByCss("li.productPreview span[class^=ProductPrice__price],span[class^=ProductPrice__promo_price]")
        Loop While posts.Count = 0
    End With

    For Each post In posts
        i = i + 1
        Cells(i, 1) = post.Text
    Next post
End Sub
QHarr
  • 83,427
  • 12
  • 54
  • 101
  • Yes your one is working @QHarr. Now, take a look at my finding below how it can be done alternatively. Thanks. – SIM Oct 15 '18 at 14:08
1

According to the documentation, this is how Florent B. suggested to avoid any index error and make the code run through the end.

Sub ScrapePrice()
    Dim driver As New ChromeDriver, post As Object

    With driver
        .get "https://redmart.com/bakery"
        For Each post In .FindElementsByCss("li.productPreview")
            If Not post.FindElementByCss("span[class^='ProductPrice__price']", Raise:=False, timeout:=0) Is Nothing Then
                R = R + 1: Cells(R, 1) = post.FindElementByCss("span[class^='ProductPrice__price']").Text
            End If
        Next post
    End With
End Sub
SIM
  • 21,997
  • 5
  • 37
  • 109
  • In my case the content hasn't loaded by the time this line For Each post In .FindElementsByCss("li.productPreview") is hit so it jumps to the end. It needs, at least from my perspective, a delay until results are present/ with timeout ideally (yes, I was lazy and didn't include one!). But this is the way of He Who Is Selenium Basic? What other lovely tricks do you have up your sleeve? See, this is why I should stalk your old questions/answers ! + – QHarr Oct 15 '18 at 15:03
  • Sooooo....... in your readings have you found a way in VBA selenium to pull up a webpage but prevent the JS from executing i.e. scripts from running? I thought .SetPreference "javascript.enabled", False would do it but it doesn't seem to work like that. Looks [like](https://stackoverflow.com/questions/3526361/firefoxdriver-how-to-disable-javascript-css-and-make-sendkeys-type-instantly) maybe I need a separate profile and load that..... not sure. – QHarr Oct 15 '18 at 15:15
  • 1
    I'm the worst reader ever!! I really didn't think of that @QHarr. I'll try for sure when I'm free. It's always a pleasure to find you in the loop. – SIM Oct 15 '18 at 15:19
0
Sub Testing()
    Dim driver As New WebDriver
    Dim posts As Object, post As Object
    dim this as string, that as string

    Set driver = New WebDriver
    driver.Start "Phantomjs", "https://redmart.com"
    driver.get "/bakery"
    Set posts = driver.FindElementsByXPath("//div[@class='productDescriptionAndPrice']")
    On Error Resume Next
    For Each post In posts
        i = i + 1
        If Isnull(post.FindElementByXPath(".//h4/a").Text) = True Then
            this = ""
        Else
            this = post.FindElementByXPath(".//h4/a").Text
        End if
        Cells(i, 1) = this

        if IsNull(post.FindElementByXPath(".//span[@class='ProductPrice__price___3BmxE']").Text) = True then
            that = ""
        else
            that = post.FindElementByXPath(".//span[@class='ProductPrice__price___3BmxE']").Text
        end if
        Cells(i, 2) = that
    Next post
End Sub

The age old "null pointer" nuisance. Checking for nulls is important. There's prolly a better way to do this but you get the point (pun intended)

Doug Coats
  • 6,255
  • 9
  • 27
  • 49
  • Thanks sir Doug Coats, for your answer. The code you provided can't shun the problem. It breaks when it hits the first null value. However, Your provided code runs until the value is exhausted because "on error resume next" is also there. – SIM May 21 '17 at 15:18
  • @SMth80 then take out on Error resume next then? – Doug Coats May 21 '17 at 15:47
  • I did that sir in the first place and gave you the feedback. Thanks again. – SIM May 21 '17 at 16:58
  • 1
    I've never played with this stuff before so this is purely a question, not a criticism, but wouldn't `Isnull(post.FindElementByXPath(".//h4/a").Text)` need to be `Isnull(post.FindElementByXPath(".//h4/a"))` - i.e. wouldn't the `FindElementByXPath` be the thing that is potentially `Null` and then attempting to find the `.Text` property of `Null` will crash rather than returning `Null`? – YowE3K May 21 '17 at 19:51
  • 1
    I assume you would need to check both for nulls, didnt think about that actually. I was going to install selenium just to be 100% sure but eh. – Doug Coats May 21 '17 at 20:02
0

On Error Resume Next just skips any runtime errors - you are right in not wanting to apply this in general.

If you cannot prevent these errors from occurring in the first place (by testing for null, for example) you should activate On Error Resume Next only for that part of the code that actually can create these errors and deactivate it right after with On Error Goto 0.

Inside of these On Error blocks you can also explicitly test which error occurred (and hence will be ignored) by using the Err object. This way you could re-raise all errors that you didn't expect and only ignore those you expected. Beware: Raising an error must be done outside of an On Error block, but leaving the block will reset the Err object. Therefore you would have to save all necessary error information in another variable (or variables) to access it after you leave the On Error block.

I'm afraid with the general messed up design of error handling provided by VBA, there is no better way.

Leviathan
  • 2,468
  • 1
  • 18
  • 24