0

I'm trying to create a script in vba in combination with IE to scrape image link from some webpages and embed the same right next to ASIN column in a worksheet. In column A there are list of ASINs' and in column B, I would like to embed the image like this. So, the script takes asin from column A and forms a qualified link by appeneding that asin to this base link https://www.amazon.in/dp/. This is the address of one such webpage.

Here are some asins:

B08SRFZX5Z
B08KKJQ8N7
B081RC61YN

Which forms the following links:

https://www.amazon.in/dp/B08SRFZX5Z
https://www.amazon.in/dp/B08KKJQ8N7
https://www.amazon.in/dp/B081RC61YN

I've tried with:

Sub GetImages()
    Const URL$ = "https://www.amazon.in/dp/"
    Dim IE As Object, ws As Worksheet, cel As Range
    
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set IE = CreateObject("InternetExplorer.Application")

    For Each cel In ws.Range("A2:A" & ws.Cells(Rows.count, 1).End(xlUp).Row)
        IE.Visible = False
        IE.navigate URL & cel
        While IE.Busy Or IE.readyState < 4: DoEvents: Wend
        ws.Range(cel(1, 2).Address) = IE.document.querySelector("[id='imgTagWrapperId'] > img").getAttribute("src")
    Next cel
End Sub

How to embed image right next to asin list in a worksheet?

SMTH
  • 67
  • 1
  • 4
  • 17
  • You can use [AddPicture](https://learn.microsoft.com/en-us/office/vba/api/excel.shapes.addpicture) method to insert picture using URL. based on your code it will probably be `ws.Shapes.AddPicture("image url", msoFalse, msoTrue, cel.OffSet(0,1).Left, cel.OffSet(0,1).Top, -1. -1)` – Raymond Wu Aug 05 '21 at 08:49
  • I followed your suggestion and ran the script @Raymond Wu. Now, the script does embed images on the worksheet but the problem is the size of those images remain how they are in that site. Here is how [the image filled in sheet](https://imgur.com/2tNHQYR) looks like. This is [the modified approach](https://pastebin.com/CRF92EBa) by the way. – SMTH Aug 05 '21 at 11:25
  • 1
    You can specify the width and height of the image, replace either or both `-1` value at the end to whatever size you want, you can even follow the cell's height e.g. `cel.Height`. Read the documentation in the link in my previous comment. – Raymond Wu Aug 05 '21 at 12:03
  • 1
    It should be good to add `IE.Quit` at the code end. Otherwise, Internet Explorer hidden instances will remain in memory and only consume computer resources. You can see them using TaskManager... – FaneDuru Aug 05 '21 at 12:16
  • In fact, I would recommend not using of Internet Explorer object. Try using `WinHttp.WinHttpRequest`. It is faster and you shouldn't take care of IE open sessions... – FaneDuru Aug 05 '21 at 13:09
  • I usually go for any browser simulator as a last resort. The problem with xhr is that the site bans the requests after few attempts, so I intentionally chose IE. However, I might be wrong and I would love to stick with xhr if it is worth it @FaneDuru. – SMTH Aug 05 '21 at 14:37
  • I would love to accept your answer if you post one @Raymond Wu . You suggestion perfectly did the trick. – SMTH Aug 05 '21 at 14:38
  • @SMTH I can but FaneDuru's comment has a point - Are you interested in knowing how to do it without using IE? I will amend my answer based on your response. (Or you can try first then ask a new question later) – Raymond Wu Aug 05 '21 at 15:15
  • Then, I will prepare an answer involving your code adaptation in order to avoid using IE. No need to mark it as accepted answer, since it does not answer your question as it was formulated. – FaneDuru Aug 05 '21 at 15:25
  • @Faneduru I was going to do it via XMLHTTP haha I'll do what I can - Upvote! – Raymond Wu Aug 05 '21 at 15:34

2 Answers2

2

Please, test the next code (Without IE):

Sub GetImagesNoIE()
 Dim imageUrl$, p As Shape, ws As Worksheet, cel As Range, rng As Range, arrSrc
 Dim Http As Object, HTMLDoc As Object    'late binding
 'in order to benefit of intellisense suggestions the next two references should be added:
 'Microsoft WinHTTP Services, version 5.1
 'Microsoft HTML Object Library
 Const URL$ = "https://www.amazon.in/dp/"
 
 Set Http = CreateObject("WinHttp.WinHttpRequest.5.1")
 Set HTMLDoc = CreateObject("htmlfile")
 
 Set ws = ActiveSheet 'ThisWorkbook.Worksheets("Sheet1")
 'delete the previous (existing) shapes in column B:B, if any:
 Application.ScreenUpdating = False
 Application.EnableEvents = False
  Set rng = ws.Range("B:B")
  For Each p In ws.Shapes
      If Not Intersect(rng, p.TopLeftCell) Then p.Delete
  Next

  For Each cel In ws.Range("A2:A" & ws.Cells(Rows.Count, 1).End(xlUp).Row)
    If cel.Value <> "" Then
        With Http
            .Open "GET", URL & cel.Value, False
            .setRequestHeader "User-Agent", "Firefox"
            .send
            HTMLDoc.body.innerHTML = .responseText
        End With
        If InStr(HTMLDoc.body.innerText, "We're sorry. The Web address you entered is not a functioning page on our site") = 0 Then
             imageUrl = HTMLDoc.querySelector("[id='imgTagWrapperId'] > img").getAttribute("src")
             Set p = ws.Shapes.AddPicture(imageUrl, msoFalse, msoTrue, cel.Offset(0, 1).Left, _
                                   cel.Offset(0, 1).Top, cel.Offset(0, 1).Width, cel.Offset(0, 1).Height)
            
        Else
            cel.Offset(0, 1).Value = "The Web address you entered is not a functioning page on our site"
        End If
    End If
  Next
  Application.ScreenUpdating = False
  Application.EnableEvents = False
  MsgBox "Ready..."
End Sub
FaneDuru
  • 38,298
  • 4
  • 19
  • 27
  • Hi @FaneDuru, thanks for your answer. There are couple of things which when you fix the script will start working. First of all, When you use late binding `CreateObject("htmlfile")`, you can't use `.querySelector()`. You can't even use class name as a locator in late binding. You can use tag name, though. Another most important thing is that the image that I'm after is dynamic, which you can't grab using xhr. You can access the same image when you use `.getAttribute("data-a-dynamic-image")` instead. – SMTH Aug 05 '21 at 16:35
  • However, You need to process the image links to get the desired one as `.getAttribute("data-a-dynamic-image")` will give you a set of images within a dictionary. Check out [this code](https://pastebin.com/rKYc8Qv7) which I used before creating this post. The linked script will fetch you image links until the site puts a ban. – SMTH Aug 05 '21 at 16:35
  • It's an amazon thing, I remember there was a guy who posted a similar question to yours and the answer code is working perfectly fine on my end but because he was outright banned, the code fails to pull all images hosted on amazon. – Raymond Wu Aug 05 '21 at 16:45
  • @SMTH What do you mean? Did you test the code like it is? It does not need any `.getAttribute("data-a-dynamic-image")`. It brings the pictures exactly like the one using IE, but faster. What does "dynamic" mean? Did your initial code (using IE) bring what you needed? This code should do the same. I can create a function able to firstly put in an array all the necessary picture URLs and downloading pictures after that, in a second step, but I am afraid that the site banns you when try mass downloading, not when search after pictures download URL... – FaneDuru Aug 05 '21 at 17:49
  • Give ***[this](https://filebin.varnish-software.com/hhcgyab47x8ecyuw/execution_demo.mp4)*** a check. Thanks. – SMTH Aug 05 '21 at 17:59
  • Dynamic content means the one which is loaded later. Xhr can't see the content. Xmlhttp requests can only catch those items which are available in page source (`Ctrl + U`), not always in (`Ctrl + Shift + I`). – SMTH Aug 05 '21 at 18:06
  • @SMTH Can you share your workbook to give it a try? The code works without any problem for the three cases in your question. What Excel version do you use? Since, using early binding and having access to intellisense, this property exists and returns the necessary string, I tested it on late binding removing the reference and it worked, too. – FaneDuru Aug 05 '21 at 18:12
  • Your script works when the images are static and the early binding is in place. However, [this link](https://www.amazon.in/dp/B08SRFZX5Z) will definitely fail. Win 7, 32 bit along with Excel 2013 professional plus. – SMTH Aug 05 '21 at 18:24
  • @SMTH It certainly does not fail! It is your first example which I used for testing and the picture is correctly downloaded. Late binding declaration... If not something confidential, can you share your testing workbook? – FaneDuru Aug 05 '21 at 18:34
  • OK. I will try it on my computer. – FaneDuru Aug 05 '21 at 18:51
  • @SMTH So, there are inconsistencies in your workbook. In case of empty cells, of course, it cannot return anything. Then, there are wrong inputs. For instance, on row 33 (984782869) does not exist on the site... – FaneDuru Aug 05 '21 at 19:09
  • Please, ignore the last asins. I thought I deleted them when I uploaded the workbook. Just deal with the asins up to row 7 in column A. – SMTH Aug 05 '21 at 19:32
  • @SMTH I will process everything. I am adapting the code to deal with the empty cells and with wrong inputs. – FaneDuru Aug 05 '21 at 19:39
  • @SMTH Please, test the updated code. It now skips the blanc cells and write something elocvent for cases of wrong inputs in the workbook. Please, download your processed file from [here](https://easyupload.io/2g4qsi) and send some feedback. Processed in Excel 2016 Plus, Win 10 64 bit, but I do not think the aparent problem should be version or OS related. – FaneDuru Aug 05 '21 at 19:44
  • Same issues that I reported earlier. It is perhaps because of the version of excel. [This post](https://stackoverflow.com/questions/67596872/how-to-hold-a-reference-to-the-items-matched-by-queryselectorall-in-a-variabl) somewhat addresses the issue. – SMTH Aug 05 '21 at 20:01
  • @SMTH I am afraid that you are talking about a different issue... You did not even test my code as it is. You used `Set Http = CreateObject("MSXML2.XMLHTTP")` which is different than `Set Http = CreateObject("WinHttp.WinHttpRequest.5.1")`. Different object, different library. Please, try **my code Sub, named "GetImagesNoIE()", as it is** (in the sent workbook) and you may have a surprise to see it working. – FaneDuru Aug 05 '21 at 20:09
  • I gave my earlier feedback after running your script. I know you spent a substantial amount of time to fix the issue, so I surely respect your effort. [Here is](https://filebin.varnish-software.com/mzywkclpdu3n91aw/demo.mp4) the macro in action.. – SMTH Aug 05 '21 at 20:43
  • @SMTH Yesterday, being very late in my country, I had to close my laptop... Now, please try clarifying the next issues: 1. Does the error appear when the code tries to process the first `Cel`? 2. When the code is stopped on error, move the cursor over `cel.Value`. What value does it have? After that, please note the value, identify the row keeping the cell, copy its value and try to manually open the site using the URL (concatenation between "https://www.amazon.in/dp/" and the value in discussion). What does it happen? Can you see what you wanted to see? 3. Can you share this workbook, too? – FaneDuru Aug 06 '21 at 07:22
  • @FaneDuru I believe the issue lies in different version of mshtml.dll in our machines (there was even one updated version in May 2021 iirc). I get the same error as SMTH as well. The method of getting the htmldoc does not matter (XMLHTTP or WinHttpRequest). There's a [follow-up post](https://stackoverflow.com/questions/67759228/what-are-the-currently-supported-css-selectors-available-to-vba) to the post that SMTH linked yesterday which talks a little about it. – Raymond Wu Aug 06 '21 at 15:01
1

Use AddPicture method insert picture using URL:

ws.Shapes.AddPicture("image url", msoFalse, msoTrue, cel.OffSet(0,1).Left, cel.OffSet(0,1).Top, -1. -1)

-1 indicate use the original size but you can change the width and height at your desired size.

https://learn.microsoft.com/en-us/office/vba/api/excel.shapes.addpicture

Raymond Wu
  • 3,357
  • 2
  • 7
  • 20