I have been messing around with HTML from within Excel, trying to approximate how common images are at different resolutions. I'm hoping to get something dynamic - a user enters a search term and the code loops through a set of pre-defined image resolutions, ranking how common images are for that search term between the specified resolutions.
Step one though is to get a reliable (and speedy) way of returning the number of images at a specific resolution. I wrote this code:
Sub GoogleWithURL() 'requires Microsoft HTML Object Library
Dim url As String, searchTerm As String
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim ws As Worksheet
Set ws = ThisWorkbook.Sheets("sheet1")
Dim currPage As HTMLDocument
Dim xRes As Integer, yRes As Integer
With ws
xRes = .Range("XRes")
yRes = .Range("YRes")
searchTerm = .Range("search")
End With
'create URL to page with these image criteria
url = WorksheetFunction.Concat("https://www.google.com/search?q=", searchTerm, _
"&tbm=isch&source=lnt&tbs=isz:ex,iszw:", xRes, "iszh:", yRes)
'initiating a new instance of Internet Explorer and asigning it to objIE
Set objIE = New InternetExplorer
'objIE.Visible = True 'for debugging purposes
'Google images search
objIE.navigate url
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
Set currPage = objIE.document
'Count image results
Set valueResult = currPage.getElementById("rg_s").getElementsByTagName("IMG")
MsgBox WorksheetFunction.Concat("'", searchTerm, "' returns ", valueResult.Length _
, " images @ ", xRes, "x", yRes, "px.") 'returns number of loaded images on page
'close the browser
On Error Resume Next 'required when the browser is visible and I close it manually half way
objIE.Quit
End Sub
It navigates an internet explorer object to a specific resolution google image search, counts the number of images in the rg_s
ID (these are the image results as opposed to the banner images etc.). Then it returns that count as a message box. (When I eventually implement this, I will return the values in a column on the sheet, looping through 30ish different resolutions)
Problems
The main problems with this code are:
It doesn't give a very useful count. The resolution is low because it only counts the images that have loaded - meaning most search terms at common resolutions like 1920x1080 or 1366x768 return the max 100 images.
It is slow. To me, navigating to pages, counting image tags, it all seems a lot like using
.Select
in VBA. It's like a manual approach, what a human would do, and therefore inefficient.
Solutions
I can think of some approaches to solve these problems
Resolution of data/ getting a more useful count
Scroll down. If I can load more images, it's likely I'll be able to differentiate a little better. I found that scrolling as far down as I can (up to the 'Load more results' button) gives a cap of 400 not 100 - if there are at least that many images for a given resolution then I'm happy and I'll give it the top rank. Doesn't help with problem 2 though. Still, how would I do that?
Narrow results. If 100 is returned, I could change the
filetype:
in the URL I send, like appendingfiletype:png
to maybe halve the number of images returned, giving me a better spread in the 0-100 range. Not ideal though as I would have to iterate through multiple filetypes for some resolutions, slowing down the code and even then, not necessarily giving me what I want.Use Google's (or another search engine's) own values for this. I've asked this on various sites and in various forms, is there any data on image count available direct from Google - i.e. without returning (and slowly loading) the images themselves. Like the
about 1,300,500 results in 0.03 seconds
for a regular search, only for images? If I could use a pre-calculated value each time that samples a bigger array than the 100 results, I might get a much more detailed picture.
Slowness
- Try a different kind of HTTP request. Right now I open an instance of internet explorer and navigate to a page. That sounds very human style, I would prefer a computer style request. What I mean is, instead of using my laptop to trawl through images one by one, I get Google's supercomputers to do the leg work by only asking for the count, as oppose to the images themselves. Not a clue how to do this. I'm aware of another 2 ways to search the web from within Excel; web query and
CreateObject("MSXML2.serverXMLHTTP")
. Don't know either of those much but if you think they would be a better way to go then I'll look into them more closely.
- Try a different kind of HTTP request. Right now I open an instance of internet explorer and navigate to a page. That sounds very human style, I would prefer a computer style request. What I mean is, instead of using my laptop to trawl through images one by one, I get Google's supercomputers to do the leg work by only asking for the count, as oppose to the images themselves. Not a clue how to do this. I'm aware of another 2 ways to search the web from within Excel; web query and
Summary
Hopefully that's plenty to go on, I think my train of thought should be fairly clear. Actual answers on how to scroll down/ load more images/ get Google to return a count rather than images themselves would be best, advice on what to pursue would be useful too.