
I'm successfully navigating to web pages using:

Set oShell = CreateObject("WScript.Shell") 
strHomeFolder = oShell.ExpandEnvironmentStrings("%APPDATA%")  
Set objExplorer = WScript.CreateObject _ 
("InternetExplorer.Application", "IE_") 
objExplorer.Navigate  "http://www.example.org" 
objExplorer.Visible = 1 

But I want to load the web page without loading its images (for example, by deleting all the img tags). What is the correct way to do that?

Matt
user198989
  • One approach would be to download the HTML of the page and remove the `<img>` tags with a regex. For the download portion you could use this: http://stackoverflow.com/questions/11780366/vb-script-or-vba-code-to-copy-the-contents-of-a-web-webpage-to-a-word-excel-shee – Matt Oct 26 '14 at 12:43

1 Answer


You can remove images after loading the page by manipulating the DOM tree:

...

'wait for IE to finish loading the page
While objExplorer.ReadyState <> 4 : WScript.Sleep 100 : Wend

'remove <img> elements from the page
For Each img In objExplorer.document.getElementsByTagName("img")
  img.parentNode.removeChild(img)
Next
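Because getElementsByTagName normally returns a live collection, removing elements while enumerating it can, in some cases, skip entries. A more defensive variant is sketched below, assuming the same objExplorer object as above and a fully loaded page; the variable names imgs and i are only illustrative.

```vbscript
' Assumes objExplorer is the InternetExplorer.Application object from above
' and that the page has finished loading (ReadyState = 4).
Set imgs = objExplorer.document.getElementsByTagName("img")

' Walk the collection from the end so that removals don't shift
' the indexes of elements that haven't been visited yet.
For i = imgs.length - 1 To 0 Step -1
  Set img = imgs.item(i)
  img.parentNode.removeChild img
Next
```

Note that either loop only removes the elements from the rendered page; the images have already been requested by the time the page has finished loading.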

If you want to avoid loading images entirely, you have to disable the "Show Pictures" setting in the Internet Options.

[Screenshot: the "Show Pictures" setting in the Advanced Internet Options]

This setting can be changed in the registry as well (before you start Internet Explorer), like this:

Set sh = CreateObject("WScript.Shell")

regval = "HKCU\Software\Microsoft\Internet Explorer\Main\Display Inline Images"
sh.RegWrite regval, "no", "REG_SZ"
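Since this registry value is a per-user setting, the change will probably affect other Internet Explorer windows too. The following is a rough sketch of how the script could remember the previous state and put it back afterwards; the variables oldval and hadValue are made up for this example, and the point at which it is safe to restore the setting is an assumption you'd have to verify for your page.

```vbscript
Set sh = CreateObject("WScript.Shell")
regval = "HKCU\Software\Microsoft\Internet Explorer\Main\Display Inline Images"

' Remember the current value. RegRead raises an error if the value
' doesn't exist, so trap that case instead of aborting the script.
On Error Resume Next
oldval = sh.RegRead(regval)
hadValue = (Err.Number = 0)
Err.Clear
On Error GoTo 0

' Disable image loading before Internet Explorer is started.
sh.RegWrite regval, "no", "REG_SZ"

' (start Internet Explorer and wait for the page to load here)

' Put things back the way they were: restore the old value,
' or delete the value again if it didn't exist before.
If hadValue Then
  sh.RegWrite regval, oldval, "REG_SZ"
Else
  sh.RegDelete regval
End If
```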

Set ie = CreateObject("InternetExplorer.Application")
...

As @Matt suggested in the comments to your question, you could also retrieve just the HTML page:

url = "http://www.example.org"

Set req = CreateObject("Msxml2.XMLHTTP.6.0")
req.open "GET", url, False
req.send

html = req.responseText

remove the <img> tags (or rather just the value of their src attribute):

Set re = New RegExp
re.Pattern = "(<img[^>]*src=[""']).*?([""'][^>]*>)"
re.Global  = True
re.IgnoreCase = True

html = re.Replace(html, "$1$2")

save it to a local file:

filename = "C:\temp.html"

Set fso = CreateObject("Scripting.FileSystemObject")

fso.OpenTextFile(filename, 2, True).Write html

and then load that local file in IE:

Set ie = CreateObject("InternetExplorer.Application")

ie.Navigate "file://" & filename
While ie.ReadyState <> 4 : WScript.Sleep 100 : Wend
ie.Visible = True

The disadvantage of this approach is that relative links and other references to resources of the originating website (stylesheets, JavaScript libraries, …) won't work anymore, because you're loading the page from a different context where these resources don't exist. You can make them work again by prepending the base URL to the relative paths to turn them into absolute references, but that can be quite a bit of work if you want to cover all bases.
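A possible shortcut, instead of rewriting every relative path, is to insert a <base href="..."> element into the downloaded HTML, so the browser resolves relative references against the original site. This is a different technique from the path rewriting described above, and the sketch below is only that: AddBaseHref is a hypothetical helper, it does nothing if the page has no <head> tag, and IE's security settings for local files may still block some remote resources.

```vbscript
' Hypothetical helper: inserts a <base> element right after the
' opening <head> tag so relative URLs resolve against baseUrl.
Function AddBaseHref(html, baseUrl)
  Dim re, safeUrl
  Set re = New RegExp
  re.Pattern    = "(<head\b[^>]*>)"
  re.IgnoreCase = True
  re.Global     = False   ' only touch the first <head> tag

  ' "$" has a special meaning in the replacement string, so escape it.
  safeUrl = Replace(baseUrl, "$", "$$")

  AddBaseHref = re.Replace(html, "$1<base href=""" & safeUrl & """>")
End Function

' Usage with a small sample document
sample = "<html><head><title>t</title></head><body><a href=""page2.html"">next</a></body></html>"
WScript.Echo AddBaseHref(sample, "http://www.example.org/")
```

In the script above you would call it on the downloaded text before saving, e.g. html = AddBaseHref(html, url).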

Ansgar Wiechers