2

There are some websites, such as gmail.com, that don't display source information (i.e. you cannot right-click and select "View Source").

So I am trying to read the document source into a file so I can see the different types of elements (I would like to be able to pass credentials and other data into websites eventually), but I'm having difficulty.

Here is the code:

$ie = new-object -com "InternetExplorer.Application"
$ie.navigate("http://www.gmail.com")
$ie.visible=$true
$doc = $ie.document
Add-Content C:\output.txt $doc.all

C:\output.txt is blank, help!

Glowie

2 Answers

4

The problem with using InternetExplorer.Application is that you then have to handle the application's behaviour. For example, if I run your code I also get an empty file, because the page had not finished loading when the document property was accessed.
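If you do want to stay with the COM object, one way to handle that timing is to poll it until navigation has finished before reading the document. This is only a minimal sketch: the function name Wait-BrowserReady and the 30-second default are made up for illustration, and pages that build their content with script may keep changing after the browser reports it is ready.

```powershell
# Hypothetical helper: block until the browser object reports that navigation finished.
# Returns $true if the page became ready, $false if the timeout expired first.
function Wait-BrowserReady {
    param(
        $Browser,                    # e.g. the InternetExplorer.Application COM object
        [int]$TimeoutSeconds = 30    # arbitrary default, an assumption
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    # ReadyState 4 is READYSTATE_COMPLETE
    while ($Browser.Busy -or $Browser.ReadyState -ne 4) {
        if ((Get-Date) -ge $deadline) { return $false }
        Start-Sleep -Milliseconds 200
    }
    return $true
}
```

With the code from the question, this would be called as Wait-BrowserReady $ie between the navigate call and the line that reads $ie.document.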

If you are using PowerShell v3, you can use the Invoke-WebRequest cmdlet to query the web server directly, as follows:

$webreq = Invoke-WebRequest http://www.gmail.com
$webreq.Content | Out-File C:\temp\output.txt

In PowerShell v2 you can use the System.Net.WebRequest .NET class as follows:

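# Request the page and read the whole response body as text
$req = [System.Net.WebRequest]::Create("http://www.gmail.com/")
$resp = $req.GetResponse()
$reqstream = $resp.GetResponseStream()
$stream = New-Object System.IO.StreamReader $reqstream
$result = $stream.ReadToEnd()
# Release the reader and the underlying connection
$stream.Close()
$resp.Close()
$result | Out-File C:\temp\output2.txt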
Graham Gold
  • This solution works. When I output $result to the console I can read it clearly, but when I output to a .txt file, everything is jumbled. What file type can I pipe $result to? – Glowie Aug 27 '13 at 13:10
  • What do you mean by "jumbled"? Whether you output to the console, or output to a file and then read the file, both are the same, as you would expect since the source is the same. – Graham Gold Aug 27 '13 at 13:24
  • @Graham Gold: Oh, I mean that the output is not neatly organized into lines and tab-delimited as it is when $result is piped to the console. When I output $result into a .txt file, everything is bunched up together, not separated by spaces or tabs. – Glowie Aug 27 '13 at 14:19
  • I presume you are opening it in Notepad? It is not the greatest at handling carriage return/line feed. Try WordPad or Notepad++. – Graham Gold Aug 27 '13 at 17:28
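If the page is served with bare line-feed line endings, which older versions of Notepad do not render as line breaks, another option is to normalize the line endings before writing the file. A minimal sketch, where the function name ConvertTo-CrlfText is made up for illustration:

```powershell
# Hypothetical helper: rewrite every line ending (CRLF, lone CR, or lone LF) as CRLF.
function ConvertTo-CrlfText {
    param([string]$Text)
    # CRLF is listed first in the alternation so an existing CRLF is matched
    # as one unit and not doubled.
    return ($Text -replace "`r`n|`r|`n", "`r`n")
}
```

For the v2 code above, that would mean piping (ConvertTo-CrlfText $result) to Out-File in place of $result.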
2

You can view the source of any website. I am able to see the source for Gmail on Chrome, using the normal method of right click -> View page source.

You can also open up Developer Tools -> Elements to see the source.

In Chrome, you can even use a URL like view-source:https://mail.google.com/mail/u/0/?shva=1#inbox to view source.

Going the route of getting the source from PowerShell will only get more and more complicated.

manojlds