Is it possible to get the total content size of a specific site programmatically in C#? By size I mean the full size of the page, including all images and scripts referenced in the head or body sections, and so on. For example, for the site http://www.google.com I want to get its total size, including the logo, the scripts it refers to, and everything else that is presented to the user, not just the main page.

Here is a picture of what I mean:

If we use the developer tools in IE 9 and start capturing traffic in the network session, then load google.com, it shows all the files loaded (.js, .png, and so on), their total size, and the loading time in milliseconds.

I tried to do something similar using a WebRequest, but I get only 43 KB instead of the 101 KB that the IE developer tools report.

Here is the code:

// Requires: using System.IO; using System.Net; using System.Windows.Forms;
WebRequest request = WebRequest.Create(textBox2.Text);
request.Credentials = CredentialCache.DefaultCredentials;
// The using blocks dispose the response and streams even if reading throws.
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream dataStream = response.GetResponseStream())
using (MemoryStream buffer = new MemoryStream())
{
    // Copy the raw bytes so the size is a byte count, not a character count.
    dataStream.CopyTo(buffer);
    int byteCount = (int)buffer.Length;
    // ConvertSize is my own formatting helper (not shown here).
    MessageBox.Show(ConvertSize(byteCount) + "  -  " + byteCount.ToString());
}

How can I get the total size of a site including all images, js and additional files used/referenced in that specific page? Thanks a lot!

Shadow The GPT Wizard
user1493460
  • I would guess that Google may well deliver different content based on what it thinks you can handle. When I look at the source of the Google homepage (in Firefox) and just do a character count, it has just over 100k characters, which is a little higher than what IE told you. I would guess that your WebRequest method really is getting a 43k file. Try it again with proper browser impersonation (i.e. setting the user agent, etc.) and see if you get a file of a different size. And of course Google does show you different content when you are logged in compared to when you are not. – Chris Jul 23 '12 at 14:32
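
The browser impersonation suggested in the comment above could look roughly like the following. This is only a sketch: the class name BrowserLikeRequest and the IE 9 user agent string are illustrative assumptions, not values taken from the question, and it is not guaranteed that Google will then return the same content it sends to a real browser.

```csharp
using System;
using System.Net;

static class BrowserLikeRequest
{
    // Hypothetical helper: builds a request that identifies itself as a browser.
    // Nothing is sent over the network until GetResponse() is called.
    public static HttpWebRequest Create(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        // Example IE 9 user agent string; any real browser string could be used instead.
        request.UserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)";
        // Accept compressed responses and decompress them transparently, as browsers do.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }
}
```

The returned request can be used in place of the plain WebRequest in the question's code, so the two sizes can be compared directly.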

1 Answer


Your WebRequest is only getting the HTML. It does not parse the page or fetch any of the referenced files (images, CSS, JavaScript includes, etc.). Controls such as the WebBrowser control let you automate a browser, which does load those files.
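
If automating a browser is not an option, one rough alternative is to download the HTML, pull out the URLs of the referenced files, and add up their sizes. The sketch below assumes a simple regular expression is good enough to find src/href attributes on img, script and link tags; the class and method names (PageSizeEstimator, ExtractResourceUrls, EstimateTotalBytes) are made up for illustration. It will miss anything loaded by scripts or referenced from inside CSS, and it ignores HTTP headers, so the result is unlikely to match the IE developer tools exactly.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

static class PageSizeEstimator
{
    // Crude pattern for src/href attributes on <img>, <script> and <link> tags.
    // A real HTML parser would be more robust; this is only a sketch.
    static readonly Regex ResourcePattern = new Regex(
        @"<(?:img|script|link)\b[^>]*?\b(?:src|href)\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns the absolute, de-duplicated http/https URLs referenced in the HTML.
    public static List<Uri> ExtractResourceUrls(string html, Uri baseUri)
    {
        var seen = new HashSet<string>();
        var result = new List<Uri>();
        foreach (Match m in ResourcePattern.Matches(html))
        {
            Uri absolute;
            if (Uri.TryCreate(baseUri, m.Groups[1].Value, out absolute)
                && (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps)
                && seen.Add(absolute.AbsoluteUri))
            {
                result.Add(absolute);
            }
        }
        return result;
    }

    // Downloads the page and every referenced resource, summing the byte counts.
    public static long EstimateTotalBytes(string url)
    {
        var baseUri = new Uri(url);
        using (var client = new WebClient())
        {
            byte[] page = client.DownloadData(baseUri);
            long total = page.Length;
            // Assumes the page is UTF-8; the text is only used to find attribute values.
            string html = Encoding.UTF8.GetString(page);
            foreach (Uri resource in ExtractResourceUrls(html, baseUri))
            {
                try { total += client.DownloadData(resource).Length; }
                catch (WebException) { /* skip resources that fail to download */ }
            }
            return total;
        }
    }
}
```

Calling PageSizeEstimator.EstimateTotalBytes("http://www.google.com") would then return the combined byte count of the HTML plus whatever directly referenced files could be downloaded.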

podiluska