Accessing content inside #document of a website

Question

I would like to access the content of a web page using C#. The content is inside an iframe in the body of the page, under a #document node. I am using this to read the page:

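// Download the raw bytes of the page at stWebPage, sending the current user's credentials.
WebClient wbClient = new WebClient();
wbClient.UseDefaultCredentials = true;
byte[] raw = wbClient.DownloadData(stWebPage);
// Decode the response as UTF-8 text.
stWebPageContent = System.Text.Encoding.UTF8.GetString(raw);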

However, the downloaded string does not contain the relevant information inside the #document.

Can anybody explain what I have to do to access the needed info? It is nested under body/div/iframe/#document/html/body/div/..... Thanks!

score 0 · Accepted Answer · edited May 23 '17 at 10:31

Note: I am assuming stWebPage points to an HTTP URL.

The iframe's content is not downloaded by this one call. You need to find the iframe element in stWebPageContent (for example with a Regex), read the value of its 'src' attribute, and make a second request to that URL to download the content. More details can be found at this link.
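
The two steps described above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class and method names (IframeFetcher, FindFirstIframeSrc, DownloadIframeContent) are made up for this example, the pattern only handles a quoted src attribute on the first iframe, and the frame is assumed to be served as UTF-8 like the outer page.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative, not from the question.
static class IframeFetcher
{
    // Returns the absolute URL of the first <iframe ... src="..."> found in html,
    // or null when no iframe with a quoted src attribute is present.
    public static Uri FindFirstIframeSrc(string html, Uri pageUri)
    {
        Match m = Regex.Match(
            html,
            "<iframe\\b[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']",
            RegexOptions.IgnoreCase);
        if (!m.Success)
        {
            return null;
        }

        // Decode entities such as &amp; and resolve a relative src against the page URL.
        string src = WebUtility.HtmlDecode(m.Groups[1].Value);
        return new Uri(pageUri, src);
    }

    // Downloads the outer page, then the document referenced by its first iframe.
    // Returns null when the outer page contains no matching iframe.
    public static string DownloadIframeContent(string stWebPage)
    {
        using (var wbClient = new WebClient())
        {
            wbClient.UseDefaultCredentials = true;

            var pageUri = new Uri(stWebPage);
            string outer = Encoding.UTF8.GetString(wbClient.DownloadData(pageUri));

            Uri frameUri = FindFirstIframeSrc(outer, pageUri);
            if (frameUri == null)
            {
                return null;
            }

            // Second request: this response is what the browser shows under #document.
            return Encoding.UTF8.GetString(wbClient.DownloadData(frameUri));
        }
    }
}
```

Calling IframeFetcher.DownloadIframeContent(stWebPage) would then return the inner document's HTML as a string. If the frame's content is filled in by JavaScript at runtime, a plain download like this will not see it, and a regular expression is a fragile way to read HTML, so an HTML parser may be a better fit for anything beyond a simple page.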

answered May 09 '17 at 22:04

Sharada Gururaj

Thanks a lot. I think that helped :) – May 10 '17 at 11:30
