1

I want to read an HTML page, including the contents of its iframes, in C#. I have tried several techniques, but the result is always "access denied".

The page I want to read contains nested frames:

"Main page > iframe > iframe >iframe"

I want to read all of it, but I can't request each iframe's content separately because doing so redirects to another page.

<html>
<body>
<iframe>
  <html>
    <body>
    </body>
  </html>
</iframe>
</body>
</html>

I tried the WebClient class and the WebBrowser control, but neither worked.

dda
MandyGenius

3 Answers

1

Pretty simple. If you are using the WebBrowser control:

HtmlElement element = webBrowser1.Document.Window.Frames["frame-id"].Document.GetElementById("element-id");

If you have multiple nested IFrames, you can chain the query:

HtmlElement element = webBrowser1.Document.Window.Frames["frame-id"].Frames["second-frame-id"].Document.GetElementById("element-id");

I added Document.GetElementById("element-id") in case you are trying to access an element within the IFrame. You can drop it if not.

Make sure that you look at the source code for the entire document that is loaded. There may be multiple nested IFrames that you need to chain together to get what you want.

Also, be sure the IFrame is fully loaded before trying to access it, or you won't have any luck. For more info on waiting for dynamic pages to load, see this article: how to dynamically generate HTML code using .NET's WebBrowser or mshtml.HTMLDocument?
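When the frame names are not known in advance, the chained lookup can be generalized into a recursive walk over every nested frame. The following is a minimal sketch, not a definitive implementation: `FrameDumper` and `Dump` are hypothetical names introduced here, and it assumes a Windows Forms project. It also assumes that the "access denied" error from the question is the UnauthorizedAccessException that the control raises for frames served from a different domain than their parent; such frames are reported rather than read.

```csharp
using System;
using System.Text;
using System.Windows.Forms;

// Hypothetical helper: collects the HTML of a window and all of its nested frames.
static class FrameDumper
{
    public static void Dump(HtmlWindow window, StringBuilder sb, int depth)
    {
        string prefix = new string('>', depth + 1);
        try
        {
            HtmlDocument doc = window.Document;
            if (doc != null && doc.Body != null)
            {
                // Record where this frame came from, then its markup.
                sb.AppendLine(prefix + " " + window.Url);
                sb.AppendLine(doc.Body.OuterHtml);
            }

            if (window.Frames == null) return;

            // Recurse into each child frame (Main page > iframe > iframe ...).
            foreach (HtmlWindow child in window.Frames)
            {
                Dump(child, sb, depth + 1);
            }
        }
        catch (UnauthorizedAccessException)
        {
            // Assumption: cross-domain frames cannot be read this way.
            sb.AppendLine(prefix + " [access denied]");
        }
    }
}
```

A call such as FrameDumper.Dump(webBrowser1.Document.Window, sb, 0), made from the DocumentCompleted handler once the page and its frames have finished loading, would start the walk at the main page.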

Community
Ben Holland
0

You posted no code and barely any information about it. However, if this is a .NET Framework C# desktop application, most likely the page has invalid HTML or code that is too advanced for Microsoft's control. The WebBrowser control from the toolbox is a downgraded version of Internet Explorer and, for the most part, will not read HTML5. It does read iframes as well as objects. Every site you read has to be served over public HTTP.

//UPDATED ANSWER:

Create a PHP file and host it. Use this file to read the site.

<?php
$homepage = file_get_contents('http://www.foobar.com/');
echo $homepage;
?>

An alternative if the PHP script above does not work: https://code.google.com/p/php-proxy/

  • HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(url); myRequest.Method = "GET"; WebResponse myResponse = myRequest.GetResponse(); StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8); string result = sr.ReadToEnd(); sr.Close(); myResponse.Close(); – MandyGenius Nov 01 '14 at 14:10
  • Okay. I think you need to find a back door for getting those silly iframes with your code. Make a PHP script that reads the website and echoes it, then use that PHP script as the target of your HTTP request. – Katy Pillman Nov 01 '14 at 14:12
  • I had multiple PHP queries of files, and the HTTP request stays alive until the first print, so read the entire site from a PHP file and use that as the URL. – Katy Pillman Nov 01 '14 at 14:13
  • Try my updated answer. If that fails, I will send a link to my proxy script, which definitely works. – Katy Pillman Nov 01 '14 at 14:24
  • That didn't work; the iframe redirected to another page. PHP script: http://mandygenius.com/myfile.php Actual link: http://g2g.fm/forum/showthread.php?5157-Z-Nation-Season-1-Episode-8-Download-S01E08-1080p-HDTV-Streaming-Subtitles – MandyGenius Nov 01 '14 at 14:34
  • It did work, but a pop-up redirected the URL. Neither I nor anyone else can help, because the site has adware ads on it. – Katy Pillman Nov 01 '14 at 14:37
  • There is a guy on the Plex channel's blog who somehow managed to access those nested iframes to extract the link: https://github.com/TehCrucible/G2Gfm.bundle/blob/master/Contents/Code/__init__.py – MandyGenius Nov 01 '14 at 14:48
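For reference, the request code from the first comment above can be tidied so that the response and reader are disposed even when an exception is thrown. This is only a sketch of that same HttpWebRequest approach; `PageFetcher` and `Fetch` are hypothetical names introduced here. Note that a request like this returns only the markup of the page at the given URL: each iframe is a separate document that a browser loads from the iframe's src attribute, so its content is not part of the returned string.

```csharp
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper wrapping the request code from the comment.
static class PageFetcher
{
    public static string Fetch(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";

        // The using blocks close the response, stream, and reader in all cases.
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```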
0

With the WebBrowser component you parse through the HtmlDocument using:

 foreach (HtmlElement e in webBrowser1.Document.All)
 {
    // your code here
 }

With IFrames, the elements live in the frame's own Document, which you reach through the frame's window:

HtmlWindow frame = webBrowser1.Document.Window.Frames["frame-id"]; // "frame-id" is a placeholder
foreach (HtmlElement e in frame.Document.All)
{
   // your code goes here
}

What you want to do is save a reference to the IFrame so you don't have to parse through the whole page each time to find it. That recursive search is rather slow, and keeping the reference will save you some heartache. Once you have your IFrame, you can write normal code to find the HtmlElements you need.

Hope that helps.