
I have recently been working on downloading webpage content using WebClient in C#. The DownloadString method of WebClient cannot download the content of an iframe.

The code I use to download the content is:

    using (var client = new WebClient())
    {
        string html = client.DownloadString("url");
    }

What should I use to read iframe content in C#?

For testing, I am using the site http://multiprofits.co.uk/oddsmatcher.html, which has an iframe in it.

  • Either use `HtmlAgilityPack` to parse the content manually and then load the `iframe` with another `DownloadString` request, or use `WebBrowser` (which supports [much more complex web scraping scenarios](http://stackoverflow.com/questions/22239357/how-to-cancel-task-await-after-a-timeout-period/22262976#22262976)). – noseratio Jun 20 '14 at 11:06
  • The problem is that the iframe content obtained from a second DownloadString call is not the same as what is displayed in the original webpage. –  Jun 20 '14 at 11:11
  • @akash88 duplicate of http://stackoverflow.com/questions/14429023/can-i-read-iframe-through-webclient-i-want-the-outer-html ? – Paul Zahra Jun 20 '14 at 11:18
  • @akash88, then use `WebBrowser`, follow the link I posted. – noseratio Jun 20 '14 at 11:19
  • @PaulZahra : The issue is the same with that solution as well. –  Jun 20 '14 at 11:22
  • @akash88 As Noseratio says... the solution in the link I gave uses the WebBrowser class, not WebClient... is that an issue? – Paul Zahra Jun 20 '14 at 11:25
  • @Noseratio : I have used the WebBrowser control, but not in the way shown in the link you posted. Does the solution in the link handle the iframe content differently? –  Jun 20 '14 at 11:25
  • @akash88, with `WebBrowser`, it's as simple as this: `var frameDocument = webBrowser.Document.Window.Frames["iframeId"].Document`. – noseratio Jun 20 '14 at 11:27
  • Yes, I have done it that way as well, but it returns Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED)) –  Jun 20 '14 at 11:33
  • I guess the iframe content comes from a different domain, and you're facing cross-domain security restrictions (the same-origin policy). If that's the case, check [this](http://stackoverflow.com/a/18820169/1768303). – noseratio Jun 20 '14 at 11:49
  • @akash88 Yes, indeed there is some security involved... try accessing the iframe directly... http://v2.oddsmatcher-data.co.uk/oddssearch.aspx?AffSiteID=325453&gridSkin=WebBlue gives "oddsmatcher is not permitted to run on this domain name" looks as though they've restricted the calling domain. I guess you could try spoofing your request as if it comes from their website... but that might be a little illegal depending on the rights of their data etc. – Paul Zahra Jun 20 '14 at 12:00
  • @akash88 Why not just try screen scraping it, with something like http://stackoverflow.com/questions/599275/how-can-i-download-html-source-in-c-sharp – Paul Zahra Jun 20 '14 at 12:07
  • @Paul Zahra I have tried this solution but it doesn't work (cf. my post) – christof13 Jun 20 '14 at 12:09
  • @christof13 I was thinking more along the lines of the post in the link I gave by Diego Jancic where he just creates a WebRequest and reads the stream. – Paul Zahra Jun 20 '14 at 12:40
  • @Paul Zahra I have tested the WebRequest solution but it doesn't work – christof13 Jun 20 '14 at 12:45

1 Answer


You have to search for the iframe tag in the main page and then use its src attribute to download the page shown in the iframe.

using (var client = new WebClient())
{
    string html = client.DownloadString("url");
    string src = ... //find iframe source with regex
    string iframe = client.DownloadString(src);
}

For the regex, you could adapt the one from "Regular Expression to get the SRC of images in C#".

Edit :

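    // Requires: using System; using System.Net; using System.Text.RegularExpressions;
    using (var client = new WebClient())
    {
        // Download the outer page that hosts the iframe.
        string html = client.DownloadString("http://multiprofits.co.uk/oddsmatcher.html");
        // Capture the value of the first iframe's src attribute.
        string src = Regex.Match(html, "<iframe.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase).Groups[1].Value;
        // Download and print the document the iframe points to.
        Console.Write(client.DownloadString(src));
    }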

This code really does return the iframe's HTML source.
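A page can contain several iframes, and a src value can be relative or HTML-encoded (for example `&amp;` in a query string). The following is only a sketch of how the extraction step could be made a little more robust. The class `IframeHelper`, the method `FindIframeUrls` and the slightly stricter pattern are my own illustrative choices, not something from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class IframeHelper
{
    // Matches the src attribute inside an <iframe ...> tag without
    // running past the closing '>' of that tag.
    private static readonly Regex IframeSrc = new Regex(
        "<iframe\\b[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns the absolute URL of every iframe found in the given HTML.
    public static List<Uri> FindIframeUrls(string html, Uri pageUri)
    {
        var urls = new List<Uri>();
        foreach (Match match in IframeSrc.Matches(html))
        {
            // Attribute values are HTML-encoded, so decode "&amp;" and friends.
            string src = WebUtility.HtmlDecode(match.Groups[1].Value);
            // Resolve relative src values against the address of the outer page.
            urls.Add(new Uri(pageUri, src));
        }
        return urls;
    }
}
```

Each returned `Uri` can be passed to `client.DownloadString(uri)`, with the address of the outer page given as `pageUri`. A regex stays fragile on unusual markup, so an HTML parser such as `HtmlAgilityPack` is likely the safer option for anything beyond a quick test.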

Edit2 :

I have found your problem. It's a security restriction on the site. Open the iframe URL in a new browser window and you will receive this message:

oddsmatcher is not permitted to run on this domain name [v2.oddsmatcher-data.co.uk/v2.oddsmatcher-data.co.uk] For more details please cotact support@oddsmonkey.com

So you can't download the iframe source directly. You will probably have to use WebBrowser or something similar.
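One thing that could be tried before switching to WebBrowser is sending the embedding page as the `Referer` header, as was hinted at in the comments on the question. This is only a sketch under the assumption that the site checks that header, which I have not verified. The names `RefererDownload` and `CreateClient` are hypothetical, and you should check the site's terms of use before doing this.

```csharp
using System;
using System.Net;

// Hypothetical helper, named for illustration only.
public static class RefererDownload
{
    // Builds a WebClient that announces the embedding page as the referrer.
    // Assumption: the server decides based on the Referer header.
    public static WebClient CreateClient(string embeddingPageUrl)
    {
        var client = new WebClient();
        // Set the header before the request; to be safe, use a fresh
        // client (or set the header again) for every download.
        client.Headers[HttpRequestHeader.Referer] = embeddingPageUrl;
        return client;
    }
}
```

A client created with `RefererDownload.CreateClient("http://multiprofits.co.uk/oddsmatcher.html")` would then be used for `DownloadString` on the iframe URL. Even if the server accepts the request, data that the page loads later through JavaScript will not appear in the downloaded HTML, in which case a browser-based approach such as WebBrowser is still needed.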

christof13
  • I have done it that way too, but the iframe content is not what is displayed in the web page. –  Jun 20 '14 at 11:09
  • I don't understand. The iframe src is the url of the page displayed. So if you download this page you will have the iframe content. – christof13 Jun 20 '14 at 11:10
  • If the iframe page contains CSS, JavaScript, etc., you will have to download them too to display the page correctly. So you had better use a tool. – christof13 Jun 20 '14 at 11:13
  • You won't get actual iframe content from that iframe source url. I have already tried that :( –  Jun 20 '14 at 11:13
  • Thank you for your effort. I have already got iframe source URL. That's not the problem. The main issue is that I am not able to get correct iframe content with that source URL. –  Jun 20 '14 at 11:27
  • My code gives you the HTML source of the iframe, not the URL! I don't understand why you say the content is incorrect. What do you do with that content? – christof13 Jun 20 '14 at 11:34
  • If you check the URL I posted, you can see content inside the iframe, i.e. a grid which has data in it. But when you download the content from the iframe source URL in code, you don't get that data. I want to get that data as well. –  Jun 20 '14 at 11:39
  • Thank you for your update. So, is it not possible to get the iframe data? –  Jun 20 '14 at 15:38