
I am automating a task using the WebBrowser control, and the site displays its pages using frames. My issue is that I get to a point where I can see the web page loaded properly in the WebBrowser control, but when execution reaches my code and I look at the HTML, I see nothing.

I have seen other examples here too, but none of them return all of the browser's HTML.

What I get by using this:

    HtmlWindow frame = webBrowser1.Document.Window.Frames[1];
    string str = frame.Document.Body.OuterHtml;

is just:

the main frame tag with attributes like SRC, etc. Is there any way to handle this? I can see that the web page has completely loaded, so why do I not see the HTML? When I do the same in Internet Explorer, I can see the page source once it has loaded, so why not here?

ADDITIONAL INFO

There are two frames on the page.

I use this, as above:

    HtmlWindow frame = webBrowser1.Document.Window.Frames[0];
    string str = frame.Document.Body.OuterHtml;

and I get the correct HTML for the first frame, but for the second one I only see:

<FRAMESET frameSpacing=1 border=1 borderColor=#ffffff frameBorder=0 rows=29,*><FRAME title="Edit Search" marginHeight=0 src="http://web2.westlaw.com/result/dctopnavigation.aspx?rs=WLW12.01&amp;ss=CXT&amp;cnt=DOC&amp;fcl=True&amp;cfid=1&amp;method=TNC&amp;service=Search&amp;fn=_top&amp;sskey=CLID_SSSA49266105122&amp;db=AK-CS&amp;fmqv=s&amp;srch=TRUE&amp;origin=Search&amp;vr=2.0&amp;cxt=RL&amp;rlt=CLID_QRYRLT803076105122&amp;query=%22LAND+USE%22&amp;mt=Westlaw&amp;rlti=1&amp;n=1&amp;rp=%2fsearch%2fdefault.wl&amp;rltdb=CLID_DB72585895122&amp;eq=search&amp;scxt=WL&amp;sv=Split" frameBorder=0 name=TopNav marginWidth=0 scrolling=no><FRAME title="Main Document" marginHeight=0 src="http://web2.westlaw.com/result/dccontent.aspx?rs=WLW12.01&amp;ss=CXT&amp;cnt=DOC&amp;fcl=True&amp;cfid=1&amp;method=TNC&amp;service=Search&amp;fn=_top&amp;sskey=CLID_SSSA49266105122&amp;db=AK-CS&amp;fmqv=s&amp;srch=TRUE&amp;origin=Search&amp;vr=2.0&amp;cxt=RL&amp;rlt=CLID_QRYRLT803076105122&amp;query=%22LAND+USE%22&amp;mt=Westlaw&amp;rlti=1&amp;n=1&amp;rp=%2fsearch%2fdefault.wl&amp;rltdb=CLID_DB72585895122&amp;eq=search&amp;scxt=WL&amp;sv=Split" frameBorder=0 borderColor=#ffffff name=content marginWidth=0><NOFRAMES></NOFRAMES></FRAMESET>

UPDATE

The URLs of the two frames are as follows:

Frame 1, whose HTML I see:

http://web2.westlaw.com/nav/NavBar.aspx?RS=WLW12.01&VR=2.0&SV=Split&FN=_top&MT=Westlaw&MST=

Frame 2, whose HTML I do not see:

http://web2.westlaw.com/result/result.aspx?RP=/Search/default.wl&action=Search&CFID=1&DB=AK%2DCS&EQ=search&fmqv=s&Method=TNC&origin=Search&Query=%22LAND+USE%22&RLT=CLID%5FQRYRLT302424536122&RLTDB=CLID%5FDB6558157526122&Service=Search&SRCH=TRUE&SSKey=CLID%5FSSSA648523536122&RS=WLW12.01&VR=2.0&SV=Split&FN=_top&MT=Westlaw&MST=

The properties of the second frame, whose HTML I do not get, are shown in the picture below:

[Screenshot: properties of the second frame]

Thank you

confusedMind

4 Answers


I paid for a solution to the question above, and it works 100%.

What I did was use the function below, which returned the count from the tag I was looking for and previously could not find :S. Call the function like this:

FillFrame(webBrowser1.Document.Window.Frames);



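// Requires: using System.Windows.Forms;
private string doccount; // the result count, once found

// Searches the frames recursively (a frame can contain a FRAMESET with further
// frames). Returns true once the label is found, so the outer calls stop too.
private bool FillFrame(HtmlWindowCollection hwc)
{
    if (hwc == null) return false;
    foreach (HtmlWindow hw in hwc)
    {
        // A frame whose document has not loaded yet has nothing to search.
        if (hw.Document == null) continue;

        HtmlElement getSpanid = hw.Document.GetElementById("mDisplayCiteList_ctl00_mResultCountLabel");
        if (getSpanid != null)
        {
            // Strip the "Document(s)" text and keep only the count.
            doccount = getSpanid.InnerText.Replace("Documents", "").Replace("Document", "").Trim();
            return true;
        }

        // Descend into nested frames (e.g. the FRAMESET inside the second frame).
        if (hw.Frames.Count > 0 && FillFrame(hw.Frames)) return true;
    }
    return false;
}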
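A related sketch, assuming you want to see why Frames[1] only returned a FRAMESET: list every frame in the tree together with its path. FrameTree and Walk are hypothetical illustration names, not a WinForms API; the traversal is written against plain delegates so it can be tested without a browser, and the commented usage at the end shows how it might be pointed at HtmlWindow objects.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: FrameTree/Walk are illustration names, not a WinForms API.
public static class FrameTree
{
    // Depth-first walk over a tree of frames. Each frame is returned with a
    // path built from `label` at every level (e.g. "1/content"), so frames
    // nested inside a FRAMESET show up instead of only the outer markup.
    public static IEnumerable<KeyValuePair<string, T>> Walk<T>(
        IEnumerable<T> frames,
        Func<T, IEnumerable<T>> children,
        Func<T, int, string> label,
        string prefix = "")
    {
        if (frames == null) yield break; // same null guard as in FillFrame

        int index = 0;
        foreach (T frame in frames)
        {
            string path = prefix + label(frame, index);
            yield return new KeyValuePair<string, T>(path, frame);

            // Recurse into nested frames before moving on to the next sibling.
            foreach (KeyValuePair<string, T> nested in Walk(children(frame), children, label, path + "/"))
                yield return nested;

            index++;
        }
    }
}

// Usage sketch with the WebBrowser control (needs System.Linq and
// System.Windows.Forms); a frame window's Name is its FRAME name attribute:
//
// foreach (var entry in FrameTree.Walk(
//              webBrowser1.Document.Window.Frames.Cast<HtmlWindow>(),
//              w => w.Frames == null ? null : w.Frames.Cast<HtmlWindow>(),
//              (w, i) => string.IsNullOrEmpty(w.Name) ? i.ToString() : w.Name))
// {
//     HtmlDocument doc = entry.Value.Document;
//     Console.WriteLine(entry.Key + ": " + (doc?.Body?.OuterHtml ?? "(no document)"));
// }
```

With the page from the question, the inner frames named TopNav and content should then appear under the second top-level frame (as something like 1/TopNav and 1/content, or with that frame's name instead of 1 if it has one), which is where the missing HTML lives.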

Hope it helps others.

Thank you

confusedMind

To get the HTML, you can do it this way:

        using System.Net;   // at the top of the file

        WebClient client = new WebClient();
        string html = client.DownloadString(@"http://stackoverflow.com");

That's just an example, of course; you can change the address.
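One caveat with this approach: WebClient sends its own HTTP requests and does not share the WebBrowser control's session cookies, so for a site that relies on a login session, like the Westlaw pages in the question, it may get back a login page instead of the frame's content. A minimal sketch, assuming you download the inner frame's src URL directly and copy over the cookies the page exposes (FrameDownloader and its methods are hypothetical names):

```csharp
using System;
using System.Net;

// Hypothetical helper names; WebClient, HttpRequestHeader and Uri are standard .NET types.
public static class FrameDownloader
{
    // Resolves a FRAME src (absolute or relative) against the page that contains it.
    public static Uri ResolveFrameUrl(Uri pageUrl, string frameSrc)
    {
        return new Uri(pageUrl, frameSrc);
    }

    // WebClient keeps no cookies of its own, so replay the ones the
    // WebBrowser page exposes (for example webBrowser1.Document.Cookie).
    public static WebClient CreateClient(string cookieHeader)
    {
        var client = new WebClient();
        if (!string.IsNullOrEmpty(cookieHeader))
            client.Headers[HttpRequestHeader.Cookie] = cookieHeader;
        return client;
    }

    // Downloads the frame document itself rather than the outer FRAMESET page.
    public static string DownloadFrame(Uri pageUrl, string frameSrc, string cookieHeader)
    {
        using (WebClient client = CreateClient(cookieHeader))
        {
            return client.DownloadString(ResolveFrameUrl(pageUrl, frameSrc));
        }
    }
}
```

For the question's page, pageUrl would be the second frame's URL, frameSrc the src attribute of the FRAME named content (readable with GetAttribute("src") on that element), and cookieHeader the value of webBrowser1.Document.Cookie. HttpOnly cookies are not exposed through Document.Cookie, so this may still not carry the whole session.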

liran63

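This works just fine; it gets the BODY element with all of its inner elements.

Somewhere in your Form code:

// Subscribe before navigating so the first DocumentCompleted is not missed.
wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wbDocumentCompleted);
wb.Url = new Uri("http://stackoverflow.com");

And here is wbDocumentCompleted:

void wbDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // On a page with frames this event also fires once per frame;
    // e.Url is the document that just finished, so skip the frame ones.
    if (e.Url != wb.Url) return;

    var yourBodyHtml = wb.Document.Body.OuterHtml;
}

Here, wb is a System.Windows.Forms.WebBrowser.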

UPDATE:

As with the document itself, I think your second frame is not loaded yet at the time you check its content. You can try the solutions from this link. You will have to wait for your frames to load in order to see their content.
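If you would rather not rely on DocumentCompleted alone, one blunt alternative (a swapped-in polling technique, not necessarily what the linked solutions do) is to check for the content repeatedly with a timeout while the control keeps processing messages. A minimal sketch; FrameWait and WaitUntil are hypothetical names, and in a WinForms app pump would be Application.DoEvents:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of WinForms. In a WinForms app pass
// Application.DoEvents as `pump` so the WebBrowser keeps loading frames
// while this loop runs on the UI thread.
public static class FrameWait
{
    // Polls `isReady` until it returns true or `timeout` elapses.
    // Returns true if the condition was met, false on timeout.
    public static bool WaitUntil(Func<bool> isReady, TimeSpan timeout, Action pump)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (!isReady())
        {
            if (clock.Elapsed >= timeout) return false;
            pump();           // let the control process pending messages
            Thread.Sleep(50); // avoid spinning the CPU between checks
        }
        return true;
    }
}
```

isReady would typically check for the element you actually need, for example whether GetElementById finds it in the inner content frame. Keep in mind that Application.DoEvents lets other event handlers run inside the loop (re-entrancy), so an event-driven design is usually cleaner for anything beyond a quick script.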

Aleksandar Vucetic
  • Nope, as I said, it only returns the frame tags, and the innerHTML of the frame is ... ; however, it is showing in the web page :S – confusedMind Feb 12 '12 at 10:41
  • I missed the fact that you have problems with frames. Take a look at my updated answer :). If it still doesn't work, could you please post your exact code in your question? Something else you are doing might be the problem. – Aleksandar Vucetic Feb 12 '12 at 15:34

The most likely reason is that frame index 0 has the same domain name as the main (parent) page, while frame index 1 has a different domain name. Am I correct?

This creates a cross-frame security issue, and the WB control just leaves you high and dry: it doesn't tell you what on earth went wrong, and it just leaves your objects, properties and data empty (the watch window will say "No Variables" when you try to expand the object).

The only things you can access in this situation are pretty much the URL and the frame element's own properties, but nothing inside the frame.
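Since those URLs stay readable, one way to test this hypothesis from code is to compare each frame's URL with the parent page's URL. A minimal sketch, assuming a scheme-plus-host comparison is close enough (FrameOrigin and IsSameOrigin are hypothetical names, and Internet Explorer's real rules also involve security zones and document.domain):

```csharp
using System;

// Hypothetical helper, not part of the WebBrowser API.
public static class FrameOrigin
{
    // Rough same-origin test between the parent page and a frame URL.
    // Only scheme and host are compared: Internet Explorer does not include
    // the port in its same-origin check, and document.domain is not modeled.
    public static bool IsSameOrigin(Uri parent, Uri frame)
    {
        return string.Equals(parent.Scheme, frame.Scheme, StringComparison.OrdinalIgnoreCase)
            && string.Equals(parent.Host, frame.Host, StringComparison.OrdinalIgnoreCase);
    }
}

// Usage sketch (System.Windows.Forms): read each FRAME's src from the parent
// document and report how it compares. Reading HtmlWindow.Document of a
// cross-domain frame typically throws UnauthorizedAccessException, so keep
// a try/catch around that access regardless of this check.
//
// foreach (HtmlElement f in webBrowser1.Document.GetElementsByTagName("FRAME"))
// {
//     Uri src = new Uri(webBrowser1.Url, f.GetAttribute("src"));
//     Console.WriteLine(src + (FrameOrigin.IsSameOrigin(webBrowser1.Url, src) ? " same-origin" : " cross-domain"));
// }
```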

Of course, there are ways to overcome the cross-frame security restrictions, but they are not built into the WebBrowser control; they are external solutions, and which one applies depends on the WB control you are using (i.e., the .NET version or the pre-.NET version).

Let me know whether I have identified your problem correctly and, if so, whether you would like me to describe the solution tailored to your setup and instance of the WB control.

UPDATE: I have noticed that you're using .getElementsByTagName("HTML")(0).outerHTML to get the HTML. All you need to do is call this on the document object or the .body object: MyDoc.Body.innerHTML should get the content you want. Also, notice that there are additional frames inside these documents, in case that is relevant.

Can you give us the URL of the main document that contains these two URLs, so that we (or I) can replicate what you're doing here?

Also, I'm not sure why you are using DomElement, but you should just cast it to the native object it is meant to be cast to: either an IHTMLDocument2 or the object you see in the watch window, which I think is IHTMLFrameElement (if I recall correctly; you will know what I mean once you see it). If you are trying to use an XML object, that could be why you aren't able to get the HTML content. Change the object declaration and the cast (if there is one), give it a go, and let us know :). Now I'm curious too :).

Erx_VB.NExT.Coder