I’m trying to retrieve the content of a webpage in C#. The problem is that the webpage uses Ajax and JavaScript to create and populate its HTML elements dynamically.

The webpage I’m talking about is: http://diseases.jensenlab.org/Entity?order=textmining,knowledge,experiments&textmining=10&knowledge=10&experiments=10&type1=9606&type2=-26&id1=ENSP00000317985

If you use HttpWebRequest to get the HTML of the page, only the JavaScript calls are visible, not the content they generate. So how can a C# console program get the results that the JavaScript renders on the webpage? I have tried using the WebBrowser class but can’t get it to work.

How do you use the WebBrowser class in a new thread to collect the rows of the dynamically created table in an ArrayList? Further, how do you access the relevant HTML tag if you do not know its name? Can you use the id attribute? This assumes that the WebBrowser class is the best way to do this. Or is there a better way?

The relevant HTML code part is:

<div class="ajax_table" id="53c2583b1f204464d7fa9387e2ac1868"><script>blackmamba_pager('Textmining', 'type1=9606id1=ENSP00000317985type2=-26title=Text+mining',
10, 1, '53c2583b1f204464d7fa9387e2ac1868');</script></div>

Please provide an example of how this is done.

shA.t
Hendri
  • possible duplicate of [Scraping Dynamic content](http://stackoverflow.com/questions/6245294/scraping-dynamic-content) – Robert Moskal May 02 '15 at 21:51
  • I have looked at the other question’s answer but don’t understand how to use that example with my scenario. I have added more information to clarify things. – Hendri May 03 '15 at 07:41

1 Answer

Here, then, is an example, also taken from Stack Overflow :):

WebBrowser mywebBrowser;
private void Form1_Load(object sender, EventArgs e)
{
 mywebBrowser = new WebBrowser();
 mywebBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(mywebBrowser_DocumentCompleted);

 Uri address = new Uri("http://www.cnn.com/");
 mywebBrowser.Navigate(address);
}

 private void mywebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
 {
  // The document is not completely loaded until this event fires.
  // Content that the page injects afterwards via Ajax may still be missing here.
  HtmlDocument doc = mywebBrowser.Document;
  // GetElementById returns a single HtmlElement, or null if no element has that id.
  HtmlElement element = doc.GetElementById("53c2583b1f204464d7fa9387e2ac1868");
 }

HtmlDocument has no direct way to get elements by class name, as jQuery does. If the id of your table div isn't stable, you can call GetElementsByTagName("div"), iterate through the results, and use GetAttribute("className") on each element to match your "ajax_table" class.
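The class-name lookup described above could be sketched roughly as follows. This is only an illustration under a few assumptions: the names HtmlClassSearch, HasClass and FindByClass are hypothetical helpers (not part of Windows Forms), and the document passed in is assumed to come from a WebBrowser whose DocumentCompleted event has already fired.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical helper class, not part of the WebBrowser API.
public static class HtmlClassSearch
{
    // Returns true if a whitespace-separated class attribute value
    // contains the given class name as a whole token.
    public static bool HasClass(string classAttribute, string className)
    {
        if (string.IsNullOrEmpty(classAttribute))
            return false;

        string[] names = classAttribute.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);
        return Array.IndexOf(names, className) >= 0;
    }

    // Collects every element with the given tag name whose class
    // attribute contains className.
    public static List<HtmlElement> FindByClass(
        HtmlDocument doc, string tagName, string className)
    {
        var matches = new List<HtmlElement>();
        foreach (HtmlElement element in doc.GetElementsByTagName(tagName))
        {
            // "className" is the DOM property name for the class attribute.
            if (HasClass(element.GetAttribute("className"), className))
                matches.Add(element);
        }
        return matches;
    }
}
```

Inside the DocumentCompleted handler this might be called as HtmlClassSearch.FindByClass(mywebBrowser.Document, "div", "ajax_table"). Splitting the attribute into tokens, rather than comparing the whole string, also matches elements that carry more than one class.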

Robert Moskal
  • I tried to change it to work in a console program. However, I get a NullReferenceException ("was unhandled by user code") at the following line. – Hendri May 03 '15 at 19:52
  • Here is my code: private void runBrowserThread(Uri url) { var th = new Thread(() => { var mywebBrowser = new WebBrowser(); mywebBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(mywebBrowser_DocumentCompleted); mywebBrowser.Navigate(url); Application.Run(); }); th.SetApartmentState(ApartmentState.STA); th.Start(); } – Hendri May 03 '15 at 20:06
  • private void mywebBrowser_DocumentCompleted(Object sender, WebBrowserDocumentCompletedEventArgs e) { //Until this moment the page is not completely loaded HtmlDocument doc = mywebBrowser.Document; //error here HtmlElement tagCollection; tagCollection = doc.GetElementById("53c2583b1f204464d7fa9387e2ac1868"); } Why do I get that exception? Also, will tagCollection.InnerText give me the content of the table? – Hendri May 03 '15 at 20:10
  • It looks to me like the WebBrowser control you create inside the thread isn't visible to the DocumentCompleted callback. The callback expects the control to be available to it. – Robert Moskal May 03 '15 at 21:49
  • OK, but why is the HtmlDocument of mywebBrowser null? Does that mean that the id I'm using was not found, or that the page did not load? Is there anything I must change in my code to fix this problem? – Hendri May 04 '15 at 06:01
  • Now I'm getting the same exception, but at this line: tagCollection.InnerText. If I understand it correctly, that is because the HtmlElement tagCollection is null. – Hendri May 04 '15 at 06:11
  • It's hard to read code in comments; post another question or debug the problem yourself. Does the document load callback return an error? Or maybe there's another callback for errors. Check the contents of the returned document to see if they are what you expect. Finally, I don't see how you are iterating over the tag collection. Good luck! – Robert Moskal May 04 '15 at 12:31
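
The null document discussed in the comments above is consistent with the handler reading a class-level mywebBrowser field while the thread only assigns a local variable of the same name. One possible way around that, sketched here as an assumption rather than a confirmed fix, is to take the control from the handler's sender argument. The names Program, IsMainDocument and OnDocumentCompleted are hypothetical, and the URL is the placeholder from the answer. Content that the page loads later through Ajax may still be absent when DocumentCompleted fires, so the lookup can legitimately return null.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

public static class Program
{
    // True when the completed URL is the top-level page rather than a frame.
    public static bool IsMainDocument(Uri completedUrl, Uri browserUrl)
    {
        return completedUrl != null && completedUrl.Equals(browserUrl);
    }

    private static void OnDocumentCompleted(
        object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Use the control that raised the event instead of a shared field.
        var browser = (WebBrowser)sender;
        if (!IsMainDocument(e.Url, browser.Url))
            return; // a frame finished loading; wait for the main document

        HtmlDocument doc = browser.Document;
        HtmlElement element = doc == null
            ? null
            : doc.GetElementById("53c2583b1f204464d7fa9387e2ac1868");

        // GetElementById returns null when the id is not in the document.
        Console.WriteLine(element == null ? "(element not found)" : element.InnerText);

        Application.ExitThread(); // stop the message loop started below
    }

    [STAThread]
    public static void Main()
    {
        var th = new Thread(() =>
        {
            var browser = new WebBrowser { ScriptErrorsSuppressed = true };
            browser.DocumentCompleted += OnDocumentCompleted;
            browser.Navigate(new Uri("http://www.cnn.com/"));
            Application.Run(); // the control needs a message loop to raise events
        });
        th.SetApartmentState(ApartmentState.STA); // WebBrowser requires an STA thread
        th.Start();
        th.Join();
    }
}
```

Checking the element for null before reading InnerText avoids the second exception mentioned in the comments.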