
Let's say I have an HtmlDocument variable:

HtmlDocument document = Client.Get(My_Webpage);

Its inner HTML looks something like this:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"
  lang="en"
  xml:lang="en"
  dir="ltr">

...

<head>
<script>...</script>

<script type="text/javascript">...</script>

<script type="text/javascript">
    <!--//--><![CDATA[//><!--
        jQuery.extend(*JSON THAT I NEED*)
    //--><!]]>
</script>

</head>

...

Is there an easier way to extract that piece of JSON? Currently I am just manipulating the HTML as a string to retrieve the contents, then deserializing the result into an object. However, this doesn't seem like the proper way to do it.

  • What is the format of the JSON? HtmlAgilityPack is great for working with the HTML, but not so great at parsing JSON. Can you identify the JSON with a regex and parse it out? – Dan Saltmer Sep 30 '14 at 09:04

2 Answers


A typical way to build this would be:

On the server side, you have a service (using System.Web.Mvc):

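using System.Linq;
using System.Web.Mvc;

public class DataController : Controller  // controller name is illustrative
{
  // With the default route, GET /Data/GetData?nr=5 returns [0,1,2,3,4].
  public JsonResult GetData(int nr)
  {
    // Enumerable.Range takes (start, count); AllowGet permits returning JSON to a GET request.
    return Json(Enumerable.Range(0, nr), JsonRequestBehavior.AllowGet);
  }
}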

On the client side, you have:

$.ajax({
  url: 'YourServiceURL',
  success: function(data) {
     alert('Web Service Called!');
  }
});

You might want to take a look at: http://www.asp.net/get-started

Margus
  • Maybe some further context was needed here: I'm not using ASP.NET or MVC. It's just a console application that analyzes web pages to extract certain information from them. – Stewart Whitworth Sep 30 '14 at 09:00
  • It does not matter whether it is MVC or not (that was just an example where it makes sense). http://stackoverflow.com/questions/2158106/web-reference-vs-service-reference – Margus Sep 30 '14 at 09:23
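Because the asker is writing a console application rather than a web page, the jQuery client above could be replaced by a C# client. This is a minimal sketch, assuming the site actually exposes a JSON endpoint like the one above and that System.Text.Json is available (it is built into .NET Core 3.0 and later). DataClient, ParseNumbers and GetDataAsync are hypothetical names, and the URL passed in is a placeholder.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical console-side client for a JSON endpoint that returns an array of ints.
public static class DataClient
{
    // A single shared HttpClient instance, reused for all requests.
    private static readonly HttpClient Http = new HttpClient();

    // Deserializes a JSON array such as [0,1,2] into int[]; a JSON "null" becomes an empty array.
    public static int[] ParseNumbers(string json) =>
        JsonSerializer.Deserialize<int[]>(json) ?? Array.Empty<int>();

    // Downloads the raw JSON from serviceUrl (an absolute URL you supply) and parses it.
    // No HTML parsing is involved, because the server already returns plain JSON.
    public static async Task<int[]> GetDataAsync(string serviceUrl)
    {
        string json = await Http.GetStringAsync(serviceUrl);
        return ParseNumbers(json);
    }
}
```

For example, await DataClient.GetDataAsync(url), with url pointing at such an endpoint, would return the numbers directly. If the site offers no such endpoint, you still have to pull the JSON out of the script tag, as in the next answer.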

I would use a regex for this one:

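// Needs using System.Text.RegularExpressions; Singleline lets '.' match across line breaks.
string jsonYouNeed = Regex.Match(documentInnerHtml, @"jQuery\.extend\((.*?)\)", RegexOptions.Singleline).Groups[1].Value;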
Tyress
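The regex approach has one caveat: the lazy (.*?)\) stops at the first ')', so a JSON string value that contains a parenthesis gets cut short. A minimal sketch of an alternative follows. It assumes the page contains one jQuery.extend(...) call whose argument is a single JSON value. The code scans forward from the marker, counts parentheses while skipping quoted strings, and hands the result to System.Text.Json (.NET Core 3.0 or later). ExtendExtractor, ExtractExtendArgument and ParseExtendJson are hypothetical names, not part of HtmlAgilityPack or any other library.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: returns the argument text of the first "jQuery.extend(" call
// by scanning for the matching ')' instead of relying on a lazy regex.
public static class ExtendExtractor
{
    public static string ExtractExtendArgument(string html)
    {
        const string marker = "jQuery.extend(";
        int start = html.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0) return null;              // no call found
        start += marker.Length;

        int depth = 1;                           // already inside one '('
        char quote = '\0';                       // active string delimiter, '\0' when outside a string
        for (int i = start; i < html.Length; i++)
        {
            char c = html[i];
            if (quote != '\0')
            {
                if (c == '\\') i++;              // skip the escaped character
                else if (c == quote) quote = '\0';
                continue;                        // parentheses inside strings are ignored
            }
            if (c == '"' || c == '\'') quote = c;
            else if (c == '(') depth++;
            else if (c == ')' && --depth == 0)
                return html.Substring(start, i - start).Trim();
        }
        return null;                             // unbalanced input
    }

    // Parses the extracted text; this only works if the argument is one JSON value.
    public static JsonDocument ParseExtendJson(string html)
    {
        string arg = ExtractExtendArgument(html)
            ?? throw new FormatException("No jQuery.extend(...) call found.");
        return JsonDocument.Parse(arg);
    }
}
```

Assuming document is an HtmlAgilityPack HtmlDocument, ParseExtendJson(document.DocumentNode.InnerHtml).RootElement would give you the parsed object. If the call passes more than one argument, the extracted text is not a single JSON value, and JsonDocument.Parse throws a JsonException.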