
I want to convert HTML to JSON with C#. Is there any way to do it? I searched a lot and found articles about using JavaScriptSerializer and Newtonsoft to serialize the HTML string to JSON, but these serializers do nothing except wrap the HTML string in opening and closing curly braces. I don't want that. I want to convert the whole HTML document to JSON so that I can get the relevant information from it using C# objects instead of parsing HTML with regular expressions. The HTML can be any valid HTML from any website on the internet. I am fetching it with HTTP request and response objects in C#.

Please don't suggest using the HTML Agility Pack, because that will do the same thing that serialization does.

If anybody has any idea how to do this with C#, please share it.

Puneet Pant
  • It is not clear what you are expecting as an answer or expecting the answer to do. For any specific problem you are having please include a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve). Please also read [How do I ask a Good Question](http://stackoverflow.com/help/how-to-ask). Make sure that your questions are specific and not overly broad. Finally you have to make an attempt yourself, the forum members will not write your code for you. – Igor Mar 03 '16 at 13:22
  • Can you clarify why querying HTML using a tool *specifically designed to query HTML* is a bad idea, and why you think the only other options are to use regex or to convert it into JSON? Seems like an XY problem to me. – Rob Apr 23 '17 at 08:23

2 Answers


I will explain why your question can cause confusion. Consider this example of HTML:

<html>
 <body>
   <p> example of paragraph </p>
 </body>     
</html>

And this example of JSON:

{"employees":[
    {"firstName":"John", "lastName":"Doe"},
    {"firstName":"Anna", "lastName":"Smith"},
    {"firstName":"Peter", "lastName":"Jones"}
]}

JSON is usually the data from which HTML is generated, that is, its underlying foundation. So when you say you want to convert HTML to JSON it is really confusing, because it is impossible to figure out which rules the conversion should follow, or which HTML tags should be ignored or added while creating the JSON.

Yuriy Zaletskyy
  • 1. In an objective manner, where you treat every element as an object. So you generate something like: ``` { "elements":[ { "type":"html", "content":null, "children":[ { "type":"body", "content":null, "children":[ { "type":"p", "content":"example of paragraph type", "children":[ ... }``` 2. If it is not stated which tags should be omitted, then everything should be included. – ignacy130 Apr 23 '17 at 08:19

There is a JavaScript example of the solution here: Map HTML to JSON. DOM parsers are pretty similar, so you can try implementing it in C#. (I'd be interested in such an implementation as well :D)
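
A rough C# sketch of that idea, using the element shape from the comment on the other answer (`type`, `content`, `children`). Several things here are assumptions and do not come from the linked JavaScript answer: the HtmlAgilityPack NuGet package is used purely as the DOM parser (another parser such as AngleSharp could fill the same role), `System.Text.Json` does the serialization, and the names `ElementNode`, `HtmlToJson`, `ToElement`, `Parse` and `Serialize` are made up for illustration. Attributes and comments are ignored to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using HtmlAgilityPack; // assumption: third-party NuGet package, used only as the DOM parser

// Hypothetical shape, following the { type, content, children } idea from the comment.
public sealed class ElementNode
{
    public string Type { get; set; }
    public string Content { get; set; }
    public List<ElementNode> Children { get; set; }
}

public static class HtmlToJson
{
    // Recursively maps one HTML element to an ElementNode.
    private static ElementNode ToElement(HtmlNode node)
    {
        // Only the text sitting directly inside this element, not text of nested elements.
        string text = string.Concat(
            node.ChildNodes
                .Where(n => n.NodeType == HtmlNodeType.Text)
                .Select(n => n.InnerText)).Trim();

        return new ElementNode
        {
            Type = node.Name,
            Content = text.Length > 0 ? HtmlEntity.DeEntitize(text) : null,
            Children = node.ChildNodes
                .Where(n => n.NodeType == HtmlNodeType.Element)
                .Select(n => ToElement(n))
                .ToList()
        };
    }

    // Parses an HTML string into a tree of typed C# objects.
    public static List<ElementNode> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.ChildNodes
            .Where(n => n.NodeType == HtmlNodeType.Element)
            .Select(n => ToElement(n))
            .ToList();
    }

    // Serializes the tree as { "elements": [ ... ] }.
    public static string Serialize(string html)
    {
        var options = new JsonSerializerOptions
        {
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase, // "type", "content", "children"
            WriteIndented = true,
            MaxDepth = 256 // the default limit of 64 can be too low for deeply nested pages
        };
        return JsonSerializer.Serialize(new { elements = Parse(html) }, options);
    }

    public static void Main()
    {
        Console.WriteLine(Serialize("<html><body><p> example of paragraph </p></body></html>"));
    }
}
```

If the goal is only to pull information out of the page, the `ElementNode` list returned by `Parse` can be queried directly with LINQ, and the JSON step becomes optional. Which tags to keep or drop is still a rule you have to choose yourself; this sketch simply keeps every element.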

ignacy130