I have an HTML string/document (the kind of markup you see when you view the source of a web page in IE) stored in a database, and I need to convert this HTML to an image from a C#/.NET WCF service. The current code uses WebBrowser; it works, but it is pretty slow and occasionally runs into issues.

I want to replace this capability (HTML to .png conversion) with something that does not use WebBrowser. I tried TheArtOfDev.HtmlRenderer, which appears to solve most issues, but it seems to run into problems with HTML like the following (perhaps because it contains multiple `<div>` elements?): it generates only a partial image for this HTML.

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN" "http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><style>p { padding: 0px; border: 0px ; margin: 0px;}ul { padding: 0px; border: 0px; margin: 0px;}</style><title>Item 18626</title></head><body><table border="0" width="700"><tr><td><strong><font face="verdana,geneva">Item 18626</font></strong><br></br></td></tr><tr><td><font face="verdana,geneva"><p style="font-weight:normal; font-weight:normal; ">Use the following passages to answer the question. </p><p style="font-weight:normal; ">&#xA0;</p><div align="center"><p><b>Mariana's Thesis Statement</b></p></div><div align="center"><table><colgroup style="width:46.6206em; " /><tbody><tr><td style="padding:0.2292em;  border:0.0834em solid #000000; "><p style="">The Cold War policy of containment was a success.</p></td></tr></tbody></table></div><p>&#xA0;</p><div align="center"><p><b>Information on the Vietnam War</b></p></div><div align="center"><table><colgroup style="width:46.6206em; " /><tbody><tr><td style="padding:0.2292em;  border:0.0834em solid #000000; "><p style="">The United States sent advisors to South Vietnam in 1954 to help the government fight communist guerrillas who wanted to reunite with communist North Vietnam. By 1964, the United States began sending combat troops. The U.S. military used extensive bombing and advanced technology; however, they were unable to turn the tide against communism. In 1973, the United States withdrew from South Vietnam, and in 1975, the government of South Vietnam fell to the communists and the country was reunited.</p></td></tr></tbody></table></div><p>&#xA0;</p><p>Mariana is writing a paper about the Cold War and discovers the information above on the Vietnam War. 
</p><p style="font-weight:normal; ">&#xA0;</p><p style="font-weight:normal; ">Describe one way the passage above refutes Mariana's thesis.</p><p style="font-weight:normal; ">&#xA0;</p><p style="font-weight:normal; ">Type your answer in the space provided.</p></font></td></tr><tr><td>&nbsp;</td></tr><tr><td><div id="001" style="border:solid black 1px;width:700px;overflow-x:hidden;overflow-y:visible;text-align:left;word-wrap:break-word"><p>it refutes because it dosent talk about the cold war it only talks about the vietnam war and it just says that the codntainment war was a success thats all it says about the containment war it &nbsp;really just talks about the vietnam war</p>
    </div></td></tr></table><br></br></body></html>

Can I tweak the above HTML somehow to make HtmlRenderer happy? What other .NET/C# options are there if HtmlRenderer doesn't work as expected? I would highly appreciate any help!

SJamal

2 Answers

You could do this using a headless browser such as PhantomJS, which has a NuGet package you could install in your project. With PhantomJS you can create a screenshot of a rendered page.

The PhantomJS package, paired with the Selenium WebDriver and Selenium WebDriver Support Classes packages, lets you take a screenshot using a slight modification of the method given in an answer to another Stack Overflow question, Getting screenshot using PhantomJS in C# (note that you'll need to add the OpenQA.Selenium.PhantomJS, OpenQA.Selenium.Support.Extensions, and System.Drawing.Imaging namespaces):

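    using OpenQA.Selenium.PhantomJS;
    using OpenQA.Selenium.Support.Extensions; // TakeScreenshot() extension method
    using System.Drawing.Imaging;             // ImageFormat

    PhantomJSDriver driver = new PhantomJSDriver();
    driver.Manage().Window.Maximize(); // optional
    // Local files must be addressed with a file URI.
    driver.Navigate().GoToUrl("file:///C:/fullpath/file.html");

    // Render the page and save it as a PNG.
    driver.TakeScreenshot().SaveAsFile("screenshot.png", ImageFormat.Png);

    driver.Quit();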
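
For HTML that lives in a database as a string rather than in a file, one possible approach (an assumption on my part, not something the PhantomJS or Selenium packages provide) is to write the string to a temporary file and let `System.Uri` build the file URI. `HtmlTempFile` and `WriteToFileUri` below are hypothetical names used only for illustration:

```csharp
using System;
using System.IO;
using System.Text;

static class HtmlTempFile
{
    // Hypothetical helper: writes the HTML string to a uniquely named
    // temporary .html file and returns a file URI pointing at it.
    public static string WriteToFileUri(string html)
    {
        string path = Path.Combine(
            Path.GetTempPath(),
            Guid.NewGuid().ToString("N") + ".html");

        // Encoding.UTF8 writes a byte order mark, which helps a browser
        // detect the encoding when the HTML declares no charset.
        File.WriteAllText(path, html, Encoding.UTF8);

        // System.Uri converts an absolute local path into a file URI.
        return new Uri(path).AbsoluteUri;
    }
}
```

The returned string could then be passed to `driver.Navigate().GoToUrl(...)` in place of the hard-coded file URI. Once the screenshot is saved, the temporary file can be removed with `File.Delete(new Uri(uri).LocalPath)`.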
Community
Gabby Paolucci
  • Thanks. I added the NReco.PhantomJS NuGet package, but I don't see a PhantomJSDriver type in it. Am I missing something here? – SJamal Dec 15 '15 at 21:48
  • I've updated my answer to include the additional packages and namespaces you'll need to use the example code. Note that the example is using the [PhantomJS](https://www.nuget.org/packages/PhantomJS/) package, not the [NReco.PhantomJS](https://www.nuget.org/packages/NReco.PhantomJS/) package. – Gabby Paolucci Dec 16 '15 at 14:57
  • I installed just PhantomJS via NuGet, and it only added phantomjs.exe. I expected a .dll to be added to the project so that VS recognizes the types in the new namespace. I installed Selenium WebDriver, which added a few libraries to the project references, but I still don't see "PhantomJSDriver". Are there other .dlls that need to be included explicitly for VS to recognize these namespaces? I'm lost as to how to use phantomjs.exe programmatically; usually a package adds a dll to the project, which lets you bring in the new namespace with a "using" directive. Any clue? – SJamal Dec 16 '15 at 16:36
  • All you need to do is install all three of those packages with NuGet; NuGet will add all the references and set things up without you having to do anything manually, besides adding `using OpenQA.Selenium.PhantomJS;`, `using OpenQA.Selenium.Support.Extensions;`, and `using System.Drawing.Imaging;` to the top of the C# class you're writing your code in. – Gabby Paolucci Dec 16 '15 at 16:46
  • Thanks for the detailed response! I finally got this working; however, it seems to work only with a web URL, e.g. "http://stackoverflow.com". When I passed it the physical path of a file instead, it didn't do anything with it. I need to work with HTML files and/or HTML strings stored in the database, as I mentioned in my original post. – SJamal Dec 16 '15 at 19:33
  • I updated the example again to show how to open a local file. In order to open a local file using PhantomJS, you need to use a [file URI](http://www.wikiwand.com/en/File_URI_scheme), as described in [PhantomJS fails to open local file](http://stackoverflow.com/questions/19939046/phantomjs-fails-to-open-local-file) – Gabby Paolucci Dec 16 '15 at 20:55
  • Is there any example of passing an HTML string instead? Thank you very much; I truly appreciate all the help you've provided so far! – SJamal Dec 16 '15 at 21:34

You might try Awesomium.

It's a web browser control based on the Chromium browser.

Underground